[x265-commits] [x265] remove unused recon[] from assembly code

Thu Apr 3 07:33:38 CEST 2014

details:   http://hg.videolan.org/x265/rev/a6930bfbd908
branches:  
changeset: 6643:a6930bfbd908
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 01 16:41:59 2014 -0700
description:
remove unused recon[] from assembly code
Subject: [x265] calcQpForCU: remove m_pic input parameter.

details:   http://hg.videolan.org/x265/rev/03bad90e94ad
branches:  
changeset: 6644:03bad90e94ad
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 02 06:51:35 2014 +0530
description:
calcQpForCU: remove m_pic input parameter.
Subject: [x265] Backed out changeset: a6930bfbd908

details:   http://hg.videolan.org/x265/rev/d0b5ea32525b
branches:  
changeset: 6645:d0b5ea32525b
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 02 15:58:19 2014 +0530
description:
Backed out changeset: a6930bfbd908

This changeset causes crashes. Needs to be re-examined.
Subject: [x265] frameencoder: removing assign qp inconsistencies which were triggered for unreferenced P frames

details:   http://hg.videolan.org/x265/rev/606da0b6bc58
branches:  stable
changeset: 6646:606da0b6bc58
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 02 17:05:08 2014 +0530
description:
frameencoder: removing assign qp inconsistencies which were triggered for unreferenced P frames
Subject: [x265] Merge from stable

details:   http://hg.videolan.org/x265/rev/3f27daf35506
branches:  
changeset: 6647:3f27daf35506
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 02 17:08:25 2014 +0530
description:
Merge from stable
Subject: [x265] param: set aq strength to 0 in CQP

details:   http://hg.videolan.org/x265/rev/dc887415f6df
branches:  
changeset: 6648:dc887415f6df
user:      Aarthi Thirumalai
date:      Wed Apr 02 17:19:38 2014 +0530
description:
param: set aq strength to 0 in CQP
Subject: [x265] param: fix typo in if-check.

details:   http://hg.videolan.org/x265/rev/261b3c2e788e
branches:  
changeset: 6649:261b3c2e788e
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Apr 02 17:37:58 2014 +0530
description:
param: fix typo in if-check.
Subject: [x265] frameencoder: fix white-space nit, add comment

details:   http://hg.videolan.org/x265/rev/67c0aa70a125
branches:  
changeset: 6650:67c0aa70a125
user:      Steve Borho <steve at borho.org>
date:      Wed Apr 02 15:45:14 2014 -0500
description:
frameencoder: fix white-space nit, add comment
Subject: [x265] weight: properly reset weights when no-residual early-out is taken

details:   http://hg.videolan.org/x265/rev/e03388e98ecc
branches:  stable
changeset: 6651:e03388e98ecc
user:      Steve Borho <steve at borho.org>
date:      Wed Apr 02 22:51:49 2014 -0500
description:
weight: properly reset weights when no-residual early-out is taken

This fixes a hash mismatch seen with a Main10 encode of sintel-480p
Subject: [x265] cleanup m_cuColocated[]

details:   http://hg.videolan.org/x265/rev/ccb2b7c26bb6
branches:  
changeset: 6652:ccb2b7c26bb6
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Wed Apr 02 17:09:23 2014 +0900
description:
cleanup m_cuColocated[]
Subject: [x265] remove unused parameter *recon from assembly code

details:   http://hg.videolan.org/x265/rev/fdfad9734231
branches:  
changeset: 6653:fdfad9734231
user:      Min Chen <chenm003 at 163.com>
date:      Wed Apr 02 13:12:50 2014 -0700
description:
remove unused parameter *recon from assembly code
Subject: [x265] testbench: use different stride on calcrecon

details:   http://hg.videolan.org/x265/rev/89af57686794
branches:  
changeset: 6654:89af57686794
user:      Min Chen <chenm003 at 163.com>
date:      Wed Apr 02 13:13:05 2014 -0700
description:
testbench: use different stride on calcrecon
Subject: [x265] dpb: Allow two L1 refs when b-pyramid is enabled [CHANGES OUTPUTS]

details:   http://hg.videolan.org/x265/rev/d815c4a8fa74
branches:  
changeset: 6655:d815c4a8fa74
user:      Gopu Govindaswamy
date:      Wed Apr 02 14:07:54 2014 +0530
description:
dpb: Allow two L1 refs when b-pyramid is enabled [CHANGES OUTPUTS]

Consider this common case: the lookahead determines that 5 consecutive frames
(in display order) are P1-B1-B2-B3-P2. When b-pyramid is enabled, the middle B
is encoded first and used as a reference by the two following B frames, so the
encode order is P1-P2-B2ref-B1-B3.

frame   L0         L1
P1
P2      P1
B2ref   P1         P2
B1      P1         P2 B2ref
B3      B2ref P1   P2

When B1 is encoded, both B2ref and P2 should be available as L1 references.
This improves compression efficiency when b-pyramid is enabled
(closes #12)
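
The reference lists in the table above can be reproduced with a small
standalone sketch. This is not the x265 DPB code: Frame, RefLists,
buildRefLists and listsFor are hypothetical names made up for illustration,
the POC values 0..4 are assumed from the display order, and the nearest-first
ordering inside each list is an assumption of the sketch (the table does not
fix an order).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical frame record for illustration; not an x265 type.
struct Frame
{
    std::string name; // label used in the table above
    int poc;          // position in display order (assumed 0..4)
    bool isRef;       // may later frames use it as a reference?
};

struct RefLists
{
    std::vector<std::string> l0; // references earlier in display order
    std::vector<std::string> l1; // references later in display order
};

// Build L0/L1 for `cur` from the frames encoded so far, nearest first,
// keeping at most maxL0 / maxL1 entries per list.
RefLists buildRefLists(const Frame& cur, std::vector<Frame> dpb,
                       std::size_t maxL0, std::size_t maxL1)
{
    // Sort candidates by display-order distance to the current frame.
    std::sort(dpb.begin(), dpb.end(), [&cur](const Frame& a, const Frame& b) {
        return std::abs(a.poc - cur.poc) < std::abs(b.poc - cur.poc);
    });

    RefLists out;
    for (const Frame& f : dpb)
    {
        if (!f.isRef)
            continue; // non-reference B frames are never listed
        if (f.poc < cur.poc && out.l0.size() < maxL0)
            out.l0.push_back(f.name);
        else if (f.poc > cur.poc && out.l1.size() < maxL1)
            out.l1.push_back(f.name);
    }
    return out;
}

// Walk the encode order P1-P2-B2ref-B1-B3 and return the lists built for
// the frame called `name`, allowing up to maxL1 entries in L1.
RefLists listsFor(const std::string& name, std::size_t maxL1)
{
    const std::vector<Frame> encodeOrder = {
        {"P1", 0, true}, {"P2", 4, true}, {"B2ref", 2, true},
        {"B1", 1, false}, {"B3", 3, false}};

    std::vector<Frame> dpb; // frames already encoded
    for (const Frame& f : encodeOrder)
    {
        if (f.name == name)
            return buildRefLists(f, dpb, 2, maxL1);
        dpb.push_back(f);
    }
    return RefLists(); // unknown name: empty lists
}
```

With maxL1 set to 2, listsFor("B1", 2) yields both B2ref and P2 in L1, which
is the case this changeset enables. With maxL1 set to 1, only one of the two
fits.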
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/c0362b478e23
branches:  
changeset: 6656:c0362b478e23
user:      Steve Borho <steve at borho.org>
date:      Wed Apr 02 22:52:19 2014 -0500
description:
Merge with stable

diffstat:

 source/Lib/TLibCommon/TComDataCU.cpp  |   24 -
 source/Lib/TLibCommon/TComDataCU.h    |    3 -
 source/Lib/TLibEncoder/TEncSearch.cpp |    4 +-
 source/common/pixel.cpp               |    4 +-
 source/common/primitives.h            |    2 +-
 source/common/x86/pixel-util.h        |   12 +-
 source/common/x86/pixel-util8.asm     |  652 +++++++++++++--------------------
 source/encoder/compress.cpp           |   10 +-
 source/encoder/dpb.h                  |    2 +-
 source/encoder/encoder.cpp            |    6 +
 source/encoder/frameencoder.cpp       |   18 +-
 source/encoder/frameencoder.h         |    2 +-
 source/encoder/weightPrediction.cpp   |    3 +
 source/test/pixelharness.cpp          |   41 +-
 14 files changed, 315 insertions(+), 468 deletions(-)

diffs (truncated from 1206 to 300 lines):

diff -r 0206822d9fea -r c0362b478e23 source/Lib/TLibCommon/TComDataCU.cpp

--- a/source/Lib/TLibCommon/TComDataCU.cpp	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/Lib/TLibCommon/TComDataCU.cpp	Wed Apr 02 22:52:19 2014 -0500
@@ -91,8 +91,6 @@ TComDataCU::TComDataCU()
     m_cuAboveRight = NULL;
     m_cuAbove = NULL;
     m_cuLeft = NULL;
-    m_cuColocated[0] = NULL;
-    m_cuColocated[1] = NULL;
     m_mvpIdx[0] = NULL;
     m_mvpIdx[1] = NULL;
     m_chromaFormat = 0;
@@ -280,9 +278,6 @@ void TComDataCU::initCU(TComPic* pic, ui
     m_cuAboveLeft   = NULL;
     m_cuAboveRight  = NULL;
 
-    m_cuColocated[0] = NULL;
-    m_cuColocated[1] = NULL;
-
     uint32_t uiWidthInCU = pic->getFrameWidthInCU();
     if (m_cuAddr % uiWidthInCU)
     {
@@ -303,16 +298,6 @@ void TComDataCU::initCU(TComPic* pic, ui
     {
         m_cuAboveRight = pic->getCU(m_cuAddr - uiWidthInCU + 1);
     }
-
-    if (getSlice()->getNumRefIdx(REF_PIC_LIST_0) > 0)
-    {
-        m_cuColocated[0] = getSlice()->getRefPic(REF_PIC_LIST_0, 0)->getCU(m_cuAddr);
-    }
-
-    if (getSlice()->getNumRefIdx(REF_PIC_LIST_1) > 0)
-    {
-        m_cuColocated[1] = getSlice()->getRefPic(REF_PIC_LIST_1, 0)->getCU(m_cuAddr);
-    }
 }
 
 /** initialize prediction data with enabling sub-LCU-level delta QP
@@ -457,9 +442,6 @@ void TComDataCU::initSubCU(TComDataCU* c
     m_cuAbove       = cu->getCUAbove();
     m_cuAboveLeft   = cu->getCUAboveLeft();
     m_cuAboveRight  = cu->getCUAboveRight();
-
-    m_cuColocated[0] = cu->getCUColocated(REF_PIC_LIST_0);
-    m_cuColocated[1] = cu->getCUColocated(REF_PIC_LIST_1);
 }
 
 // initialize Sub partition
@@ -526,9 +508,6 @@ void TComDataCU::initSubCU(TComDataCU* c
     m_cuAbove       = cu->getCUAbove();
     m_cuAboveLeft   = cu->getCUAboveLeft();
     m_cuAboveRight  = cu->getCUAboveRight();
-
-    m_cuColocated[0] = cu->getCUColocated(REF_PIC_LIST_0);
-    m_cuColocated[1] = cu->getCUColocated(REF_PIC_LIST_1);
 }
 
 
@@ -620,9 +599,6 @@ void TComDataCU::copyPartFrom(TComDataCU
     m_cuAbove          = cu->getCUAbove();
     m_cuLeft           = cu->getCULeft();
 
-    m_cuColocated[0] = cu->getCUColocated(REF_PIC_LIST_0);
-    m_cuColocated[1] = cu->getCUColocated(REF_PIC_LIST_1);
-
     m_cuMvField[0].copyFrom(cu->getCUMvField(REF_PIC_LIST_0), cu->getTotalNumPart(), offset);
     m_cuMvField[1].copyFrom(cu->getCUMvField(REF_PIC_LIST_1), cu->getTotalNumPart(), offset);
 
diff -r 0206822d9fea -r c0362b478e23 source/Lib/TLibCommon/TComDataCU.h
--- a/source/Lib/TLibCommon/TComDataCU.h	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/Lib/TLibCommon/TComDataCU.h	Wed Apr 02 22:52:19 2014 -0500
@@ -129,7 +129,6 @@ private:
     TComDataCU*   m_cuAboveRight;    ///< pointer of above-right CU
     TComDataCU*   m_cuAbove;         ///< pointer of above CU
     TComDataCU*   m_cuLeft;          ///< pointer of left CU
-    TComDataCU*   m_cuColocated[2];  ///< pointer of temporally colocated CU's for both directions
 
     // -------------------------------------------------------------------------------------------------------------------
     // coding tool information
@@ -387,8 +386,6 @@ public:
 
     TComDataCU*   getCUAboveRight() { return m_cuAboveRight; }
 
-    TComDataCU*   getCUColocated(int picList) { return m_cuColocated[picList]; }
-
     TComDataCU*   getPULeft(uint32_t& lPartUnitIdx,
                             uint32_t  curPartUnitIdx,
                             bool      bEnforceSliceRestriction = true,
diff -r 0206822d9fea -r c0362b478e23 source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp	Wed Apr 02 22:52:19 2014 -0500
@@ -465,7 +465,7 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
 
     assert(width <= 32);
     //===== reconstruction =====
-    primitives.calcrecon[size](pred, residual, 0, reconQt, reconIPred, stride, MAX_CU_SIZE, reconIPredStride);
+    primitives.calcrecon[size](pred, residual, reconQt, reconIPred, stride, MAX_CU_SIZE, reconIPredStride);
     //===== update distortion =====
     outDist += primitives.sse_sp[part](reconQt, MAX_CU_SIZE, fenc, stride);
 }
@@ -587,7 +587,7 @@ void TEncSearch::xIntraCodingChromaBlk(T
     assert(((intptr_t)residual & (width - 1)) == 0);
     assert(width <= 32);
     //===== reconstruction =====
-    primitives.calcrecon[size](pred, residual, 0, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
+    primitives.calcrecon[size](pred, residual, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
     //===== update distortion =====
     uint32_t dist = primitives.sse_sp[part](reconQt, reconQtStride, fenc, stride);
     if (ttype == TEXT_CHROMA_U)
diff -r 0206822d9fea -r c0362b478e23 source/common/pixel.cpp
--- a/source/common/pixel.cpp	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/pixel.cpp	Wed Apr 02 22:52:19 2014 -0500
@@ -460,9 +460,7 @@ void getResidual(pixel *fenc, pixel *pre
 }
 
 template<int blockSize>
-void calcRecons(pixel* pred, int16_t* residual,
-                pixel*,
-                int16_t* recqt, pixel* recipred, int stride, int qtstride, int ipredstride)
+void calcRecons(pixel* pred, int16_t* residual, int16_t* recqt, pixel* recipred, int stride, int qtstride, int ipredstride)
 {
     for (int y = 0; y < blockSize; y++)
     {
diff -r 0206822d9fea -r c0362b478e23 source/common/primitives.h
--- a/source/common/primitives.h	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/primitives.h	Wed Apr 02 22:52:19 2014 -0500
@@ -125,7 +125,7 @@ typedef void (*cvt32to16_shr_t)(int16_t 
 typedef void (*dct_t)(int16_t *src, int32_t *dst, intptr_t stride);
 typedef void (*idct_t)(int32_t *src, int16_t *dst, intptr_t stride);
 typedef void (*calcresidual_t)(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);
-typedef void (*calcrecon_t)(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+typedef void (*calcrecon_t)(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
 typedef void (*transpose_t)(pixel* dst, pixel* src, intptr_t stride);
 typedef uint32_t (*quant_t)(int32_t *coef, int32_t *quantCoeff, int32_t *deltaU, int32_t *qCoef, int qBits, int add, int numCoeff, int32_t* lastPos);
 typedef void (*dequant_scaling_t)(const int32_t* src, const int32_t *dequantCoef, int32_t* dst, int num, int mcqp_miper, int shift);
diff -r 0206822d9fea -r c0362b478e23 source/common/x86/pixel-util.h
--- a/source/common/x86/pixel-util.h	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/x86/pixel-util.h	Wed Apr 02 22:52:19 2014 -0500
@@ -24,12 +24,12 @@
 #ifndef X265_PIXEL_UTIL_H
 #define X265_PIXEL_UTIL_H
 
-void x265_calcRecons4_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons8_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons16_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons32_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons16_sse4(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons32_sse4(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons4_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons8_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons16_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons32_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons16_sse4(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons32_sse4(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
 
 void x265_getResidual4_sse2(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);
 void x265_getResidual8_sse2(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);
diff -r 0206822d9fea -r c0362b478e23 source/common/x86/pixel-util8.asm
--- a/source/common/x86/pixel-util8.asm	Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/x86/pixel-util8.asm	Wed Apr 02 22:52:19 2014 -0500
@@ -58,590 +58,452 @@ cextern pw_2000
 cextern pw_pixel_max
 
 ;-----------------------------------------------------------------------------
-; void calcrecon(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred)
+; void calcrecon(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred)
 ;-----------------------------------------------------------------------------
 INIT_XMM sse2
-cglobal calcRecons4
 %if HIGH_BIT_DEPTH
 %if ARCH_X86_64 == 1
-    DECLARE_REG_TMP 0,1,2,3,4,5,6,7,8
-    PROLOGUE 6,9,6
+cglobal calcRecons4, 5,8,4
+    %define t7b     r7b
 %else
-    DECLARE_REG_TMP 0,1,2,3,4,5
-    PROLOGUE 6,7,6
-    %define t6      r6m
-    %define t6d     r6d
-    %define t7      r7m
-    %define t8d     r6d
+cglobal calcRecons4, 5,7,4,0-1
+    %define t7b     byte [rsp]
 %endif
-
-    mov         t6d, r6m
-%if ARCH_X86_64 == 0
-    add         t6d, t6d
-    mov         r6m, t6d
-%else
+    mov         r4d, r4m
     mov         r5d, r5m
-    mov         r7d, r7m
-    add         t6d, t6d
-    add         t7, t7
-%endif
+    mov         r6d, r6m
+    add         r4d, r4d
+    add         r5d, r5d
+    add         r6d, r6d
 
     pxor        m4, m4
     mova        m5, [pw_pixel_max]
-    add         t5, t5
-    mov         t8d, 4/2
+    mov         t7b, 4/2
 .loop:
-    movh        m0, [t0]
-    movh        m1, [t0 + t5]
+    movh        m0, [r0]
+    movh        m1, [r0 + r4]
     punpcklqdq  m0, m1
-    movh        m2, [t1]
-    movh        m3, [t1 + t5]
+    movh        m2, [r1]
+    movh        m3, [r1 + r4]
     punpcklqdq  m2, m3
     paddw       m0, m2
     CLIPW       m0, m4, m5
 
-    ; store recon[] and recipred[]
-    movh        [t4], m0
-%if ARCH_X86_64 == 0
-    add         t4, t7
-    add         t4, t7
-    movhps      [t4], m0
-    add         t4, t7
-    add         t4, t7
+    ; store recipred[]
+    movh        [r3], m0
+    movhps      [r3 + r6], m0
+
+    ; store recqt[]
+    movh        [r2], m0
+    movhps      [r2 + r5], m0
+
+    lea         r0, [r0 + r4 * 2]
+    lea         r1, [r1 + r4 * 2]
+    lea         r2, [r2 + r5 * 2]
+    lea         r3, [r3 + r6 * 2]
+
+    dec         t7b
+    jnz        .loop
+    RET
+%else          ;HIGH_BIT_DEPTH
+
+%if ARCH_X86_64 == 1
+cglobal calcRecons4, 5,8,4
+    %define t7b     r7b
 %else
-    movhps      [t4 + t7], m0
-    lea         t4, [t4 + t7 * 2]
+cglobal calcRecons4, 5,7,4,0-1
+    %define t7b     byte [rsp]
 %endif
-
-    ; store recqt[]
-    movh        [t3], m0
-    add         t3, t6
-    movhps      [t3], m0
-    add         t3, t6
-
-    lea         t0, [t0 + t5 * 2]
-    lea         t1, [t1 + t5 * 2]
-
-    dec         t8d
-    jnz        .loop
-
-%else          ;HIGH_BIT_DEPTH
-%if ARCH_X86_64 == 1
-    DECLARE_REG_TMP 0,1,2,3,4,5,6,7,8
-    PROLOGUE 6,9,4
-%else
-    DECLARE_REG_TMP 0,1,2,3,4,5
-    PROLOGUE 6,7,4
-    %define t6      r6m
-    %define t6d     r6d
-    %define t7      r7m
-    %define t8d     r6d
-%endif
-
-    mov         t6d, r6m
-%if ARCH_X86_64 == 0
-    add         t6d, t6d
-    mov         r6m, t6d
-%else
+    mov         r4d, r4m
     mov         r5d, r5m
-    mov         r7d, r7m
-    add         t6d, t6d
-%endif
+    mov         r6d, r6m
+    add         r5d, r5d
 
     pxor        m0, m0
-    mov         t8d, 4/2
+    mov         t7b, 4/2
 .loop: