[x265-commits] [x265] analysis: encodeResidue rewrite, much improved --rd 0

Fri Oct 24 01:15:37 CEST 2014

details:   http://hg.videolan.org/x265/rev/bd865dd464bc
branches:  
changeset: 8622:bd865dd464bc
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 22 20:21:53 2014 -0500
description:
analysis: encodeResidue rewrite, much improved --rd 0

it's not clear --rd 0 is always correct, but I can encode long clips without
hash mistakes and at reasonable bitrates (compared to previous --rd 0). I
suspect there are still problems with passing in residual to
residualTransformQuantInter() and getting it back in the same ShortYuv instance.
Subject: [x265] cudata: use static array of absolute depth broadcast set functions

details:   http://hg.videolan.org/x265/rev/f593e0455cbc
branches:  
changeset: 8623:f593e0455cbc
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 06:57:56 2014 -0500
description:
cudata: use static array of absolute depth broadcast set functions

this commit changed the value arguments to these set functions to match the data
type of their array, forcing one cast in analysis.cpp to avoid a warning.
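
The idea of a depth-indexed table of broadcast functions can be sketched as below. This is a minimal illustration, not the x265 code: the names fill1..fill256, fillAtDepth, MAX_DEPTHS and setField are hypothetical, and the table assumes a 64x64 CTU split into 4x4 partitions, so depth d covers 256 >> (2 * d) partitions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical names throughout; this is an illustration, not the x265 source.
typedef void (*bcast_fn)(uint8_t* dst, uint8_t val);

static void fill1(uint8_t* dst, uint8_t val)   { dst[0] = val; }
static void fill4(uint8_t* dst, uint8_t val)   { std::memset(dst, val, 4); }
static void fill16(uint8_t* dst, uint8_t val)  { std::memset(dst, val, 16); }
static void fill64(uint8_t* dst, uint8_t val)  { std::memset(dst, val, 64); }
static void fill256(uint8_t* dst, uint8_t val) { std::memset(dst, val, 256); }

enum { MAX_DEPTHS = 5 };

// Assumed layout: 64x64 CTU, 4x4 partitions, one byte per partition.
// Entry d fills the 256 >> (2 * d) partitions covered by a CU at depth d.
static const bcast_fn fillAtDepth[MAX_DEPTHS] = { fill256, fill64, fill16, fill4, fill1 };

// Set one per-partition field for every partition of a CU at the given depth.
// The value is already uint8_t, so any narrowing cast happens in the caller.
inline void setField(uint8_t* field, uint8_t val, unsigned depth)
{
    assert(depth < MAX_DEPTHS);
    fillAtDepth[depth](field, val);
}
```

Because the table takes a uint8_t value, a caller holding a wider type has to cast explicitly, which is the kind of caller-side cast the description mentions.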
Subject: [x265] cudata: push more data type casts out to callers

details:   http://hg.videolan.org/x265/rev/bb5814a49de5
branches:  
changeset: 8624:bb5814a49de5
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 07:07:13 2014 -0500
description:
cudata: push more data type casts out to callers
Subject: [x265] search: use intptr_t for picture stride variables

details:   http://hg.videolan.org/x265/rev/fa3e1744f125
branches:  
changeset: 8625:fa3e1744f125
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 07:29:07 2014 -0500
description:
search: use intptr_t for picture stride variables
Subject: [x265] cudata: simplify allocation / initialization interfaces

details:   http://hg.videolan.org/x265/rev/ebaeb6aa5dda
branches:  
changeset: 8626:ebaeb6aa5dda
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 07:40:57 2014 -0500
description:
cudata: simplify allocation / initialization interfaces

the callers shouldn't need to know details about partitions or coeff buffer
sizes
Subject: [x265] cudata: remove default arguments for getPUAboveRightAdi(), getPUBelowLeftAdi()

details:   http://hg.videolan.org/x265/rev/077015265a08
branches:  
changeset: 8627:077015265a08
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 07:44:34 2014 -0500
description:
cudata: remove default arguments for getPUAboveRightAdi(), getPUBelowLeftAdi()
Subject: [x265] slice: move numPartitions and numPartInCUSize from FrameData to SPS

details:   http://hg.videolan.org/x265/rev/17c5d2cc1335
branches:  
changeset: 8628:17c5d2cc1335
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 08:08:49 2014 -0500
description:
slice: move numPartitions and numPartInCUSize from FrameData to SPS

these fields never change, so it made no sense to have copies in every
FrameData; they are based on CTU size, so SPS made sense
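
As a rough sketch of why both values follow from the CTU size alone: assuming a 4x4 minimum partition unit (an assumption of this example), the number of partitions along one CTU edge and the total per CTU can be computed once. PartitionCounts and countsForCTU are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of x265.
struct PartitionCounts
{
    uint32_t numPartInCUSize; // partitions along one CTU edge
    uint32_t numPartitions;   // partitions in the whole CTU
};

inline PartitionCounts countsForCTU(uint32_t ctuSize)
{
    const uint32_t unitSize = 4; // assumed minimum partition unit (4x4 samples)
    assert(ctuSize >= unitSize && ctuSize % unitSize == 0);

    PartitionCounts c;
    c.numPartInCUSize = ctuSize / unitSize;
    c.numPartitions   = c.numPartInCUSize * c.numPartInCUSize;
    return c;
}
```

For a 64x64 CTU this gives 16 partitions per edge and 256 in total; neither number depends on any per-frame state.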
Subject: [x265] entropy: drop last use of g_winUnitX, g_winUnitY

details:   http://hg.videolan.org/x265/rev/77210e81c4ad
branches:  
changeset: 8629:77210e81c4ad
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 08:09:41 2014 -0500
description:
entropy: drop last use of g_winUnitX, g_winUnitY
Subject: [x265] cudata: cache numPartInCUSize as a class static

details:   http://hg.videolan.org/x265/rev/2763d49b2e23
branches:  
changeset: 8630:2763d49b2e23
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 09:14:40 2014 -0500
description:
cudata: cache numPartInCUSize as a class static

This obviates a lot of pointer dereferences in some key functions
Subject: [x265] search: re-combine --pme with --no-pme code paths

details:   http://hg.videolan.org/x265/rev/f8ee24fbbede
branches:  
changeset: 8631:f8ee24fbbede
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 13:11:40 2014 -0500
description:
search: re-combine --pme with --no-pme code paths
Subject: [x265] search: fix a change of outputs from f3bd6e5a880a, always zero unused refs

details:   http://hg.videolan.org/x265/rev/8ac590040e8c
branches:  
changeset: 8632:8ac590040e8c
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 14:47:17 2014 -0500
description:
search: fix a change of outputs from f3bd6e5a880a, always zero unused refs

it's not clear why this affects outputs, but it seems better to err on the side
of the data being initialized.
Subject: [x265] predict: rename members for clarity, save work in singleMotionEstimation()

details:   http://hg.videolan.org/x265/rev/c942de89cbed
branches:  
changeset: 8633:c942de89cbed
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 15:08:13 2014 -0500
description:
predict: rename members for clarity, save work in singleMotionEstimation()

The first thing singleMotionEstimation() did was call getPartIndexAndSize()
to get the PU part index and dimensions. Then it called prepMotionCompensation()
which did the exact same thing, storing its outputs into member variables.
(after predInterSearch() had already done it twice as well)

Now singleMotionEstimation() and predInterSearch() both directly use the
variables initialized by prepMotionCompensation(). Now when the master thread
calls its own singleMotionEstimation(), there is much less redundant work
Subject: [x265] predict: enforce calling conventions, fix wrong side-effects

details:   http://hg.videolan.org/x265/rev/ff804d8ab03d
branches:  
changeset: 8634:ff804d8ab03d
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 15:59:44 2014 -0500
description:
predict: enforce calling conventions, fix wrong side-effects

use references and consts where possible, order arguments to follow the
convention of memcpy (dest, src)

This exposed a bug in addWeightBi() and addWeightUni(): they were modifying the
PU size variables directly instead of making chroma versions. This explains why
it seemed to be necessary at times to make seemingly redundant calls to
prepMotionCompensation.

as a side-effect, this commit also removes the 1k 'avg' buffer that bidir
allocated on the stack and instead uses the existing tmpPredYuv
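
The class of bug described above can be illustrated with a small sketch. The PU struct and both functions are hypothetical and much simpler than the real weighting code. The first version shrinks the shared dimensions in place. The second takes a const reference and derives local chroma sizes.

```cpp
#include <cassert>

// Hypothetical stand-in for the PU size state kept by the predictor.
struct PU
{
    int width;
    int height;
};

// Buggy pattern: the shared luma dimensions are overwritten with chroma
// dimensions, so later users of 'pu' see the wrong size.
inline int chromaSamplesBuggy(PU& pu, int hShift, int vShift)
{
    assert(hShift >= 0 && vShift >= 0);
    pu.width  >>= hShift;
    pu.height >>= vShift;
    return pu.width * pu.height;
}

// Fixed pattern: a const reference makes the side effect a compile error,
// and the chroma dimensions live in local variables.
inline int chromaSamples(const PU& pu, int hShift, int vShift)
{
    assert(hShift >= 0 && vShift >= 0);
    const int cwidth  = pu.width  >> hShift;
    const int cheight = pu.height >> vShift;
    return cwidth * cheight;
}
```

With the buggy version, a caller would have to re-derive the PU size after each call, which matches the seemingly redundant re-initialization the description mentions.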
Subject: [x265] cudata: remove unused method

details:   http://hg.videolan.org/x265/rev/260eee4634a5
branches:  
changeset: 8635:260eee4634a5
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 17:04:56 2014 -0500
description:
cudata: remove unused method
Subject: [x265] search: large mostly mechanical change to pass cu by reference

details:   http://hg.videolan.org/x265/rev/b2005914aeb7
branches:  
changeset: 8636:b2005914aeb7
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 17:50:42 2014 -0500
description:
search: large mostly mechanical change to pass cu by reference
Subject: [x265] analysis: remove unnecessary set of skip flags in checkInter_rd5_6()

details:   http://hg.videolan.org/x265/rev/79f0d5f296ef
branches:  
changeset: 8637:79f0d5f296ef
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 17:57:31 2014 -0500
description:
analysis: remove unnecessary set of skip flags in checkInter_rd5_6()

initSubCU() does this already, and the pred's cu is not being reused
Subject: [x265] analysis: cleanup checkInter functions

details:   http://hg.videolan.org/x265/rev/daed2d3f67ba
branches:  
changeset: 8638:daed2d3f67ba
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 23 18:03:33 2014 -0500
description:
analysis: cleanup checkInter functions

diffstat:

 source/common/cudata.cpp        |   243 +++--
 source/common/cudata.h          |    43 +-
 source/common/deblock.cpp       |    16 +-
 source/common/framedata.cpp     |    15 +-
 source/common/framedata.h       |     3 -
 source/common/predict.cpp       |   260 +++---
 source/common/predict.h         |    26 +-
 source/common/quant.cpp         |    32 +-
 source/common/quant.h           |     6 +-
 source/common/slice.h           |     2 +
 source/encoder/analysis.cpp     |   315 +++-----
 source/encoder/encoder.cpp      |     6 +-
 source/encoder/entropy.cpp      |    11 +-
 source/encoder/frameencoder.cpp |     2 +-
 source/encoder/search.cpp       |  1393 ++++++++++++++++----------------------
 source/encoder/search.h         |    15 +-
 16 files changed, 1082 insertions(+), 1306 deletions(-)

diffs (truncated from 4723 to 300 lines):

diff -r ce304756a6e4 -r daed2d3f67ba source/common/cudata.cpp

--- a/source/common/cudata.cpp	Wed Oct 22 23:16:13 2014 -0500
+++ b/source/common/cudata.cpp	Thu Oct 23 18:03:33 2014 -0500
@@ -32,6 +32,10 @@ using namespace x265;
 namespace {
 // file private namespace
 
+/* for all bcast* and copy* functions, dst and src are aligned to MIN(size, 32) */
+
+void bcast1(uint8_t* dst, uint8_t val)  { dst[0] = val; }
+
 void copy4(uint8_t* dst, uint8_t* src)  { ((uint32_t*)dst)[0] = ((uint32_t*)src)[0]; }
 void bcast4(uint8_t* dst, uint8_t val)  { ((uint32_t*)dst)[0] = 0x01010101 * val; }
 
@@ -46,6 +50,8 @@ void bcast64(uint8_t* dst, uint8_t val) 
                                           ((uint64_t*)dst)[0] = bval; ((uint64_t*)dst)[1] = bval; ((uint64_t*)dst)[2] = bval; ((uint64_t*)dst)[3] = bval;
                                           ((uint64_t*)dst)[4] = bval; ((uint64_t*)dst)[5] = bval; ((uint64_t*)dst)[6] = bval; ((uint64_t*)dst)[7] = bval; }
 
+/* at 256 bytes, memset/memcpy will probably use SIMD more effectively than our uint64_t hack,
+ * but hand-written assembly would beat it. */
 void copy256(uint8_t* dst, uint8_t* src) { memcpy(dst, src, 256); }
 void bcast256(uint8_t* dst, uint8_t val) { memset(dst, val, 256); }
 
@@ -139,17 +145,52 @@ const uint32_t partAddrTable[8][4] =
 
 }
 
+cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
+uint32_t CUData::s_numPartInCUSize;
+
 CUData::CUData()
 {
     memset(this, 0, sizeof(*this));
 }
 
-void CUData::initialize(const CUDataMemPool& dataPool, uint32_t numPartition, uint32_t cuSize, int csp, int instance)
+void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance)
 {
+    m_chromaFormat  = csp;
     m_hChromaShift  = CHROMA_H_SHIFT(csp);
     m_vChromaShift  = CHROMA_V_SHIFT(csp);
-    m_chromaFormat  = csp;
-    m_numPartitions = numPartition;
+    m_numPartitions = MAX_NUM_PARTITIONS >> (depth * 2);
+
+    if (!s_partSet[0])
+    {
+        s_numPartInCUSize = 1 << g_maxFullDepth;
+        switch (g_maxLog2CUSize)
+        {
+        case 6:
+            s_partSet[0] = bcast256;
+            s_partSet[1] = bcast64;
+            s_partSet[2] = bcast16;
+            s_partSet[3] = bcast4;
+            s_partSet[4] = bcast1;
+            break;
+        case 5:
+            s_partSet[0] = bcast64;
+            s_partSet[1] = bcast16;
+            s_partSet[2] = bcast4;
+            s_partSet[3] = bcast1;
+            s_partSet[4] = NULL;
+            break;
+        case 4:
+            s_partSet[0] = bcast16;
+            s_partSet[1] = bcast4;
+            s_partSet[2] = bcast1;
+            s_partSet[3] = NULL;
+            s_partSet[4] = NULL;
+            break;
+        default:
+            X265_CHECK(0, "unexpected CTU size\n");
+            break;
+        }
+    }
 
     switch (m_numPartitions)
     {
@@ -183,38 +224,39 @@ void CUData::initialize(const CUDataMemP
     }
 
     /* Each CU's data is layed out sequentially within the charMemBlock */
-    uint8_t *charBuf = dataPool.charMemBlock + (numPartition * BytesPerPartition) * instance;
+    uint8_t *charBuf = dataPool.charMemBlock + (m_numPartitions * BytesPerPartition) * instance;
 
-    m_qp          = (char*)charBuf; charBuf += numPartition;
-    m_log2CUSize         = charBuf; charBuf += numPartition;
-    m_partSize           = charBuf; charBuf += numPartition;
-    m_predMode           = charBuf; charBuf += numPartition;
-    m_lumaIntraDir       = charBuf; charBuf += numPartition;
-    m_tqBypass           = charBuf; charBuf += numPartition;
-    m_refIdx[0]   = (char*)charBuf; charBuf += numPartition;
-    m_refIdx[1]   = (char*)charBuf; charBuf += numPartition;
-    m_depth              = charBuf; charBuf += numPartition;
-    m_skipFlag           = charBuf; charBuf += numPartition; /* the order up to here is important in initCTU() and initSubCU() */
-    m_mergeFlag          = charBuf; charBuf += numPartition;
-    m_interDir           = charBuf; charBuf += numPartition;
-    m_mvpIdx[0]          = charBuf; charBuf += numPartition;
-    m_mvpIdx[1]          = charBuf; charBuf += numPartition;
-    m_trIdx              = charBuf; charBuf += numPartition;
-    m_transformSkip[0]   = charBuf; charBuf += numPartition;
-    m_transformSkip[1]   = charBuf; charBuf += numPartition;
-    m_transformSkip[2]   = charBuf; charBuf += numPartition;
-    m_cbf[0]             = charBuf; charBuf += numPartition;
-    m_cbf[1]             = charBuf; charBuf += numPartition;
-    m_cbf[2]             = charBuf; charBuf += numPartition;
-    m_chromaIntraDir     = charBuf; charBuf += numPartition;
+    m_qp          = (char*)charBuf; charBuf += m_numPartitions;
+    m_log2CUSize         = charBuf; charBuf += m_numPartitions;
+    m_partSize           = charBuf; charBuf += m_numPartitions;
+    m_predMode           = charBuf; charBuf += m_numPartitions;
+    m_lumaIntraDir       = charBuf; charBuf += m_numPartitions;
+    m_tqBypass           = charBuf; charBuf += m_numPartitions;
+    m_refIdx[0]   = (char*)charBuf; charBuf += m_numPartitions;
+    m_refIdx[1]   = (char*)charBuf; charBuf += m_numPartitions;
+    m_depth              = charBuf; charBuf += m_numPartitions;
+    m_skipFlag           = charBuf; charBuf += m_numPartitions; /* the order up to here is important in initCTU() and initSubCU() */
+    m_mergeFlag          = charBuf; charBuf += m_numPartitions;
+    m_interDir           = charBuf; charBuf += m_numPartitions;
+    m_mvpIdx[0]          = charBuf; charBuf += m_numPartitions;
+    m_mvpIdx[1]          = charBuf; charBuf += m_numPartitions;
+    m_trIdx              = charBuf; charBuf += m_numPartitions;
+    m_transformSkip[0]   = charBuf; charBuf += m_numPartitions;
+    m_transformSkip[1]   = charBuf; charBuf += m_numPartitions;
+    m_transformSkip[2]   = charBuf; charBuf += m_numPartitions;
+    m_cbf[0]             = charBuf; charBuf += m_numPartitions;
+    m_cbf[1]             = charBuf; charBuf += m_numPartitions;
+    m_cbf[2]             = charBuf; charBuf += m_numPartitions;
+    m_chromaIntraDir     = charBuf; charBuf += m_numPartitions;
 
-    X265_CHECK(charBuf == dataPool.charMemBlock + (numPartition * BytesPerPartition) * (instance + 1), "CU data layout is broken\n");
+    X265_CHECK(charBuf == dataPool.charMemBlock + (m_numPartitions * BytesPerPartition) * (instance + 1), "CU data layout is broken\n");
 
-    m_mv[0]  = dataPool.mvMemBlock + (instance * 4) * numPartition;
-    m_mv[1]  = m_mv[0] +  numPartition;
-    m_mvd[0] = m_mv[1] +  numPartition;
-    m_mvd[1] = m_mvd[0] + numPartition;
+    m_mv[0]  = dataPool.mvMemBlock + (instance * 4) * m_numPartitions;
+    m_mv[1]  = m_mv[0] +  m_numPartitions;
+    m_mvd[0] = m_mv[1] +  m_numPartitions;
+    m_mvd[1] = m_mvd[0] + m_numPartitions;
 
+    uint32_t cuSize = g_maxCUSize >> depth;
     uint32_t sizeL = cuSize * cuSize;
     uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift);
     m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);
@@ -490,14 +532,13 @@ void CUData::updatePic(uint32_t depth) c
 
 const CUData* CUData::getPULeft(uint32_t& lPartUnitIdx, uint32_t curPartUnitIdx) const
 {
-    uint32_t absPartIdx      = g_zscanToRaster[curPartUnitIdx];
-    uint32_t numPartInCUSize = m_encData->m_numPartInCUSize;
+    uint32_t absPartIdx = g_zscanToRaster[curPartUnitIdx];
 
-    if (!isZeroCol(absPartIdx, numPartInCUSize))
+    if (!isZeroCol(absPartIdx, s_numPartInCUSize))
     {
         uint32_t absZorderCUIdx   = g_zscanToRaster[m_absIdxInCTU];
         lPartUnitIdx = g_rasterToZscan[absPartIdx - 1];
-        if (isEqualCol(absPartIdx, absZorderCUIdx, numPartInCUSize))
+        if (isEqualCol(absPartIdx, absZorderCUIdx, s_numPartInCUSize))
             return m_encData->getPicCTU(m_cuAddr);
         else
         {
@@ -506,20 +547,19 @@ const CUData* CUData::getPULeft(uint32_t
         }
     }
 
-    lPartUnitIdx = g_rasterToZscan[absPartIdx + numPartInCUSize - 1];
+    lPartUnitIdx = g_rasterToZscan[absPartIdx + s_numPartInCUSize - 1];
     return m_cuLeft;
 }
 
 const CUData* CUData::getPUAbove(uint32_t& aPartUnitIdx, uint32_t curPartUnitIdx, bool planarAtCTUBoundary) const
 {
-    uint32_t absPartIdx      = g_zscanToRaster[curPartUnitIdx];
-    uint32_t numPartInCUSize = m_encData->m_numPartInCUSize;
+    uint32_t absPartIdx = g_zscanToRaster[curPartUnitIdx];
 
-    if (!isZeroRow(absPartIdx, numPartInCUSize))
+    if (!isZeroRow(absPartIdx, s_numPartInCUSize))
     {
-        uint32_t absZorderCUIdx   = g_zscanToRaster[m_absIdxInCTU];
-        aPartUnitIdx = g_rasterToZscan[absPartIdx - numPartInCUSize];
-        if (isEqualRow(absPartIdx, absZorderCUIdx, numPartInCUSize))
+        uint32_t absZorderCUIdx = g_zscanToRaster[m_absIdxInCTU];
+        aPartUnitIdx = g_rasterToZscan[absPartIdx - s_numPartInCUSize];
+        if (isEqualRow(absPartIdx, absZorderCUIdx, s_numPartInCUSize))
             return m_encData->getPicCTU(m_cuAddr);
         else
         {
@@ -531,22 +571,21 @@ const CUData* CUData::getPUAbove(uint32_
     if (planarAtCTUBoundary)
         return NULL;
 
-    aPartUnitIdx = g_rasterToZscan[absPartIdx + NUM_CU_PARTITIONS - numPartInCUSize];
+    aPartUnitIdx = g_rasterToZscan[absPartIdx + NUM_CU_PARTITIONS - s_numPartInCUSize];
     return m_cuAbove;
 }
 
 const CUData* CUData::getPUAboveLeft(uint32_t& alPartUnitIdx, uint32_t curPartUnitIdx) const
 {
-    uint32_t absPartIdx      = g_zscanToRaster[curPartUnitIdx];
-    uint32_t numPartInCUSize = m_encData->m_numPartInCUSize;
+    uint32_t absPartIdx = g_zscanToRaster[curPartUnitIdx];
 
-    if (!isZeroCol(absPartIdx, numPartInCUSize))
+    if (!isZeroCol(absPartIdx, s_numPartInCUSize))
     {
-        if (!isZeroRow(absPartIdx, numPartInCUSize))
+        if (!isZeroRow(absPartIdx, s_numPartInCUSize))
         {
             uint32_t absZorderCUIdx  = g_zscanToRaster[m_absIdxInCTU];
-            alPartUnitIdx = g_rasterToZscan[absPartIdx - numPartInCUSize - 1];
-            if (isEqualRowOrCol(absPartIdx, absZorderCUIdx, numPartInCUSize))
+            alPartUnitIdx = g_rasterToZscan[absPartIdx - s_numPartInCUSize - 1];
+            if (isEqualRowOrCol(absPartIdx, absZorderCUIdx, s_numPartInCUSize))
                 return m_encData->getPicCTU(m_cuAddr);
             else
             {
@@ -554,11 +593,11 @@ const CUData* CUData::getPUAboveLeft(uin
                 return this;
             }
         }
-        alPartUnitIdx = g_rasterToZscan[absPartIdx + NUM_CU_PARTITIONS - numPartInCUSize - 1];
+        alPartUnitIdx = g_rasterToZscan[absPartIdx + NUM_CU_PARTITIONS - s_numPartInCUSize - 1];
         return m_cuAbove;
     }
 
-    if (!isZeroRow(absPartIdx, numPartInCUSize))
+    if (!isZeroRow(absPartIdx, s_numPartInCUSize))
     {
         alPartUnitIdx = g_rasterToZscan[absPartIdx - 1];
         return m_cuLeft;
@@ -573,18 +612,17 @@ const CUData* CUData::getPUAboveRight(ui
     if ((m_encData->getPicCTU(m_cuAddr)->m_cuPelX + g_zscanToPelX[curPartUnitIdx] + UNIT_SIZE) >= m_slice->m_sps->picWidthInLumaSamples)
         return NULL;
 
-    uint32_t absPartIdxRT    = g_zscanToRaster[curPartUnitIdx];
-    uint32_t numPartInCUSize = m_encData->m_numPartInCUSize;
+    uint32_t absPartIdxRT = g_zscanToRaster[curPartUnitIdx];
 
-    if (lessThanCol(absPartIdxRT, numPartInCUSize - 1, numPartInCUSize))
+    if (lessThanCol(absPartIdxRT, s_numPartInCUSize - 1, s_numPartInCUSize))
     {
-        if (!isZeroRow(absPartIdxRT, numPartInCUSize))
+        if (!isZeroRow(absPartIdxRT, s_numPartInCUSize))
         {
-            if (curPartUnitIdx > g_rasterToZscan[absPartIdxRT - numPartInCUSize + 1])
+            if (curPartUnitIdx > g_rasterToZscan[absPartIdxRT - s_numPartInCUSize + 1])
             {
                 uint32_t absZorderCUIdx  = g_zscanToRaster[m_absIdxInCTU] + (1 << (m_log2CUSize[0] - LOG2_UNIT_SIZE)) - 1;
-                arPartUnitIdx = g_rasterToZscan[absPartIdxRT - numPartInCUSize + 1];
-                if (isEqualRowOrCol(absPartIdxRT, absZorderCUIdx, numPartInCUSize))
+                arPartUnitIdx = g_rasterToZscan[absPartIdxRT - s_numPartInCUSize + 1];
+                if (isEqualRowOrCol(absPartIdxRT, absZorderCUIdx, s_numPartInCUSize))
                     return m_encData->getPicCTU(m_cuAddr);
                 else
                 {
@@ -594,14 +632,14 @@ const CUData* CUData::getPUAboveRight(ui
             }
             return NULL;
         }
-        arPartUnitIdx = g_rasterToZscan[absPartIdxRT + NUM_CU_PARTITIONS - numPartInCUSize + 1];
+        arPartUnitIdx = g_rasterToZscan[absPartIdxRT + NUM_CU_PARTITIONS - s_numPartInCUSize + 1];
         return m_cuAbove;
     }
 
-    if (!isZeroRow(absPartIdxRT, numPartInCUSize))
+    if (!isZeroRow(absPartIdxRT, s_numPartInCUSize))
         return NULL;
 
-    arPartUnitIdx = g_rasterToZscan[NUM_CU_PARTITIONS - numPartInCUSize];
+    arPartUnitIdx = g_rasterToZscan[NUM_CU_PARTITIONS - s_numPartInCUSize];
     return m_cuAboveRight;
 }
 
@@ -610,18 +648,17 @@ const CUData* CUData::getPUBelowLeft(uin
     if ((m_encData->getPicCTU(m_cuAddr)->m_cuPelY + g_zscanToPelY[curPartUnitIdx] + UNIT_SIZE) >= m_slice->m_sps->picHeightInLumaSamples)
         return NULL;
 
-    uint32_t absPartIdxLB    = g_zscanToRaster[curPartUnitIdx];
-    uint32_t numPartInCUSize = m_encData->m_numPartInCUSize;
+    uint32_t absPartIdxLB = g_zscanToRaster[curPartUnitIdx];
 
-    if (lessThanRow(absPartIdxLB, numPartInCUSize - 1, numPartInCUSize))
+    if (lessThanRow(absPartIdxLB, s_numPartInCUSize - 1, s_numPartInCUSize))
     {
-        if (!isZeroCol(absPartIdxLB, numPartInCUSize))
+        if (!isZeroCol(absPartIdxLB, s_numPartInCUSize))
         {
-            if (curPartUnitIdx > g_rasterToZscan[absPartIdxLB + numPartInCUSize - 1])
+            if (curPartUnitIdx > g_rasterToZscan[absPartIdxLB + s_numPartInCUSize - 1])
             {
-                uint32_t absZorderCUIdxLB = g_zscanToRaster[m_absIdxInCTU] + ((1 << (m_log2CUSize[0] - LOG2_UNIT_SIZE)) - 1) * m_encData->m_numPartInCUSize;
-                blPartUnitIdx = g_rasterToZscan[absPartIdxLB + numPartInCUSize - 1];
-                if (isEqualRowOrCol(absPartIdxLB, absZorderCUIdxLB, numPartInCUSize))
+                uint32_t absZorderCUIdxLB = g_zscanToRaster[m_absIdxInCTU] + ((1 << (m_log2CUSize[0] - LOG2_UNIT_SIZE)) - 1) * s_numPartInCUSize;
+                blPartUnitIdx = g_rasterToZscan[absPartIdxLB + s_numPartInCUSize - 1];