[x265-commits] [x265] cli: display param->bSaoNonDeblocked as bool in CLI help

Steve Borho steve at borho.org
Thu Oct 2 06:31:50 CEST 2014


details:   http://hg.videolan.org/x265/rev/1af64c8c2d28
branches:  
changeset: 8182:1af64c8c2d28
user:      Steve Borho <steve at borho.org>
date:      Tue Sep 30 18:20:10 2014 -0500
description:
cli: display param->bSaoNonDeblocked as bool in CLI help
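
The commit only changes how this option's default is printed in the help text. Here is a minimal sketch of the idea: print a flag as a word instead of a raw 0/1. `Param`, `OPT` and `saoNonDeblockHelp` are hypothetical stand-ins. The option spelling and help wording are assumptions, not copied from x265.cpp.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the encoder parameter struct; the flag is stored as an int
// but is semantically a boolean.
struct Param
{
    int bSaoNonDeblocked;
};

// Render a flag as a word so help text reads "enabled"/"disabled" instead of "1"/"0".
static const char* OPT(int value)
{
    return value ? "enabled" : "disabled";
}

// Build one help line for the option (wording is illustrative only).
static std::string saoNonDeblockHelp(const Param& p)
{
    char line[128];
    std::snprintf(line, sizeof(line),
                  "   --[no-]sao-non-deblock  Use non-deblocked pixels for SAO. Default %s\n",
                  OPT(p.bSaoNonDeblocked));
    return std::string(line);
}
```
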
Subject: [x265] analysis: remove default arguments to checkInter_rd5_6 and checkInter_rd0_4

details:   http://hg.videolan.org/x265/rev/4d9ff684c80f
branches:  
changeset: 8183:4d9ff684c80f
user:      Steve Borho <steve at borho.org>
date:      Tue Sep 30 18:43:08 2014 -0500
description:
analysis: remove default arguments to checkInter_rd5_6 and checkInter_rd0_4
Subject: [x265] stub in framework for parallel mode analysis and parallel ME

details:   http://hg.videolan.org/x265/rev/5c1a4804c42d
branches:  
changeset: 8184:5c1a4804c42d
user:      Steve Borho <steve at borho.org>
date:      Sat Sep 27 12:25:21 2014 -0500
description:
stub in framework for parallel mode analysis and parallel ME
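
The `findJob()` added in the diff in this message lets any worker claim the next job ID with an atomic increment. It then runs the job and counts it as completed. Below is a simplified, standalone sketch of that pattern, assuming `std::atomic` in place of x265's `ATOMIC_INC` and plain `std::thread` workers in place of its thread pool. `JobBatch` and `runBatch` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical, simplified job provider: a fixed number of jobs is published,
// and any worker may claim the next unclaimed job ID with an atomic increment.
struct JobBatch
{
    int totalJobs = 0;
    std::atomic<int> acquired{0};
    std::atomic<int> completed{0};
    std::function<void(int)> work; // runs job 'id' (0-based)

    // Try to claim and run one job; returns false when none are left.
    bool findJob()
    {
        if (acquired.load() >= totalJobs)
            return false;                   // cheap early-out; may be stale, re-checked below
        int id = acquired.fetch_add(1) + 1; // incremented value, like ATOMIC_INC
        if (id > totalJobs)
            return false;                   // another worker took the last job
        work(id - 1);
        completed.fetch_add(1);             // the master waits for completed == totalJobs
        return true;
    }
};

// Run a batch with 'numThreads' workers; each keeps claiming jobs until none remain.
inline void runBatch(JobBatch& batch, int numThreads)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; t++)
        pool.emplace_back([&batch] { while (batch.findJob()) {} });
    for (auto& th : pool)
        th.join();
}
```

Each job ID is handed out exactly once, so jobs may write to disjoint outputs without further locking.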
Subject: [x265] analysis: move non-distributed path into else clause

details:   http://hg.videolan.org/x265/rev/b17ddb5d71f4
branches:  
changeset: 8185:b17ddb5d71f4
user:      Steve Borho <steve at borho.org>
date:      Tue Sep 30 18:50:25 2014 -0500
description:
analysis: move non-distributed path into else clause

this is done in a second patch since it touches a lot of code trivially so it
Subject: [x265] analysis: fixup

details:   http://hg.videolan.org/x265/rev/3bd852b225b5
branches:  
changeset: 8186:3bd852b225b5
user:      Steve Borho <steve at borho.org>
date:      Tue Sep 30 22:20:32 2014 -0500
description:
analysis: fixup
Subject: [x265] slice: better structure packing

details:   http://hg.videolan.org/x265/rev/d0fa09e9cca5
branches:  
changeset: 8187:d0fa09e9cca5
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Oct 01 09:39:36 2014 +0530
description:
slice: better structure packing
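
The slice.h hunk groups the `bool` members of `ProfileTierLevel` together ahead of the `int` members. The sketch below is a general illustration of why member order affects padding. The two structs are hypothetical. Exact sizes depend on the ABI, and the saving for any real struct depends on its other members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interleaving 1-byte and 4-byte members forces padding before each int32_t.
struct Interleaved
{
    bool    a;
    int32_t x;   // typically 3 bytes of padding before this
    bool    b;
    int32_t y;   // and 3 more here
    bool    c;   // plus tail padding to keep the struct 4-byte aligned
};

// Same members, grouped so the small ones share a single padded slot at the end.
struct Grouped
{
    int32_t x;
    int32_t y;
    bool    a;
    bool    b;
    bool    c;   // typically 1 byte of tail padding
};
```
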
Subject: [x265] ratecontrol: replace an imprecise comparison with a more precise check to ensure

details:   http://hg.videolan.org/x265/rev/f9922ce58a20
branches:  
changeset: 8188:f9922ce58a20
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Oct 01 23:09:55 2014 +0530
description:
ratecontrol: replace an imprecise comparison with a more precise check to ensure
consistency.
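
The message does not say which comparison was changed. One common form of this fix replaces exact equality on computed doubles with a tolerance check. The helper name and tolerance below are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Exact equality on computed doubles is fragile: 0.1 + 0.2 != 0.3 in binary floating point.
// Comparing within a small absolute tolerance gives a consistent answer.
inline bool approxEqual(double a, double b, double eps)
{
    return std::fabs(a - b) <= eps;
}
```
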
Subject: [x265] ratecontrol: fix float absolute check

details:   http://hg.videolan.org/x265/rev/2a55baeb89cf
branches:  
changeset: 8189:2a55baeb89cf
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Oct 01 23:18:57 2014 +0530
description:
ratecontrol: fix float absolute check
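
The patched line is not part of the truncated diff. A frequent cause of a broken "float absolute check" is a floating-point value passed to the integer `abs()`, which truncates it toward zero first. The sketch below contrasts the two behaviours. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// What happens if a double reaches the C library's integer abs(): it is truncated first.
inline double absViaIntAbs(double x)
{
    return std::abs(static_cast<int>(x)); // -0.75 -> 0 -> 0.0
}

// The floating-point absolute value keeps the fractional part.
inline double absViaFabs(double x)
{
    return std::fabs(x); // -0.75 -> 0.75
}
```
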
Subject: [x265] analysis: remove PartSize argument to checkIntraInInter_rd0_4

details:   http://hg.videolan.org/x265/rev/0c6fe4a39a32
branches:  
changeset: 8190:0c6fe4a39a32
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 14:13:18 2014 -0500
description:
analysis: remove PartSize argument to checkIntraInInter_rd0_4
Subject: [x265] analysis: remove bMergeOnly argument to checkInter_rd0_4, always false

details:   http://hg.videolan.org/x265/rev/589d4d7e5a72
branches:  
changeset: 8191:589d4d7e5a72
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 14:48:14 2014 -0500
description:
analysis: remove bMergeOnly argument to checkInter_rd0_4, always false
Subject: [x265] analysis: initialize job counters

details:   http://hg.videolan.org/x265/rev/7daea9e6c5ae
branches:  
changeset: 8192:7daea9e6c5ae
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 13:57:38 2014 -0500
description:
analysis: initialize job counters
Subject: [x265] analysis: use source buffer for source stride

details:   http://hg.videolan.org/x265/rev/61e028b5a04e
branches:  
changeset: 8193:61e028b5a04e
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 14:13:59 2014 -0500
description:
analysis: use source buffer for source stride

it was always a coincidence that the output stride matched
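
Picture planes are usually padded, so a source picture and a smaller working buffer generally have different strides. The sketch below computes a sum of absolute differences with each buffer addressed by its own stride. Reusing one buffer's stride for the other is only correct when the two happen to match. `pixel` and `sad` here are hypothetical, not x265's primitives.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint8_t pixel; // hypothetical 8-bit pixel type

// Sum of absolute differences between a source block and a prediction block,
// stepping each buffer by its own stride after every row.
inline int sad(const pixel* src, intptr_t srcStride,
               const pixel* pred, intptr_t predStride,
               int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int d = src[x] - pred[x];
            sum += d < 0 ? -d : d;
        }
        src += srcStride;
        pred += predStride;
    }
    return sum;
}
```
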
Subject: [x265] analysis: nit, remove obviously wrong comment

details:   http://hg.videolan.org/x265/rev/bd3046c4bb36
branches:  
changeset: 8194:bd3046c4bb36
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 14:14:19 2014 -0500
description:
analysis: nit, remove obviously wrong comment
Subject: [x265] analysis: further work on parallel ME

details:   http://hg.videolan.org/x265/rev/c0bbd8d01257
branches:  
changeset: 8195:c0bbd8d01257
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 00:49:02 2014 -0500
description:
analysis: further work on parallel ME
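
In the diff in this message, `parallelME()` ends with a TODO: "acquire master output lock, if best cost save the info". The sketch below is one hedged reading of that TODO. Each job searches one reference and offers its result to a shared best, and the shared best is updated under a mutex only when the cost is lower. `MEResult` and `BestME` are hypothetical, and the motion search itself is left out.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result of one motion search (one reference picture).
struct MEResult
{
    int cost = INT_MAX;
    int refIdx = -1;
    int mvx = 0, mvy = 0;
};

// Shared "master" best result; workers publish under a lock and only the cheapest is kept.
class BestME
{
public:
    void offer(const MEResult& r)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        if (r.cost < m_best.cost)
            m_best = r;
    }

    MEResult get()
    {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_best;
    }

private:
    std::mutex m_lock;
    MEResult   m_best;
};
```
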
Subject: [x265] fix bug in 73c6c9086577 for rdLevel=0

details:   http://hg.videolan.org/x265/rev/b57c63127527
branches:  
changeset: 8196:b57c63127527
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Thu Oct 02 09:25:40 2014 +0900
description:
fix bug in 73c6c9086577 for rdLevel=0
Subject: [x265] rc : correct max AU size for first frame

details:   http://hg.videolan.org/x265/rev/4579fc590099
branches:  
changeset: 8197:4579fc590099
user:      Aarthi Thirumalai
date:      Thu Oct 02 00:14:13 2014 +0530
description:
rc : correct max AU size for first frame
Subject: [x265] rc: correct the threshold for resetABR function

details:   http://hg.videolan.org/x265/rev/0212e9832ce7
branches:  
changeset: 8198:0212e9832ce7
user:      Aarthi Thirumalai
date:      Thu Oct 02 00:46:10 2014 +0530
description:
rc: correct the threshold for resetABR function
Subject: [x265] analysis: remove unused variables, fixes warnings

details:   http://hg.videolan.org/x265/rev/898a2546aff1
branches:  
changeset: 8199:898a2546aff1
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 01 23:26:28 2014 -0500
description:
analysis: remove unused variables, fixes warnings

diffstat:

 source/Lib/TLibCommon/TComDataCU.cpp |   39 +-
 source/Lib/TLibCommon/TComDataCU.h   |    2 +-
 source/common/slice.h                |    6 +-
 source/encoder/analysis.cpp          |  577 ++++++++++++++++++++++++++--------
 source/encoder/analysis.h            |   47 ++-
 source/encoder/encoder.cpp           |    3 +-
 source/encoder/entropy.cpp           |   47 +-
 source/encoder/entropy.h             |    2 +-
 source/encoder/frameencoder.h        |    7 -
 source/encoder/ratecontrol.cpp       |    7 +-
 source/x265.cpp                      |    2 +-
 11 files changed, 526 insertions(+), 213 deletions(-)

diffs (truncated from 1220 to 300 lines):

diff -r a4859c266a59 -r 898a2546aff1 source/Lib/TLibCommon/TComDataCU.cpp
--- a/source/Lib/TLibCommon/TComDataCU.cpp	Tue Sep 30 18:05:46 2014 -0500
+++ b/source/Lib/TLibCommon/TComDataCU.cpp	Wed Oct 01 23:26:28 2014 -0500
@@ -454,19 +454,15 @@ void TComDataCU::initSubCU(TComDataCU* c
     m_cuAboveRight  = cu->getCUAboveRight();
 }
 
-void TComDataCU::copyToSubCU(TComDataCU* cu, CU* cuData, uint32_t partUnitIdx, uint32_t depth)
+void TComDataCU::copyFromPic(TComDataCU* ctu, CU* cuData)
 {
-    X265_CHECK(partUnitIdx < 4, "part unit should be less than 4\n");
+    m_pic              = ctu->m_pic;
+    m_slice            = ctu->m_slice;
+    m_cuAddr           = ctu->getAddr();
+    m_absIdxInCTU      = cuData->encodeIdx;
 
-    uint32_t partOffset = cuData->numPartitions * partUnitIdx;
-
-    m_pic              = cu->m_pic;
-    m_slice            = cu->m_slice;
-    m_cuAddr           = cu->getAddr();
-    m_absIdxInCTU      = cuData->encodeIdx + partOffset;
-
-    m_cuPelX           = cu->getCUPelX() + ((partUnitIdx &  1) << (g_maxLog2CUSize - depth));
-    m_cuPelY           = cu->getCUPelY() + ((partUnitIdx >> 1) << (g_maxLog2CUSize - depth));
+    m_cuPelX           = ctu->getCUPelX() + g_zscanToPelX[m_absIdxInCTU];
+    m_cuPelY           = ctu->getCUPelY() + g_zscanToPelY[m_absIdxInCTU];
 
     m_psyEnergy        = 0;
     m_totalPsyCost     = MAX_INT64;
@@ -478,18 +474,17 @@ void TComDataCU::copyToSubCU(TComDataCU*
     m_coeffBits        = 0;
     m_numPartitions    = cuData->numPartitions;
 
-    TComDataCU* otherCU = m_pic->getCU(m_cuAddr);
     int sizeInChar  = sizeof(char) * m_numPartitions;
 
-    memcpy(m_skipFlag, otherCU->getSkipFlag() + m_absIdxInCTU, sizeof(*m_skipFlag) * m_numPartitions);
-    memcpy(m_qp, otherCU->getQP() + m_absIdxInCTU, sizeInChar);
+    memcpy(m_skipFlag, ctu->getSkipFlag() + m_absIdxInCTU, sizeof(*m_skipFlag) * m_numPartitions);
+    memcpy(m_qp, ctu->getQP() + m_absIdxInCTU, sizeInChar);
 
-    memcpy(m_partSizes, otherCU->getPartitionSize() + m_absIdxInCTU, sizeof(*m_partSizes) * m_numPartitions);
-    memcpy(m_predModes, otherCU->getPredictionMode() + m_absIdxInCTU, sizeof(*m_predModes) * m_numPartitions);
+    memcpy(m_partSizes, ctu->getPartitionSize() + m_absIdxInCTU, sizeof(*m_partSizes) * m_numPartitions);
+    memcpy(m_predModes, ctu->getPredictionMode() + m_absIdxInCTU, sizeof(*m_predModes) * m_numPartitions);
 
-    memcpy(m_lumaIntraDir, otherCU->getLumaIntraDir() + m_absIdxInCTU, sizeInChar);
-    memcpy(m_depth, otherCU->getDepth() + m_absIdxInCTU, sizeInChar);
-    memcpy(m_log2CUSize, otherCU->getLog2CUSize() + m_absIdxInCTU, sizeInChar);
+    memcpy(m_lumaIntraDir, ctu->getLumaIntraDir() + m_absIdxInCTU, sizeInChar);
+    memcpy(m_depth, ctu->getDepth() + m_absIdxInCTU, sizeInChar);
+    memcpy(m_log2CUSize, ctu->getLog2CUSize() + m_absIdxInCTU, sizeInChar);
 }
 
 // --------------------------------------------------------------------------------------------------------------------
@@ -2411,6 +2406,8 @@ void TComDataCU::getTUEntropyCodingParam
 void TComDataCU::loadCTUData(uint32_t maxCUSize)
 {
     // Initialize the coding blocks inside the CTB
+    int picWidth  = m_pic->m_origPicYuv->m_picWidth;
+    int picHeight = m_pic->m_origPicYuv->m_picHeight;
     for (uint32_t log2CUSize = g_log2Size[maxCUSize], rangeCUIdx = 0; log2CUSize >= MIN_LOG2_CU_SIZE; log2CUSize--)
     {
         uint32_t blockSize  = 1 << log2CUSize;
@@ -2425,8 +2422,8 @@ void TComDataCU::loadCTUData(uint32_t ma
                 uint32_t child_idx = rangeCUIdx + sbWidth * sbWidth + (depth_idx << 2);
                 uint32_t px = m_cuPelX + sb_x * blockSize;
                 uint32_t py = m_cuPelY + sb_y * blockSize;
-                int32_t present_flag = px < m_pic->m_origPicYuv->m_picWidth && py < m_pic->m_origPicYuv->m_picHeight;
-                int32_t split_mandatory_flag = present_flag && !last_level_flag && (px + blockSize > m_pic->m_origPicYuv->m_picWidth || py + blockSize > m_pic->m_origPicYuv->m_picHeight);
+                int32_t present_flag = px < picWidth && py < picHeight;
+                int32_t split_mandatory_flag = present_flag && !last_level_flag && (px + blockSize > picWidth || py + blockSize > picHeight);
                 
                 /* Offset of the luma CU in the X, Y direction in terms of pixels from the CTU origin */
                 uint32_t xOffset = (sb_x * blockSize) >> 3;
diff -r a4859c266a59 -r 898a2546aff1 source/Lib/TLibCommon/TComDataCU.h
--- a/source/Lib/TLibCommon/TComDataCU.h	Tue Sep 30 18:05:46 2014 -0500
+++ b/source/Lib/TLibCommon/TComDataCU.h	Wed Oct 01 23:26:28 2014 -0500
@@ -276,7 +276,7 @@ public:
     void          initSubCU(TComDataCU* cu, CU* cuData, uint32_t partUnitIdx, uint32_t depth, int qp);
     void          loadCTUData(uint32_t maxCUSize);
 
-    void          copyToSubCU(TComDataCU* ctu, CU* cuData, uint32_t partUnitIdx, uint32_t depth);
+    void          copyFromPic(TComDataCU* ctu, CU* cuData);
     void          copyPartFrom(TComDataCU* cu, CU* cuData, uint32_t partUnitIdx, uint32_t depth, bool isRDObasedAnalysis = true);
 
     void          copyToPic(uint32_t depth);
diff -r a4859c266a59 -r 898a2546aff1 source/common/slice.h
--- a/source/common/slice.h	Tue Sep 30 18:05:46 2014 -0500
+++ b/source/common/slice.h	Wed Oct 01 23:26:28 2014 -0500
@@ -95,13 +95,13 @@ namespace Level {
 struct ProfileTierLevel
 {
     bool     tierFlag;
-    int      profileIdc;
-    bool     profileCompatibilityFlag[32];
-    int      levelIdc;
     bool     progressiveSourceFlag;
     bool     interlacedSourceFlag;
     bool     nonPackedConstraintFlag;
     bool     frameOnlyConstraintFlag;
+    bool     profileCompatibilityFlag[32];
+    int      profileIdc;
+    int      levelIdc;
     uint32_t minCrForLevel;
     uint32_t maxLumaSrForLevel;
 };
diff -r a4859c266a59 -r 898a2546aff1 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp	Tue Sep 30 18:05:46 2014 -0500
+++ b/source/encoder/analysis.cpp	Wed Oct 01 23:26:28 2014 -0500
@@ -21,16 +21,19 @@
 * For more information, contact us at license @ x265.com.
 *****************************************************************************/
 
+#include "common.h"
+#include "primitives.h"
+#include "threading.h"
+
 #include "analysis.h"
-#include "primitives.h"
-#include "common.h"
 #include "rdcost.h"
 #include "encoder.h"
+
 #include "PPA/ppa.h"
 
 using namespace x265;
 
-Analysis::Analysis()
+Analysis::Analysis() : JobProvider(NULL)
 {
     m_bestPredYuv     = NULL;
     m_bestResiYuv     = NULL;
@@ -43,12 +46,16 @@ Analysis::Analysis()
     m_origYuv         = NULL;
     for (int i = 0; i < MAX_PRED_TYPES; i++)
         m_modePredYuv[i] = NULL;
+    m_bJobsQueued = false;
+    m_totalNumME = m_numAcquiredME = m_numCompletedME = 0;
+    m_totalNumJobs = m_numAcquiredJobs = m_numCompletedJobs = 0;
 }
 
-bool Analysis::create(uint32_t numCUDepth, uint32_t maxWidth)
+bool Analysis::create(uint32_t numCUDepth, uint32_t maxWidth, ThreadLocalData *tld)
 {
     X265_CHECK(numCUDepth <= NUM_CU_DEPTH, "invalid numCUDepth\n");
 
+    m_tld = tld;
     m_bestPredYuv = new TComYuv*[numCUDepth];
     m_bestResiYuv = new ShortYuv*[numCUDepth];
     m_bestRecoYuv = new TComYuv*[numCUDepth];
@@ -227,6 +234,124 @@ void Analysis::destroy()
     delete [] m_origYuv;
 }
 
+bool Analysis::findJob(int threadId)
+{
+    /* try to acquire a CU mode to analyze */
+    if (m_totalNumJobs > m_numAcquiredJobs)
+    {
+        /* ATOMIC_INC returns the incremented value */
+        int id = ATOMIC_INC(&m_numAcquiredJobs);
+        if (m_totalNumJobs >= id)
+        {
+            parallelAnalysisJob(threadId, id - 1);
+            if (ATOMIC_INC(&m_numCompletedJobs) == m_totalNumJobs)
+                m_modeCompletionEvent.trigger();
+            return true;
+        }
+    }
+
+    /* else try to acquire a motion estimation task */
+    if (m_totalNumME > m_numAcquiredME)
+    {
+        int id = ATOMIC_INC(&m_numAcquiredME);
+        if (m_totalNumME >= id)
+        {
+            parallelME(threadId, id - 1);
+            if (ATOMIC_INC(&m_numCompletedME) == m_totalNumME)
+                m_meCompletionEvent.trigger();
+            return true;
+        }
+    }
+
+    return false;
+}
+
+void Analysis::parallelAnalysisJob(int threadId, int jobId)
+{
+    Analysis* slave;
+    int depth = m_curDepth;
+
+    if (threadId == -1)
+        slave = this;
+    else
+    {
+        TComDataCU *cu = m_interCU_2Nx2N[depth];
+        TComPicYuv* fenc = cu->m_pic->getPicYuvOrg();
+
+        slave = &m_tld[threadId].analysis;
+        slave->m_me.setSourcePlane(fenc->getLumaAddr(), fenc->getStride());
+        slave->m_log = &slave->m_sliceTypeLog[cu->m_slice->m_sliceType];
+        slave->m_rdEntropyCoders = this->m_rdEntropyCoders;
+        m_origYuv[0]->copyPartToYuv(slave->m_origYuv[depth], m_curCUData->encodeIdx);
+        slave->setQP(cu->m_slice, m_rdCost.m_qp);
+        slave->m_quant.setQPforQuant(cu);
+        slave->m_quant.m_nr = m_quant.m_nr;
+    }
+
+    if (m_param->rdLevel <= 4)
+    {
+        switch (jobId)
+        {
+        case 0:
+            slave->checkIntraInInter_rd0_4(m_intraInInterCU[depth], m_curCUData);
+            break;
+
+        case 1:
+            slave->checkInter_rd0_4(m_interCU_2Nx2N[depth], m_curCUData, m_modePredYuv[0][depth], SIZE_2Nx2N);
+            break;
+
+        case 2:
+            slave->checkInter_rd0_4(m_interCU_Nx2N[depth], m_curCUData, m_modePredYuv[1][depth], SIZE_Nx2N);
+            break;
+
+        case 3:
+            slave->checkInter_rd0_4(m_interCU_2NxN[depth], m_curCUData, m_modePredYuv[2][depth], SIZE_2NxN);
+            break;
+
+        default:
+            X265_CHECK(0, "invalid job ID for parallel mode analysis\n");
+            break;
+        }
+    }
+}
+
+void Analysis::parallelME(int threadId, int meId)
+{
+    Analysis* slave;
+    int depth = m_curDepth;
+    TComDataCU *cu = m_curMECu;
+    TComPicYuv* fenc = cu->m_pic->getPicYuvOrg();
+
+    if (threadId == -1)
+        slave = this;
+    else
+    {
+        slave = &m_tld[threadId].analysis;
+
+        slave->m_me.setSourcePlane(fenc->getLumaAddr(), fenc->getStride());
+        m_origYuv[0]->copyPartToYuv(slave->m_origYuv[depth], m_curCUData->encodeIdx);
+        slave->setQP(cu->m_slice, m_rdCost.m_qp);
+    }
+
+    uint32_t partAddr;
+    int      roiWidth, roiHeight;
+    cu->getPartIndexAndSize(m_curPart, partAddr, roiWidth, roiHeight);
+
+    pixel* pu = fenc->getLumaAddr(cu->getAddr(), m_curCUData->encodeIdx + partAddr);
+    m_me.setSourcePU(pu - fenc->getLumaAddr(), roiWidth, roiHeight);
+
+    if (meId < cu->m_slice->m_numRefIdx[0])
+    {
+        /* perform list 0 motion search */
+    }
+    else
+    {
+        /* perform list 1 motion search */
+    }
+
+    /* TODO: acquire master output lock, if best cost save the info */
+}
+
 void Analysis::compressCU(TComDataCU* cu)
 {
     Frame* pic = cu->m_pic;
@@ -750,41 +875,83 @@ void Analysis::compressInterCU_rd0_4(TCo
                 m_bestMergeCU[depth]->initCU(pic, cuAddr);
             }
 
-            /* Compute Merge Cost */
-            checkMerge2Nx2N_rd0_4(m_bestMergeCU[depth], m_mergeCU[depth], cu, m_modePredYuv[3][depth], m_bestMergeRecoYuv[depth]);
-            bool earlyskip = false;
-            if (m_param->rdLevel >= 1)
-                earlyskip = (m_param->bEnableEarlySkip && m_bestMergeCU[depth]->isSkipped(0));
+            const int distributedAnalysis = 0 && m_param->rdLevel > 2; /* only RD 3, 4 supported here */
 
-            if (!earlyskip)
+            if (distributedAnalysis)
             {
-                /* Compute 2Nx2N mode costs */
-                checkInter_rd0_4(m_interCU_2Nx2N[depth], cu, m_modePredYuv[0][depth], SIZE_2Nx2N);
+                /* with distributed analysis, we perform more speculative work.
+                 * We do not have early outs for when skips are found so we
+                 * always evaluate intra and all inter and merge modes
+                 *
+                 * jobs are numbered as:
+                 *  0 = intra
+                 *  1 = inter 2Nx2N


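The `loadCTUData()` hunk hoists the picture width and height out of the loop but keeps the per-block flag logic. A block is present if its top-left pixel lies inside the picture. A present block must be split if it can still be split and it crosses the right or bottom picture edge. The sketch below mirrors those two expressions. `BlockFlags` and `classifyBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Flags computed per candidate coding block (names mirror the diff's locals).
struct BlockFlags
{
    bool present;
    bool splitMandatory;
};

inline BlockFlags classifyBlock(uint32_t px, uint32_t py, uint32_t blockSize,
                                bool lastLevel, uint32_t picWidth, uint32_t picHeight)
{
    BlockFlags f;
    f.present = px < picWidth && py < picHeight;
    f.splitMandatory = f.present && !lastLevel &&
                       (px + blockSize > picWidth || py + blockSize > picHeight);
    return f;
}
```

For example, in a 1920x1080 picture the bottom row of 64x64 CTUs starts at y = 1024. Those CTUs extend past the bottom edge and must be split.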