[x265-commits] [x265] TEncCU: set dqpflag as true in the CU encoder if aqmode e...

Mon Mar 17 06:48:01 CET 2014

details:   http://hg.videolan.org/x265/rev/d72b7a5c8176
branches:  
changeset: 6513:d72b7a5c8176
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Mar 13 15:47:59 2014 +0530
description:
TEncCU: set dqpflag as true in the CU encoder if aqmode enabled
Subject: [x265] vbv: set DQP as true if VBV is enabled (and AQ disabled).

details:   http://hg.videolan.org/x265/rev/b82c87d0a896
branches:  
changeset: 6514:b82c87d0a896
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Mar 13 15:55:29 2014 +0530
description:
vbv: set DQP as true if VBV is enabled (and AQ disabled).

Unless this is set, the different QPs for each CU won't be encoded. Thankfully, this worked
until now, since VBV was always used at high-quality (AQ on) settings.
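
The condition described above can be sketched as follows. This is a minimal illustration, not the x265 code: the struct RateControlSketch, its fields, and the function needsDeltaQP are hypothetical names introduced here, and treating any nonzero AQ mode as "AQ on" is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the encoder settings that matter here.
struct RateControlSketch
{
    int  aqMode;      // assumption: 0 means AQ is off, nonzero means on
    bool vbvEnabled;  // assumption: true when VBV rate control is active
};

// Per-CU QPs only reach the bitstream when delta-QP coding is on, so it has
// to be enabled whenever either feature may vary the QP between CUs.
inline bool needsDeltaQP(const RateControlSketch& rc)
{
    return rc.aqMode != 0 || rc.vbvEnabled;
}
```

The case this changeset addresses is the middle one: AQ off with VBV on, which previously left the flag unset.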
Subject: [x265] encoder: Adding a TODO comment on the final goal.

details:   http://hg.videolan.org/x265/rev/b7e392e2b720
branches:  
changeset: 6515:b7e392e2b720
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Mar 13 15:56:55 2014 +0530
description:
encoder: Adding a TODO comment on the final goal.
Subject: [x265] compress/TEncCU: no reason why mode decision should reset the dqp flags.

details:   http://hg.videolan.org/x265/rev/6c64fbd96968
branches:  
changeset: 6516:6c64fbd96968
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Mar 13 16:24:41 2014 +0530
description:
compress/TEncCU: no reason why mode decision should reset the dqp flags.
Subject: [x265] encode: avoid repetitive statements; no logic change

details:   http://hg.videolan.org/x265/rev/c1ecc3eb288d
branches:  
changeset: 6517:c1ecc3eb288d
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Mar 13 16:50:25 2014 +0530
description:
encode: avoid repetitive statements; no logic change
Subject: [x265] optimize: rewrite TComTrQuant::xGetICRate

details:   http://hg.videolan.org/x265/rev/b6954c4f480f
branches:  
changeset: 6518:b6954c4f480f
user:      Min Chen <chenm003 at 163.com>
date:      Fri Mar 14 18:09:01 2014 -0700
description:
optimize: rewrite TComTrQuant::xGetICRate
Subject: [x265] optimize: rewrite TComTrQuant::xGetICRateCost

details:   http://hg.videolan.org/x265/rev/b8460fba2783
branches:  
changeset: 6519:b8460fba2783
user:      Min Chen <chenm003 at 163.com>
date:      Fri Mar 14 18:09:22 2014 -0700
description:
optimize: rewrite TComTrQuant::xGetICRateCost
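
As a reading aid for the TComTrQuant hunks in the diff below: the rewritten rate functions take a combined c1c2Idx and a precomputed baseLevel instead of separate c1Idx and c2Idx. The sketch below restates that packing with plain comparisons rather than the sign-shift arithmetic used in the patch. The function names and kC1FlagNumber are hypothetical; the values 8 and 1 for C1FLAG_NUMBER and C2FLAG_NUMBER are inferred from the asserts in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Inferred from the asserts in the diff: C1FLAG_NUMBER is 8, C2FLAG_NUMBER is 1.
static const uint32_t kC1FlagNumber = 8;

// Reference form, as in the removed line (c2Idx < 1 is the same as c2Idx == 0).
inline uint32_t baseLevelReference(uint32_t c1Idx, uint32_t c2Idx)
{
    return (c1Idx < kC1FlagNumber) ? (2 + (c2Idx == 0)) : 1;
}

// Bit 0 is set when c1Idx < 8, bit 1 is set when c2Idx == 0.
inline uint32_t packC1C2(uint32_t c1Idx, uint32_t c2Idx)
{
    return (c1Idx < kC1FlagNumber ? 1u : 0u) | (c2Idx == 0 ? 2u : 0u);
}

// 0xD9 is 11 01 10 01 in binary: four 2-bit entries, read from the low end,
// giving the table {1, 2, 1, 3} indexed by c1c2Idx.
inline uint32_t baseLevelPacked(uint32_t c1c2Idx)
{
    return (0xD9u >> (c1c2Idx * 2)) & 3u;
}

// Exhaustively compares both forms over a small range of indices.
inline bool packedMatchesReference()
{
    for (uint32_t c1 = 0; c1 < 32; c1++)
        for (uint32_t c2 = 0; c2 < 4; c2++)
            if (baseLevelPacked(packC1C2(c1, c2)) != baseLevelReference(c1, c2))
                return false;
    return true;
}
```

With this packing, the tests in the rewritten xGetICRateCost read naturally: c1c2Idx & 1 means the greater-than-one flag is still coded, and c1c2Idx == 3 means the greater-than-two flag is coded as well.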
Subject: [x265] optimize: improvement TComTrQuant::getSigCtxInc, avoid shift by mask

details:   http://hg.videolan.org/x265/rev/9e9bdc0dd2c5
branches:  
changeset: 6520:9e9bdc0dd2c5
user:      Min Chen <chenm003 at 163.com>
date:      Fri Mar 14 18:09:38 2014 -0700
description:
optimize: improvement TComTrQuant::getSigCtxInc, avoid shift by mask
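
The replacement made in getSigCtxInc (see the hunk in the diff below) relies on the transform size being a power of two, so the column of a raster position can be taken with a mask instead of a shift and subtract. A small sketch of that equivalence follows; the function names are hypothetical, and the range of log2TrSize checked (2 to 5) is an assumption about the supported transform sizes.

```cpp
#include <cassert>
#include <cstdint>

// Original form: recover the row, shift it back, and subtract.
inline uint32_t posXBySubtract(uint32_t blkPos, uint32_t log2TrSize)
{
    const uint32_t posY = blkPos >> log2TrSize;
    return blkPos - (posY << log2TrSize);
}

// Rewritten form: trSize is a power of two, so trSize - 1 is a mask of the low bits.
inline uint32_t posXByMask(uint32_t blkPos, uint32_t trSize)
{
    return blkPos & (trSize - 1);
}

// Checks every position of every assumed transform size, including the
// second shortcut in the patch: (posX & 3) == (blkPos & 3) when trSize >= 4.
inline bool maskMatchesSubtract()
{
    for (uint32_t log2TrSize = 2; log2TrSize <= 5; log2TrSize++)
    {
        const uint32_t trSize = 1u << log2TrSize;
        for (uint32_t blkPos = 0; blkPos < trSize * trSize; blkPos++)
        {
            const uint32_t posX = posXBySubtract(blkPos, log2TrSize);
            if (posX != posXByMask(blkPos, trSize)) return false;
            if ((posX & 3) != (blkPos & 3)) return false;
        }
    }
    return true;
}
```

The asserts added in the patch check the same two identities at run time in debug builds.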
Subject: [x265] optimize: rewrite TEncSbac::xWriteCoefRemainExGolomb

details:   http://hg.videolan.org/x265/rev/b2617cb09a1a
branches:  
changeset: 6521:b2617cb09a1a
user:      Min Chen <chenm003 at 163.com>
date:      Fri Mar 14 18:10:24 2014 -0700
description:
optimize: rewrite TEncSbac::xWriteCoefRemainExGolomb
Subject: [x265] TComSlice: nits

details:   http://hg.videolan.org/x265/rev/e7e150e4166d
branches:  
changeset: 6522:e7e150e4166d
user:      Steve Borho <steve at borho.org>
date:      Sun Mar 16 22:38:45 2014 -0500
description:
TComSlice: nits
Subject: [x265] prevent deadlocks from frame dependencies on Linux

details:   http://hg.videolan.org/x265/rev/eba8844609f2
branches:  stable
changeset: 6523:eba8844609f2
user:      Steve Borho <steve at borho.org>
date:      Mon Mar 17 00:29:33 2014 -0500
description:
prevent deadlocks from frame dependencies on Linux
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/8d5deb7cafd8
branches:  
changeset: 6524:8d5deb7cafd8
user:      Steve Borho <steve at borho.org>
date:      Mon Mar 17 00:47:24 2014 -0500
description:
Merge with stable

diffstat:

 source/Lib/TLibCommon/TComRom.cpp     |    4 +-
 source/Lib/TLibCommon/TComRom.h       |    4 +-
 source/Lib/TLibCommon/TComSlice.h     |   10 +-
 source/Lib/TLibCommon/TComTrQuant.cpp |  221 ++++++++++++++++++---------------
 source/Lib/TLibCommon/TComTrQuant.h   |   12 +-
 source/Lib/TLibEncoder/TEncCu.cpp     |   27 ++--
 source/Lib/TLibEncoder/TEncSbac.cpp   |   34 +++--
 source/Lib/TLibEncoder/TEncSbac.h     |    2 +-
 source/common/threading.h             |    2 +
 source/encoder/compress.cpp           |    3 +-
 source/encoder/encoder.cpp            |    5 +-
 source/encoder/frameencoder.cpp       |    1 +
 source/encoder/frameencoder.h         |    2 +-
 source/encoder/framefilter.cpp        |    3 +-
 14 files changed, 183 insertions(+), 147 deletions(-)

diffs (truncated from 681 to 300 lines):

diff -r ba3ddc1848ff -r 8d5deb7cafd8 source/Lib/TLibCommon/TComRom.cpp
--- a/source/Lib/TLibCommon/TComRom.cpp	Fri Mar 14 12:56:01 2014 -0500
+++ b/source/Lib/TLibCommon/TComRom.cpp	Mon Mar 17 00:47:24 2014 -0500
@@ -437,9 +437,9 @@ const uint32_t g_minInGroup[10] = { 0, 1
 const uint32_t g_groupIdx[32]   = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };
 
 // Rice parameters for absolute transform levels
-const uint32_t g_goRiceRange[5] = { 7, 14, 26, 46, 78 };
+const uint8_t g_goRiceRange[5] = { 7, 14, 26, 46, 78 };
 
-const uint32_t g_goRicePrefixLen[5] = { 8, 7, 6, 5, 4 };
+//const uint8_t g_goRicePrefixLen[5] = { 8, 7, 6, 5, 4 };
 
 int g_quantTSDefault4x4[16] =
 {
diff -r ba3ddc1848ff -r 8d5deb7cafd8 source/Lib/TLibCommon/TComRom.h
--- a/source/Lib/TLibCommon/TComRom.h	Fri Mar 14 12:56:01 2014 -0500
+++ b/source/Lib/TLibCommon/TComRom.h	Mon Mar 17 00:47:24 2014 -0500
@@ -131,8 +131,8 @@ extern const int16_t g_chromaFilter[8][N
 extern const uint32_t g_groupIdx[32];
 extern const uint32_t g_minInGroup[10];
 
-extern const uint32_t g_goRiceRange[5];      //!< maximum value coded with Rice codes
-extern const uint32_t g_goRicePrefixLen[5];  //!< prefix length for each maximum value
+extern const uint8_t g_goRiceRange[5];      //!< maximum value coded with Rice codes
+//extern const uint8_t g_goRicePrefixLen[5];  //!< prefix length for each maximum value
 
 // ====================================================================================================================
 // Bit-depth
diff -r ba3ddc1848ff -r 8d5deb7cafd8 source/Lib/TLibCommon/TComSlice.h
--- a/source/Lib/TLibCommon/TComSlice.h	Fri Mar 14 12:56:01 2014 -0500
+++ b/source/Lib/TLibCommon/TComSlice.h	Mon Mar 17 00:47:24 2014 -0500
@@ -1156,13 +1156,13 @@ public:
 
     int       getChromaCrQpOffset() const { return m_chromaCrQpOffset; }
 
-    void      setNumRefIdxL0DefaultActive(uint32_t i)    { m_numRefIdxL0DefaultActive = i; }
+    void      setNumRefIdxL0DefaultActive(uint32_t i) { m_numRefIdxL0DefaultActive = i; }
 
-    uint32_t      getNumRefIdxL0DefaultActive() const     { return m_numRefIdxL0DefaultActive; }
+    uint32_t  getNumRefIdxL0DefaultActive() const     { return m_numRefIdxL0DefaultActive; }
 
-    void      setNumRefIdxL1DefaultActive(uint32_t i)    { m_numRefIdxL1DefaultActive = i; }
+    void      setNumRefIdxL1DefaultActive(uint32_t i) { m_numRefIdxL1DefaultActive = i; }
 
-    uint32_t      getNumRefIdxL1DefaultActive() const     { return m_numRefIdxL1DefaultActive; }
+    uint32_t  getNumRefIdxL1DefaultActive() const     { return m_numRefIdxL1DefaultActive; }
 
     bool getUseWP() const    { return m_bUseWeightPred; }
 
@@ -1202,7 +1202,7 @@ public:
 
     bool     getCabacInitPresentFlag() const        { return m_cabacInitPresentFlag; }
 
-    uint32_t     getEncCABACTableIdx() const            { return m_encCABACTableIdx; }
+    uint32_t getEncCABACTableIdx() const            { return m_encCABACTableIdx; }
 
     void     setDeblockingFilterControlPresentFlag(bool val)  { m_deblockingFilterControlPresentFlag = val; }
 
diff -r ba3ddc1848ff -r 8d5deb7cafd8 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp	Fri Mar 14 12:56:01 2014 -0500
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp	Mon Mar 17 00:47:24 2014 -0500
@@ -60,6 +60,11 @@ typedef struct
 
 #define RDOQ_CHROMA 1  ///< use of RDOQ in chroma
 
+inline static int x265_min_fast(int x, int y)
+{
+    return y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
+}
+
 // ====================================================================================================================
 // TComTrQuant class member functions
 // ====================================================================================================================
@@ -568,7 +573,6 @@ uint32_t TComTrQuant::xRateDistOptQuant(
     uint32_t   c1Idx     = 0;
     uint32_t   c2Idx     = 0;
     int    cgLastScanPos = -1;
-    int    baseLevel;
     uint32_t cgNum = 1 << codingParameters.log2TrSizeCG * 2;
 
     int scanPos;
@@ -609,6 +613,13 @@ uint32_t TComTrQuant::xRateDistOptQuant(
 
             if (lastScanPos >= 0)
             {
+                const uint32_t c1c2Idx = ((c1Idx - 8) >> (sizeof(int) * CHAR_BIT - 1)) + (((-(int)c2Idx) >> (sizeof(int) * CHAR_BIT - 1)) + 1) * 2;
+                const uint32_t baseLevel = ((uint32_t)0xD9 >> (c1c2Idx * 2)) & 3;  // {1, 2, 1, 3}
+                assert(C2FLAG_NUMBER == 1);
+                assert(!!(c1Idx < C1FLAG_NUMBER) == ((c1Idx - 8) >> (sizeof(int) * CHAR_BIT - 1)));
+                assert(!!(c2Idx == 0) == ((-(int)c2Idx) >> (sizeof(int) * CHAR_BIT - 1)) + 1);
+                assert(baseLevel == ((c1Idx < C1FLAG_NUMBER) ? (2 + (c2Idx == 0)) : 1));
+
                 rateIncUp[blkPos] = 0;
                 rateIncDown[blkPos] = 0;
                 deltaU[blkPos] = 0;
@@ -622,23 +633,23 @@ uint32_t TComTrQuant::xRateDistOptQuant(
                 if (scanPos == lastScanPos)
                 {
                     level = xGetCodedLevel(costCoeff[scanPos], costCoeff0[scanPos], costSig[scanPos],
-                                           levelDouble, maxAbsLevel, 0, oneCtx, absCtx, goRiceParam,
-                                           c1Idx, c2Idx, qbits, scaleFactor, 1);
+                                           levelDouble, maxAbsLevel, baseLevel, 0, oneCtx, absCtx, goRiceParam,
+                                           c1c2Idx, qbits, scaleFactor, 1);
                 }
                 else
                 {
-                    uint16_t ctxSig = getSigCtxInc(patternSigCtx, codingParameters, blkPos);
+                    uint16_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, codingParameters);
                     level           = xGetCodedLevel(costCoeff[scanPos], costCoeff0[scanPos], costSig[scanPos],
-                                                     levelDouble, maxAbsLevel, ctxSig, oneCtx, absCtx, goRiceParam,
-                                                     c1Idx, c2Idx, qbits, scaleFactor, 0);
+                                                     levelDouble, maxAbsLevel, baseLevel, ctxSig, oneCtx, absCtx, goRiceParam,
+                                                     c1c2Idx, qbits, scaleFactor, 0);
                     sigRateDelta[blkPos] = m_estBitsSbac->significantBits[ctxSig][1] - m_estBitsSbac->significantBits[ctxSig][0];
                 }
                 deltaU[blkPos] = (levelDouble - ((int)level << qbits)) >> (qbits - 8);
                 if (level > 0)
                 {
-                    int rateNow = xGetICRate(level, oneCtx, absCtx, goRiceParam, c1Idx, c2Idx);
-                    rateIncUp[blkPos] = xGetICRate(level + 1, oneCtx, absCtx, goRiceParam, c1Idx, c2Idx) - rateNow;
-                    rateIncDown[blkPos] = xGetICRate(level - 1, oneCtx, absCtx, goRiceParam, c1Idx, c2Idx) - rateNow;
+                    int rateNow = xGetICRate(level, level - baseLevel, oneCtx, absCtx, goRiceParam, c1c2Idx);
+                    rateIncUp[blkPos] = xGetICRate(level + 1, level + 1 - baseLevel, oneCtx, absCtx, goRiceParam, c1c2Idx) - rateNow;
+                    rateIncDown[blkPos] = xGetICRate(level - 1, level - 1 - baseLevel, oneCtx, absCtx, goRiceParam, c1c2Idx) - rateNow;
                 }
                 else // level == 0
                 {
@@ -647,7 +658,6 @@ uint32_t TComTrQuant::xRateDistOptQuant(
                 dstCoeff[blkPos] = level;
                 baseCost           += costCoeff[scanPos];
 
-                baseLevel = (c1Idx < C1FLAG_NUMBER) ? (2 + (c2Idx < C2FLAG_NUMBER)) : 1;
                 if (level >= baseLevel)
                 {
                     if (goRiceParam < 4 && level > (3 << goRiceParam))
@@ -1004,7 +1014,7 @@ uint32_t TComTrQuant::xRateDistOptQuant(
  * \param height height of the block
  * \returns pattern for current coefficient group
  */
-int TComTrQuant::calcPatternSigCtx(const uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG)
+uint32_t TComTrQuant::calcPatternSigCtx(const uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG)
 {
     if (log2TrSizeCG == 0) return 0;
 
@@ -1027,11 +1037,13 @@ int TComTrQuant::calcPatternSigCtx(const
  * \param textureType texture type (TEXT_LUMA...)
  * \returns ctxInc for current scan position
  */
-int TComTrQuant::getSigCtxInc(int                              patternSigCtx,
-                              const TUEntropyCodingParameters &codingParameters,
-                              const int                        blkPos)
+uint32_t TComTrQuant::getSigCtxInc(const uint32_t                   patternSigCtx,
+                                   const uint32_t                   log2TrSize,
+                                   const uint32_t                   trSize,
+                                   const uint32_t                   blkPos,
+                                   const TUEntropyCodingParameters &codingParameters)
 {
-    static const int ctxIndMap[16] =
+    static const uint8_t ctxIndMap[16] =
     {
         0, 1, 4, 5,
         2, 3, 4, 5,
@@ -1041,16 +1053,17 @@ int TComTrQuant::getSigCtxInc(int       
 
     if (blkPos == 0) return 0; //special case for the DC context variable
 
-    const int log2TrSize = codingParameters.log2TrSize;
     if (log2TrSize == 2) //4x4
     {
         return ctxIndMap[blkPos];
     }
 
     const uint32_t posY           = blkPos >> log2TrSize;
-    const uint32_t posX           = blkPos - (posY << log2TrSize);
+    const uint32_t posX           = blkPos & (trSize - 1);
+    assert((blkPos - (posY << log2TrSize)) == posX);
 
-    int posXinSubset = posX & 3;
+    int posXinSubset = blkPos & 3;
+    assert((posX & 3) == (blkPos & 3));
     int posYinSubset = posY & 3;
 
     // NOTE: [patternSigCtx][posXinSubset][posYinSubset]
@@ -1115,12 +1128,12 @@ inline uint32_t TComTrQuant::xGetCodedLe
                                             double&  codedCostSig,
                                             int      levelDouble,
                                             uint32_t maxAbsLevel,
-                                            uint16_t ctxNumSig,
-                                            uint16_t ctxNumOne,
-                                            uint16_t ctxNumAbs,
-                                            uint16_t absGoRice,
-                                            uint32_t c1Idx,
-                                            uint32_t c2Idx,
+                                            uint32_t baseLevel,
+                                            uint32_t ctxNumSig,
+                                            uint32_t ctxNumOne,
+                                            uint32_t ctxNumAbs,
+                                            uint32_t absGoRice,
+                                            uint32_t c1c2Idx,
                                             int      qbits,
                                             double   scaleFactor,
                                             bool     last) const
@@ -1151,7 +1164,7 @@ inline uint32_t TComTrQuant::xGetCodedLe
     for (int absLevel = maxAbsLevel; absLevel >= minAbsLevel; absLevel--)
     {
         double err     = double(levelDouble  - (absLevel << qbits));
-        double curCost = err * err * scaleFactor + xGetICRateCost(absLevel, ctxNumOne, ctxNumAbs, absGoRice, c1Idx, c2Idx);
+        double curCost = err * err * scaleFactor + xGetICRateCost(absLevel, absLevel - baseLevel, ctxNumOne, ctxNumAbs, absGoRice, c1c2Idx);
         curCost       += curCostSig;
 
         if (curCost < codedCost)
@@ -1173,121 +1186,133 @@ inline uint32_t TComTrQuant::xGetCodedLe
  * \returns cost of given absolute transform level
  */
 inline double TComTrQuant::xGetICRateCost(uint32_t absLevel,
-                                          uint16_t ctxNumOne,
-                                          uint16_t ctxNumAbs,
-                                          uint16_t absGoRice,
-                                          uint32_t c1Idx,
-                                          uint32_t c2Idx) const
+                                          int32_t  diffLevel,
+                                          uint32_t ctxNumOne,
+                                          uint32_t ctxNumAbs,
+                                          uint32_t absGoRice,
+                                          uint32_t c1c2Idx) const
 {
-    double rate = xGetIEPRate();
-    uint32_t baseLevel = (c1Idx < C1FLAG_NUMBER) ? (2 + (c2Idx < C2FLAG_NUMBER)) : 1;
+    assert(absLevel > 0);
+    uint32_t rate = xGetIEPRate();
+    const int *greaterOneBits = m_estBitsSbac->greaterOneBits[ctxNumOne];
+    const int *levelAbsBits = m_estBitsSbac->levelAbsBits[ctxNumAbs];
 
-    if (absLevel >= baseLevel)
+    if (diffLevel < 0)
     {
-        uint32_t symbol = absLevel - baseLevel;
+        assert((absLevel == 1) || (absLevel == 2));
+        rate += greaterOneBits[(absLevel == 2)];
+
+        if (absLevel == 2)
+        {
+            rate += levelAbsBits[0];
+        }
+    }
+    else
+    {
+        uint32_t symbol = diffLevel;
         uint32_t length;
-        if (symbol < (COEF_REMAIN_BIN_REDUCTION << absGoRice))
+        if ((symbol >> absGoRice) < COEF_REMAIN_BIN_REDUCTION)
         {
             length = symbol >> absGoRice;
             rate += (length + 1 + absGoRice) << 15;
         }
         else
         {
-            length = absGoRice;
-            symbol  = symbol - (COEF_REMAIN_BIN_REDUCTION << absGoRice);
-            while (symbol >= (1 << length))
+            length = 0;
+            symbol = (symbol >> absGoRice) - COEF_REMAIN_BIN_REDUCTION;
+            if (symbol != 0)
             {
-                symbol -=  (1 << (length++));
+                unsigned long idx;
+                CLZ32(idx, symbol + 1);
+                length = idx;
             }
 
-            rate += (COEF_REMAIN_BIN_REDUCTION + length + 1 - absGoRice + length) << 15;
+            rate += (COEF_REMAIN_BIN_REDUCTION + length + absGoRice + 1 + length) << 15;
         }
-        if (c1Idx < C1FLAG_NUMBER)
+        if (c1c2Idx & 1)
         {
-            rate += m_estBitsSbac->greaterOneBits[ctxNumOne][1];
+            rate += greaterOneBits[1];
+        }
 
-            if (c2Idx < C2FLAG_NUMBER)
-            {
-                rate += m_estBitsSbac->levelAbsBits[ctxNumAbs][1];
-            }
+        if (c1c2Idx == 3)
+        {
+            rate += levelAbsBits[1];
         }
     }
-    else if (absLevel == 1)
-    {
-        rate += m_estBitsSbac->greaterOneBits[ctxNumOne][0];
-    }
-    else if (absLevel == 2)
-    {
-        rate += m_estBitsSbac->greaterOneBits[ctxNumOne][1];
-        rate += m_estBitsSbac->levelAbsBits[ctxNumAbs][0];
-    }
-    else