[x265-commits] [x265] entropy: simplify sign hide flag

Sun Aug 10 03:23:14 CEST 2014

details:   http://hg.videolan.org/x265/rev/84acc8eb8d9c
branches:  
changeset: 7748:84acc8eb8d9c
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 08 14:20:34 2014 -0500
description:
entropy: simplify sign hide flag
Subject: [x265] quant: improve variable names and comments (no behavior change)

details:   http://hg.videolan.org/x265/rev/d6723db1e8ec
branches:  
changeset: 7749:d6723db1e8ec
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 08 14:19:52 2014 -0500
description:
quant: improve variable names and comments (no behavior change)
Subject: [x265] quant: avoid an extra shift by adjusting the unquant coeff shift

details:   http://hg.videolan.org/x265/rev/4003cbf60782
branches:  
changeset: 7750:4003cbf60782
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 08 23:17:38 2014 -0500
description:
quant: avoid an extra shift by adjusting the unquant coeff shift
Subject: [x265] quant: reduce conditional expression depths (mostly for readability)

details:   http://hg.videolan.org/x265/rev/95b1d7535af8
branches:  
changeset: 7751:95b1d7535af8
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 00:01:46 2014 -0500
description:
quant: reduce conditional expression depths (mostly for readability)
Subject: [x265] quant: do not check CG bitmap for implied-present coeff groups

details:   http://hg.videolan.org/x265/rev/220e217152cf
branches:  
changeset: 7752:220e217152cf
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 00:08:38 2014 -0500
description:
quant: do not check CG bitmap for implied-present coeff groups
Subject: [x265] quant: use standard rd cost formula for sign-hiding [CHANGES OUTPUTS]

details:   http://hg.videolan.org/x265/rev/e18b85eeb6c5
branches:  
changeset: 7753:e18b85eeb6c5
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 01:00:44 2014 -0500
description:
quant: use standard rd cost formula for sign-hiding [CHANGES OUTPUTS]

The previous RD formula was simply inexplicable, though it did work fairly well.

Old approach:
deltaU[blkPos] = (scaledCoeff[blkPos] - ((int)level << qbits)) >> (qbits - 8);
int64_t invQuant = ScalingList::s_invQuantScales[rem] << per;
int64_t rdFactor = (int64_t)((invQuant * invQuant) / (lambda2 * 16) + 0.5);
costUp = rdFactor * (-deltaU[blkPos]) + rateIncUp[blkPos];
 - wat? -

New approach:
int d = abs(signCoef) - UNQUANT(absLevel + 1);
costUp = (((uint64_t)(d * d)) << scaleBits) + lambda2 * rateIncUp[blkPos];

Using this approach the results are nearly the same (they appear to be slightly
better) but now we can probably add psycho-visual tunings to the sign hiding
feature.
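
For readers following along, the new approach can be sketched as a small
stand-alone function. This is only an illustration under assumptions, not the
x265 code: the helper name signHideCostUp and its parameter list are
hypothetical, unquantNext stands in for UNQUANT(absLevel + 1), and lambda2 is
treated as a plain integer weight on the bit count.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not part of x265): cost of raising a quantized level by
// one, written as distortion + lambda * rate.
//   signCoef    - pre-quantization DCT coefficient
//   unquantNext - assumed to be the value of UNQUANT(absLevel + 1)
//   scaleBits   - left shift that brings the squared error to the rate scale
//   lambda2     - rate weight (assumed here to be an integer)
//   rateIncUp   - extra bits needed to signal the larger level
static inline int64_t signHideCostUp(int signCoef, int unquantNext, int scaleBits,
                                     int64_t lambda2, int rateIncUp)
{
    assert(scaleBits >= 0);
    int64_t d = std::abs(signCoef) - unquantNext; // reconstruction error
    return ((d * d) << scaleBits) + lambda2 * rateIncUp;
}
```

Because the error is squared, the sign of d does not matter; only the distance
between the source coefficient and the reconstructed level contributes to the
distortion term.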
Subject: [x265] quant: header cleanups, no functional change

details:   http://hg.videolan.org/x265/rev/33c6c661905c
branches:  
changeset: 7754:33c6c661905c
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 14:27:52 2014 -0500
description:
quant: header cleanups, no functional change
Subject: [x265] quant: cleanup chroma QP function

details:   http://hg.videolan.org/x265/rev/5132c37cdb38
branches:  
changeset: 7755:5132c37cdb38
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 14:33:24 2014 -0500
description:
quant: cleanup chroma QP function

With a unique function name, protected access, and only called from one
location, the ttype check could be removed.
Subject: [x265] quant: remove floating point operations from RDOQ [CHANGES OUTPUTS]

details:   http://hg.videolan.org/x265/rev/4f1ce079b4a4
branches:  
changeset: 7756:4f1ce079b4a4
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 14:21:29 2014 -0500
description:
quant: remove floating point operations from RDOQ [CHANGES OUTPUTS]

The output changes are minor. On modern CPUs the performance benefit of this
change is negligible since SSE double operations are similar in performance to
int64 operations. As a future optimization, we need to figure out how to
multiply lambda2 (FIX8, 24 bits) by signal cost (FIX15, 24 bits) using 32-bit
integers since 32-bit multiply is significantly cheaper than 64-bit integer
multiply.

Similarly, unquantAbsLevel can be larger than 16 bits so multiplication is done
with int64. Note that we use signed int64 because with psy-rdoq the costs
could go negative.
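
The fixed-point arithmetic described above can be sketched roughly as follows.
The function names toFix8, sigCost and rdCost are hypothetical stand-ins that
mirror the setLambdas(), SIGCOST and RDCOST definitions appearing in the diff
below; they are written as free functions here only so the example is
self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone versions of the patch's fixed-point helpers.
// lambda is stored as FIX8 (scaled by 256 and rounded to nearest).
static inline int64_t toFix8(double lambda)
{
    return (int64_t)(lambda * 256. + 0.5);
}

// lambda * bits, shifted right by 8 to remove the FIX8 scale of lambda2.
static inline int64_t sigCost(int64_t lambda2, int bits)
{
    return (lambda2 * bits) >> 8;
}

// Squared error scaled up by scaleBits, plus the weighted signal cost.
// The result is signed because psy-rdoq adjustments may push costs negative.
static inline int64_t rdCost(int d, int scaleBits, int64_t lambda2, int bits)
{
    assert(scaleBits >= 0);
    return (((int64_t)d * d) << scaleBits) + sigCost(lambda2, bits);
}
```

For instance, toFix8(1.5) gives 384, and sigCost(384, 10) gives 15, i.e. 1.5
times 10 bits with the 8-bit scale removed. Widening d to int64_t before the
multiply keeps the squared error from overflowing 32 bits.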
Subject: [x265] quant: comment improvements

details:   http://hg.videolan.org/x265/rev/c9dd47a21b48
branches:  
changeset: 7757:c9dd47a21b48
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 18:30:54 2014 -0500
description:
quant: comment improvements
Subject: [x265] quant: improve flow and comments for last non-zero refinement

details:   http://hg.videolan.org/x265/rev/6e4eb8542203
branches:  
changeset: 7758:6e4eb8542203
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 09 19:43:23 2014 -0500
description:
quant: improve flow and comments for last non-zero refinement

diffstat:

 source/common/quant.cpp    |  430 ++++++++++++++++++++++----------------------
 source/common/quant.h      |   65 +++---
 source/encoder/entropy.cpp |    9 +-
 3 files changed, 250 insertions(+), 254 deletions(-)

diffs (truncated from 897 to 300 lines):

diff -r 091a63164c41 -r 6e4eb8542203 source/common/quant.cpp

--- a/source/common/quant.cpp	Thu Aug 07 18:18:11 2014 -0500
+++ b/source/common/quant.cpp	Sat Aug 09 19:43:23 2014 -0500
@@ -37,11 +37,11 @@ namespace {
 
 struct coeffGroupRDStats
 {
-    int    nnzBeforePos0;
-    double codedLevelAndDist; // distortion and level cost only
-    double uncodedDist;       // all zero coded block distortion
-    double sigCost;
-    double sigCost0;
+    int     nnzBeforePos0;     /* indicates coeff other than pos 0 are coded */
+    int64_t codedLevelAndDist; /* distortion and level cost of coded coefficients */
+    int64_t uncodedDist;       /* uncoded distortion cost of coded coefficients */ 
+    int64_t sigCost;           /* cost of signaling significant coeff bitmap */
+    int64_t sigCost0;          /* cost of signaling sig coeff bit of coeff 0 */
 };
 
 inline int fastMin(int x, int y)
@@ -173,7 +173,7 @@ Quant::Quant()
 bool Quant::init(bool useRDOQ, double psyScale, const ScalingList& scalingList)
 {
     m_useRDOQ = useRDOQ;
-    m_psyRdoqScale = (uint64_t)(psyScale * 256.0);
+    m_psyRdoqScale = (int64_t)(psyScale * 256.0);
     m_scalingList = &scalingList;
     m_resiDctCoeff = X265_MALLOC(coeff_t, MAX_TR_SIZE * MAX_TR_SIZE * 2);
     m_fencDctCoeff = m_resiDctCoeff + (MAX_TR_SIZE * MAX_TR_SIZE);
@@ -194,15 +194,13 @@ void Quant::setQPforQuant(TComDataCU* cu
     int chFmt = cu->getChromaFormat();
 
     m_qpParam[TEXT_LUMA].setQpParam(qpy + QP_BD_OFFSET);
-    setQPforQuant(qpy, TEXT_CHROMA_U, cu->m_slice->m_pps->chromaCbQpOffset, chFmt);
-    setQPforQuant(qpy, TEXT_CHROMA_V, cu->m_slice->m_pps->chromaCrQpOffset, chFmt);
+    setChromaQP(qpy + cu->m_slice->m_pps->chromaCbQpOffset, TEXT_CHROMA_U, chFmt);
+    setChromaQP(qpy + cu->m_slice->m_pps->chromaCrQpOffset, TEXT_CHROMA_V, chFmt);
 }
 
-void Quant::setQPforQuant(int qpy, TextType ttype, int chromaQPOffset, int chFmt)
+void Quant::setChromaQP(int qpin, TextType ttype, int chFmt)
 {
-    X265_CHECK(ttype == TEXT_CHROMA_U || ttype == TEXT_CHROMA_V, "invalid ttype\n");
-
-    int qp = Clip3(-QP_BD_OFFSET, 57, qpy + chromaQPOffset);
+    int qp = Clip3(-QP_BD_OFFSET, 57, qpin);
     if (qp >= 30)
     {
         if (chFmt == X265_CSP_I420)
@@ -213,11 +211,18 @@ void Quant::setQPforQuant(int qpy, TextT
     m_qpParam[ttype].setQpParam(qp + QP_BD_OFFSET);
 }
 
+void Quant::setLambdas(double lambdaY, double lambdaCb, double lambdaCr)
+{
+    m_lambda2[0] = (int64_t)(lambdaY * 256. + 0.5);
+    m_lambda2[1] = (int64_t)(lambdaCb * 256. + 0.5);
+    m_lambda2[2] = (int64_t)(lambdaCr * 256. + 0.5);
+}
+
 /* To minimize the distortion only. No rate is considered */
-uint32_t Quant::signBitHidingHDQ(coeff_t* coeff, int32_t* deltaU, uint32_t numSig, const TUEntropyCodingParameters &codingParameters)
+uint32_t Quant::signBitHidingHDQ(coeff_t* coeff, int32_t* deltaU, uint32_t numSig, const TUEntropyCodingParameters &codeParams)
 {
-    const uint32_t log2TrSizeCG = codingParameters.log2TrSizeCG;
-    const uint16_t *scan = codingParameters.scan;
+    const uint32_t log2TrSizeCG = codeParams.log2TrSizeCG;
+    const uint16_t *scan = codeParams.scan;
     bool lastCG = true;
 
     for (int cg = (1 << log2TrSizeCG * 2) - 1; cg >= 0; cg--)
@@ -322,16 +327,8 @@ uint32_t Quant::signBitHidingHDQ(coeff_t
     return numSig;
 }
 
-uint32_t Quant::transformNxN(TComDataCU* cu,
-                             pixel*      fenc,
-                             uint32_t    fencStride,
-                             int16_t*    residual,
-                             uint32_t    stride,
-                             coeff_t*    coeff,
-                             uint32_t    log2TrSize,
-                             TextType    ttype,
-                             uint32_t    absPartIdx,
-                             bool        useTransformSkip)
+uint32_t Quant::transformNxN(TComDataCU* cu, pixel* fenc, uint32_t fencStride, int16_t* residual, uint32_t stride,
+                             coeff_t* coeff, uint32_t log2TrSize, TextType ttype, uint32_t absPartIdx, bool useTransformSkip)
 {
     if (cu->getCUTransquantBypass(absPartIdx))
     {
@@ -407,16 +404,17 @@ uint32_t Quant::transformNxN(TComDataCU*
 
         if (numSig >= 2 && cu->m_slice->m_pps->bSignHideEnabled)
         {
-            TUEntropyCodingParameters codingParameters;
-            cu->getTUEntropyCodingParameters(codingParameters, absPartIdx, log2TrSize, isLuma);
-            return signBitHidingHDQ(coeff, deltaU, numSig, codingParameters);
+            TUEntropyCodingParameters codeParams;
+            cu->getTUEntropyCodingParameters(codeParams, absPartIdx, log2TrSize, isLuma);
+            return signBitHidingHDQ(coeff, deltaU, numSig, codeParams);
         }
         else
             return numSig;
     }
 }
 
-void Quant::invtransformNxN(bool transQuantBypass, int16_t* residual, uint32_t stride, coeff_t* coeff, uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig)
+void Quant::invtransformNxN(bool transQuantBypass, int16_t* residual, uint32_t stride, coeff_t* coeff,
+                            uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig)
 {
     if (transQuantBypass)
     {
@@ -511,58 +509,66 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
 
     x265_emms();
 
-    /* unquant constants for psy-rdoq. The dequant coefficients have a (1<<4) scale applied
-     * that must be removed during unquant.  This may be larger than the QP upshift, which
-     * would turn some shifts around. To avoid this we add an optional pre-up-shift of the
-     * quantized level. Note that in real dequant there is clipping at several stages. We
-     * skip the clipping when measuring RD cost. */
+    /* unquant constants for psy-rdoq. The dequant coefficients have a (1<<4) scale applied that
+     * must be removed during unquant.  This may be larger than the QP upshift, which would turn
+     * some shifts around. To avoid this we add an addition shift factor to the dequant coeff.  Note
+     * that in real dequant there is clipping at several stages. We skip the clipping when measuring
+     * RD cost */
+#define UNQUANT(lvl) (((lvl) * (unquantScale[blkPos] << unquantPer) + unquantRound) >> unquantShift)
     int32_t *unquantScale = m_scalingList->m_dequantCoef[log2TrSize - 2][scalingListType][rem];
     int unquantShift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
-    int unquantRound, unquantPreshift;
+    int unquantRound, unquantPer;
     unquantShift += 4;
     if (unquantShift > per)
     {
         unquantRound = 1 << (unquantShift - per - 1);
-        unquantPreshift = 0;
+        unquantPer = per;
     }
     else
     {
-        unquantPreshift = 4;
-        unquantShift += unquantPreshift;
+        unquantPer = per + 4;
+        unquantShift += 4;
         unquantRound = 0;
     }
+
+#define SIGCOST(bits)   ((lambda2 * (bits)) >> 8)
+#define RDCOST(d, bits) ((((int64_t)d * d) << scaleBits) + ((lambda2 * (bits)) >> 8))
+    int64_t lambda2 = m_lambda2[ttype];
     int scaleBits = SCALE_BITS - 2 * transformShift;
 
-    double lambda2 = m_lambdas[ttype];
-    bool bIsLuma = ttype == TEXT_LUMA;
-
-    double totalUncodedCost = 0;
-    double costCoeff[32 * 32];   /* d*d + lambda * bits */
-    double costUncoded[32 * 32]; /* d*d + lambda * 0    */
-    double costSig[32 * 32];     /* lambda * bits       */
+    int64_t costCoeff[32 * 32];   /* d*d + lambda * bits */
+    int64_t costUncoded[32 * 32]; /* d*d + lambda * 0    */
+    int64_t costSig[32 * 32];     /* lambda * bits       */
 
     int rateIncUp[32 * 32];      /* signal overhead of increasing level */
     int rateIncDown[32 * 32];    /* signal overhead of decreasing level */
     int sigRateDelta[32 * 32];   /* signal difference between zero and non-zero */
-    int deltaU[32 * 32];
 
-    double   costCoeffGroupSig[MLS_GRP_NUM]; /* lambda * bits of group coding cost */
+    int64_t costCoeffGroupSig[MLS_GRP_NUM]; /* lambda * bits of group coding cost */
     uint64_t sigCoeffGroupFlag64 = 0;
 
     uint32_t ctxSet      = 0;
     int    c1            = 1;
     int    c2            = 0;
-    double baseCost      = 0;
-    int    lastScanPos   = -1;
     uint32_t goRiceParam = 0;
     uint32_t c1Idx       = 0;
     uint32_t c2Idx       = 0;
     int cgLastScanPos    = -1;
+    int lastScanPos      = -1;
     const uint32_t cgSize = (1 << MLS_CG_SIZE); /* 4x4 num coef = 16 */
+    bool bIsLuma = ttype == TEXT_LUMA;
 
-    TUEntropyCodingParameters codingParameters;
-    cu->getTUEntropyCodingParameters(codingParameters, absPartIdx, log2TrSize, bIsLuma);
-    const uint32_t cgNum = 1 << codingParameters.log2TrSizeCG * 2;
+    /* total rate distortion cost of transform block, as CBF=0 */
+    int64_t totalUncodedCost = 0;
+
+    /* Total rate distortion cost of this transform block, counting te distortion of uncoded blocks,
+     * the distortion and signal cost of coded blocks, and the coding cost of significant
+     * coefficient and coefficient group bitmaps */
+    int64_t totalRdCost = 0;
+
+    TUEntropyCodingParameters codeParams;
+    cu->getTUEntropyCodingParameters(codeParams, absPartIdx, log2TrSize, bIsLuma);
+    const uint32_t cgNum = 1 << codeParams.log2TrSizeCG * 2;
 
     uint32_t scanPos;
     coeffGroupRDStats cgRdStats;
@@ -570,19 +576,19 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
     /* iterate over coding groups in reverse scan order */
     for (int cgScanPos = cgNum - 1; cgScanPos >= 0; cgScanPos--)
     {
-        const uint32_t cgBlkPos = codingParameters.scanCG[cgScanPos];
-        const uint32_t cgPosY   = cgBlkPos >> codingParameters.log2TrSizeCG;
-        const uint32_t cgPosX   = cgBlkPos - (cgPosY << codingParameters.log2TrSizeCG);
+        const uint32_t cgBlkPos = codeParams.scanCG[cgScanPos];
+        const uint32_t cgPosY   = cgBlkPos >> codeParams.log2TrSizeCG;
+        const uint32_t cgPosX   = cgBlkPos - (cgPosY << codeParams.log2TrSizeCG);
         const uint64_t cgBlkPosMask = ((uint64_t)1 << cgBlkPos);
         memset(&cgRdStats, 0, sizeof(coeffGroupRDStats));
 
-        const int patternSigCtx = calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codingParameters.log2TrSizeCG);
+        const int patternSigCtx = calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codeParams.log2TrSizeCG);
 
         /* iterate over coefficients in each group in reverse scan order */
         for (int scanPosinCG = cgSize - 1; scanPosinCG >= 0; scanPosinCG--)
         {
             scanPos              = (cgScanPos << MLS_CG_SIZE) + scanPosinCG;
-            uint32_t blkPos      = codingParameters.scan[scanPos];
+            uint32_t blkPos      = codeParams.scan[scanPos];
             uint32_t maxAbsLevel = abs(dstCoeff[blkPos]);             /* abs(quantized coeff) */
             int signCoef         = m_resiDctCoeff[blkPos];            /* pre-quantization DCT coeff */
             int predictedCoef    = m_fencDctCoeff[blkPos] - signCoef; /* predicted DCT = source DCT - residual DCT*/
@@ -592,10 +598,10 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
              * FIX15 nature of the CABAC cost tables minus the forward transform scale */
 
             /* cost of not coding this coefficient (all distortion, no signal bits) */
-            costUncoded[scanPos] = (double)((uint64_t)(signCoef * signCoef) << scaleBits);
+            costUncoded[scanPos] = (int64_t)(signCoef * signCoef) << scaleBits;
             if (usePsy && blkPos)
-                /* when no coefficient is coded, predicted coef == recon coef */
-                costUncoded[scanPos] -= (int)(((m_psyRdoqScale * predictedCoef) << scaleBits) >> 8);
+                /* when no residual coefficient is coded, predicted coef == recon coef */
+                costUncoded[scanPos] -= (((m_psyRdoqScale * predictedCoef) << scaleBits) >> 8);
 
             totalUncodedCost += costUncoded[scanPos];
 
@@ -609,14 +615,14 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
 
             if (lastScanPos < 0)
             {
+                /* coefficients after lastNZ have no distortion signal cost */
+                costCoeff[scanPos] = 0;
+                costSig[scanPos] = 0;
+
                 /* No non-zero coefficient yet found, but this does not mean
                  * there is no uncoded-cost for this coefficient. Pre-
                  * quantization the coefficient may have been non-zero */
-                costCoeff[scanPos] = 0;
-                baseCost += costUncoded[scanPos];
-
-                /* coefficients after lastNZ have no signal cost */
-                costSig[scanPos] = 0;
+                totalRdCost += costUncoded[scanPos];
             }
             else
             {
@@ -634,36 +640,35 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
                 const int *levelAbsBits = m_estBitsSbac.levelAbsBits[absCtx];
 
                 uint32_t level = 0;
-                uint32_t codedSigBits = 0;
-                costCoeff[scanPos] = MAX_DOUBLE;
+                uint32_t sigCoefBits = 0;
+                costCoeff[scanPos] = MAX_INT64;
 
                 if ((int)scanPos == lastScanPos)
                     sigRateDelta[blkPos] = 0;
                 else
                 {
-                    const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext);
+                    const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codeParams.firstSignificanceMapContext);
                     if (maxAbsLevel < 3)
                     {
                         /* set default costs to uncoded costs */
-                        costSig[scanPos] = lambda2 * m_estBitsSbac.significantBits[ctxSig][0];
+                        costSig[scanPos] = SIGCOST(m_estBitsSbac.significantBits[ctxSig][0]);
                         costCoeff[scanPos] = costUncoded[scanPos] + costSig[scanPos];
                     }
                     sigRateDelta[blkPos] = m_estBitsSbac.significantBits[ctxSig][1] - m_estBitsSbac.significantBits[ctxSig][0];
-                    codedSigBits = m_estBitsSbac.significantBits[ctxSig][1];
+                    sigCoefBits = m_estBitsSbac.significantBits[ctxSig][1];
                 }
                 if (maxAbsLevel)
                 {
                     uint32_t minAbsLevel = X265_MAX(maxAbsLevel - 1, 1);
                     for (uint32_t lvl = maxAbsLevel; lvl >= minAbsLevel; lvl--)
                     {
-                        uint32_t rateCost = getICRateCost(lvl, lvl - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx);
+                        uint32_t levelBits = getICRateCost(lvl, lvl - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx) + IEP_RATE;
 
-                        int unquantAbsLevel = ((lvl << unquantPreshift) * (unquantScale[blkPos] << per) + unquantRound) >> unquantShift;
-                        int d = unquantAbsLevel - abs(signCoef);
-                        uint64_t distortion = ((uint64_t)(d * d)) << scaleBits;
-                        double curCost = distortion + lambda2 * (codedSigBits + rateCost + IEP_RATE);
+                        int unquantAbsLevel = UNQUANT(lvl);