[x265-commits] [x265] quant: remove TODO comment

Steve Borho steve at borho.org
Tue Aug 5 09:12:14 CEST 2014


details:   http://hg.videolan.org/x265/rev/ae7c5f4a842d
branches:  
changeset: 7717:ae7c5f4a842d
user:      Steve Borho <steve at borho.org>
date:      Mon Aug 04 14:18:30 2014 -0500
description:
quant: remove TODO comment

Yes, there is a reason to check maxAbsLevel < 3 here: diffLevel below can only
be 0, 1, or 2.

Subject: [x265] psy-rdoq: fix unquant shift factors

details:   http://hg.videolan.org/x265/rev/08304a298065
branches:  
changeset: 7718:08304a298065
user:      Steve Borho <steve at borho.org>
date:      Mon Aug 04 16:22:46 2014 -0500
description:
psy-rdoq: fix unquant shift factors

Dequant coefficients are made with s_invQuantScales[rem] << 4, so to perform an
unquant we must remove those four bits from dequantCoeff.
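To make the shift bookkeeping concrete, here is a minimal C++ sketch. It assumes s_invQuantScales holds the standard HEVC levelScale values {40, 45, 51, 57, 64, 72} and ignores the clipping done in real dequant; kInvQuantScales and both function names are hypothetical. It shows that starting from a coefficient stored as scale << 4 gives the same unquant result as the plain scale, provided the downshift grows by four and the rounding offset moves with it.

```cpp
#include <cassert>
#include <cstdint>

// Assumed HEVC levelScale values, indexed by QP % 6 (stand-in for s_invQuantScales).
static const int32_t kInvQuantScales[6] = {40, 45, 51, 57, 64, 72};

// Reference unquant with the plain scale: (level * (scale << per) + round) >> shift.
int64_t unquantPlain(int32_t level, int rem, int per, int shift)
{
    assert(rem >= 0 && rem < 6 && shift >= 1);
    int64_t round = (int64_t)1 << (shift - 1);
    int64_t scale = (int64_t)kInvQuantScales[rem] << per;
    return ((int64_t)level * scale + round) >> shift;
}

// Same unquant, starting from a dequant coefficient stored as scale << 4
// (the form used when scaling lists are disabled). The four extra bits are
// removed by shifting four more, with the rounding offset scaled to match.
int64_t unquantFromStoredCoeff(int32_t level, int rem, int per, int shift)
{
    assert(rem >= 0 && rem < 6 && shift >= 1);
    int64_t dequantCoeff = (int64_t)kInvQuantScales[rem] << 4;
    int totalShift = shift + 4;
    int64_t round = (int64_t)1 << (totalShift - 1);
    return ((int64_t)level * (dequantCoeff << per) + round) >> totalShift;
}
```

Since (16x + 16r) >> (s + 4) equals (x + r) >> s exactly, the two forms agree for any non-negative level; dropping the extra four bits of shift would instead inflate the unquantized level by roughly 16x.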

Subject: [x265] quant: change how RDOQ measures distortion [CHANGES OUTPUTS]

details:   http://hg.videolan.org/x265/rev/da57b1e8ac58
branches:  
changeset: 7719:da57b1e8ac58
user:      Steve Borho <steve at borho.org>
date:      Mon Aug 04 20:07:31 2014 -0500
description:
quant: change how RDOQ measures distortion [CHANGES OUTPUTS]

RDOQ, as it was written in the HM, expects scaled level values to be output by
quant; these are the absolute input (pre-quantization) coefficients multiplied
by the quantizing coefficient, without the rounding factor and without the
downshift. It would then measure distortion as the difference between this
scaled level and level << qbits (a rough unquant). To make this math work, it
pre-calculated an error scale factor (per block position, since the
quantization coefficients can vary) which divided the result by the squared
quantization coefficient and upshifted it to simultaneously account for the
FIX15 nature of the signaling costs and the uniform scaling of the forward
transform. To roughly summarize:

   errScale = (1 << (15 - 2 * transformShift)) / (quantCoeff[i] * quantCoeff[i])
   levelScaled = abs(signCoef) * quantCoeff[i]
   distortion = levelScaled - (level << qbits)
   cost = distortion * distortion * errScale + lambda2 * bitsFix15

It was forced to use floating point math for the errScale and distortion
calculations, and thus did not bother with fixed point math for lambda2.
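The HM-style cost summarized above can be sketched in C++ as follows; hmStyleCost and its parameters are hypothetical names, not the HM or x265 source. Algebraically, distortion^2 * errScale equals (|coef| - level * 2^qbits / quantCoeff)^2 * 2^(15 - 2 * transformShift), so the old code was measuring error in the coefficient domain against a real-valued reconstruction, which is why it relied on floating point.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the HM-style RDOQ cost (hypothetical names).
double hmStyleCost(int32_t coef,        // pre-quantization DCT coefficient
                   int32_t quantCoeff,  // forward quantization coefficient
                   int qbits,           // quantizer downshift
                   int transformShift,  // forward transform scaling
                   uint32_t level,      // candidate quantized level
                   double lambda2,      // Lagrange multiplier
                   uint32_t bitsFix15)  // signaling cost in FIX15 units
{
    assert(quantCoeff > 0 && transformShift >= 0 && 2 * transformShift <= 15);
    // Per-position error scale: divides out quantCoeff^2 and applies the FIX15/DCT scale.
    double errScale = (double)(1 << (15 - 2 * transformShift)) /
                      ((double)quantCoeff * (double)quantCoeff);
    int64_t absCoef = coef < 0 ? -(int64_t)coef : (int64_t)coef;
    int64_t levelScaled = absCoef * quantCoeff;                    // no rounding, no downshift
    int64_t distortion = levelScaled - ((int64_t)level << qbits);  // vs. rough unquant
    double d = (double)distortion;
    return d * d * errScale + lambda2 * (double)bitsFix15;
}
```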

This commit changes the distortion measurement to be the difference between the
original (pre-quantization) DCT coefficient and the unquantized level.

   unquantAbsLevel = (level * dequantCoeff[i] + pad) >> shift;
   distortion = unquantAbsLevel - abs(signCoef);
   distShift = 15 - 2 * transformShift;
   cost = ((distortion * distortion) << distShift) + lambda2 * bitsFix15

Note that the same scale factor is still required to account for the FIX15 bit
cost and the forward DCT scale, but now it is a simple shift operation.
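A minimal C++ sketch of the new measurement, under the simplifying assumption that the dequant scale, per, and total unquant shift are passed in directly (x265's loop also applies the optional pre-shift and the psy-rdoq bias, both omitted here). unquantCost and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the fixed-shift RDOQ cost (hypothetical names).
double unquantCost(int32_t coef,          // pre-quantization DCT coefficient
                   uint32_t level,        // candidate quantized level
                   int32_t dequantScale,  // dequant scale for this position
                   int per,               // QP / 6
                   int unquantShift,      // total unquant downshift (>= 1)
                   int transformShift,    // forward transform scaling
                   double lambda2,        // Lagrange multiplier
                   uint32_t bitsFix15)    // signaling cost in FIX15 units
{
    assert(unquantShift >= 1 && transformShift >= 0 && 2 * transformShift <= 15);
    int64_t round = (int64_t)1 << (unquantShift - 1);
    int64_t unquantAbsLevel =
        ((int64_t)level * ((int64_t)dequantScale << per) + round) >> unquantShift;
    int64_t absCoef = coef < 0 ? -(int64_t)coef : (int64_t)coef;
    int64_t d = unquantAbsLevel - absCoef;
    // Shift in its own statement: '<<' binds more loosely than '+' in C++,
    // so "d * d << k + rate" in one expression would shift by (k + rate).
    uint64_t distortion = (uint64_t)(d * d) << (15 - 2 * transformShift);
    return (double)distortion + lambda2 * (double)bitsFix15;
}
```

With level = 0 the unquantized value rounds to 0, so the cost reduces to coef^2 << (15 - 2 * transformShift), the same quantity the patch stores in costUncoded.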

This commit does not change the data types; that will be done in a later commit
once the dynamic ranges have been properly evaluated. Also, deltaU[], used by
sign hiding, is left using the scaled level cost basis for now.

Subject: [x265] asm: cvt16to32_cnt[16x16] for TSkip

details:   http://hg.videolan.org/x265/rev/1760c267c1e9
branches:  
changeset: 7720:1760c267c1e9
user:      Min Chen <chenm003 at 163.com>
date:      Mon Aug 04 19:26:36 2014 -0700
description:
asm: cvt16to32_cnt[16x16] for TSkip
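For reference, the behaviour the new 16x16 kernel vectorizes can be sketched in plain C++. This is a hypothetical reference under stated assumptions: stride is counted in int16_t elements (the asm doubles r2d into bytes), the int32_t destination rows are packed with no stride, and the return value is the number of nonzero coefficients. In the SSE4 version two rows are handled per loop iteration; packsswb narrows the words with signed saturation, which never maps a nonzero value to zero, so pcmpeqb against the zeroed m6 flags exactly the zero coefficients and paddb accumulates those flags in m5. The final reduction falls in the truncated part of the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain C++ reference for cvt16to32_cnt on an N x N block.
template <int N>
uint32_t cvt16to32CntRef(int32_t* dst, const int16_t* src, intptr_t stride)
{
    assert(stride >= N);
    uint32_t numNonZero = 0;
    for (int y = 0; y < N; y++)
    {
        for (int x = 0; x < N; x++)
        {
            dst[x] = src[x];              // sign-extend 16 -> 32 bits
            numNonZero += (src[x] != 0);  // count significant coefficients
        }
        src += stride;  // source rows are strided
        dst += N;       // destination rows are packed
    }
    return numNonZero;
}
```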

Subject: [x265] asm: asm header updates

details:   http://hg.videolan.org/x265/rev/22b1b01b95aa
branches:  
changeset: 7721:22b1b01b95aa
user:      Steve Borho <steve at borho.org>
date:      Mon Aug 04 23:09:42 2014 -0500
description:
asm: asm header updates

Subject: [x265] me: clip motion search area to signaled motion vector length limits

details:   http://hg.videolan.org/x265/rev/0d4723a0080c
branches:  
changeset: 7722:0d4723a0080c
user:      Steve Borho <steve at borho.org>
date:      Tue Aug 05 01:05:47 2014 -0500
description:
me: clip motion search area to signaled motion vector length limits
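The clipping added in TEncSearch::xSetSearchRange can be sketched standalone. QpelMv and clipSearchRange are hypothetical stand-ins for x265's MV type and the surrounding code; the window is assumed to be in quarter-pel units, clamped to +/-((1 << 15) - 1) as in the patch, and then converted to full-pel with >> 2.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal MV type in quarter-pel units.
struct QpelMv { int x; int y; };

// Clamp a quarter-pel search window to the signaled MV length, then convert to full-pel.
void clipSearchRange(QpelMv& mvmin, QpelMv& mvmax)
{
    const int maxMvLen = (1 << 15) - 1;  // default limit, as in the patch
    mvmin.x = std::max(mvmin.x, -maxMvLen);
    mvmin.y = std::max(mvmin.y, -maxMvLen);
    mvmax.x = std::min(mvmax.x, maxMvLen);
    mvmax.y = std::min(mvmax.y, maxMvLen);
    assert(mvmin.x >= -maxMvLen && mvmax.x <= maxMvLen);

    // Right shift of negatives: arithmetic on mainstream compilers, guaranteed since C++20.
    mvmin.x >>= 2;
    mvmin.y >>= 2;
    mvmax.x >>= 2;
    mvmax.y >>= 2;
}
```

Clamping before the >> 2 keeps the limit expressed in the same quarter-pel units the bitstream signals.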

diffstat:

 source/Lib/TLibEncoder/TEncSearch.cpp |    8 ++
 source/common/cpu.cpp                 |    2 +-
 source/common/quant.cpp               |   64 +++++++++--------
 source/common/scalinglist.cpp         |   14 +---
 source/common/scalinglist.h           |    1 -
 source/common/x86/asm-primitives.cpp  |    2 +
 source/common/x86/blockcopy8.asm      |  124 +++++++++++++++++++++++++++++++++-
 source/common/x86/const-a.asm         |    3 +-
 source/common/x86/cpu-a.asm           |    2 +-
 source/common/x86/mc-a.asm            |    2 +-
 source/common/x86/mc-a2.asm           |    2 +-
 source/common/x86/pixel-a.asm         |    2 +-
 source/common/x86/pixel.h             |    2 +-
 source/common/x86/sad-a.asm           |    2 +-
 source/common/x86/ssd-a.asm           |    2 +-
 source/common/x86/x86inc.asm          |    2 +-
 16 files changed, 179 insertions(+), 55 deletions(-)

diffs (truncated from 500 to 300 lines):

diff -r c5f2a20e6f4c -r 0d4723a0080c source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp	Tue Aug 05 01:05:47 2014 -0500
@@ -2272,6 +2272,14 @@ void TEncSearch::xSetSearchRange(TComDat
     cu->clipMv(mvmin);
     cu->clipMv(mvmax);
 
+    /* Clip search range to signaled maximum MV length.
+     * We do not support this VUI field being changed from the default */
+    const int maxMvLen = (1 << 15) - 1;
+    mvmin.x = X265_MAX(mvmin.x, -maxMvLen);
+    mvmin.y = X265_MAX(mvmin.y, -maxMvLen);
+    mvmax.x = X265_MIN(mvmax.x, maxMvLen);
+    mvmax.y = X265_MIN(mvmax.y, maxMvLen);
+
     mvmin >>= 2;
     mvmax >>= 2;
 
diff -r c5f2a20e6f4c -r 0d4723a0080c source/common/cpu.cpp
--- a/source/common/cpu.cpp	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/common/cpu.cpp	Tue Aug 05 01:05:47 2014 -0500
@@ -3,7 +3,7 @@
  *
  * Authors: Loren Merritt <lorenm at u.washington.edu>
  *          Laurent Aimar <fenrir at via.ecp.fr>
- *          Jason Garrett-Glaser <darkshikari at gmail.com>
+ *          Fiona Glaser <fiona at x264.com>
  *          Steve Borho <steve at borho.org>
  *
  * This program is free software; you can redistribute it and/or modify
diff -r c5f2a20e6f4c -r 0d4723a0080c source/common/quant.cpp
--- a/source/common/quant.cpp	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/common/quant.cpp	Tue Aug 05 01:05:47 2014 -0500
@@ -509,14 +509,29 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
 
     x265_emms();
 
-    /* unquant constants for psy-rdoq */
+    /* unquant constants for psy-rdoq. The dequant coefficients have a (1<<4) scale applied
+     * that must be removed during unquant.  This may be larger than the QP upshift, which
+     * would turn some shifts around. To avoid this we add an optional pre-up-shift of the
+     * quantized level. Note that in real dequant there is clipping at several stages. We
+     * skip the clipping when measuring RD cost. */
     int32_t *unquantScale = m_scalingList->m_dequantCoef[log2TrSize - 2][scalingListType][rem];
     int unquantShift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
-    int unquantRound = 1 << (unquantShift - 1);
+    int unquantRound, unquantPreshift;
+    unquantShift += 4;
+    if (unquantShift > per)
+    {
+        unquantRound = 1 << (unquantShift - per - 1);
+        unquantPreshift = 0;
+    }
+    else
+    {
+        unquantPreshift = 4;
+        unquantShift += unquantPreshift;
+        unquantRound = 0;
+    }
     int scaleBits = SCALE_BITS - 2 * transformShift;
 
     double lambda2 = m_lambdas[ttype];
-    double *errScale = m_scalingList->m_errScale[log2TrSize - 2][scalingListType][rem];
     bool bIsLuma = ttype == TEXT_LUMA;
 
     double totalUncodedCost = 0;
@@ -566,18 +581,16 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
         {
             scanPos              = (cgScanPos << MLS_CG_SIZE) + scanPosinCG;
             uint32_t blkPos      = codingParameters.scan[scanPos];
-            double scaleFactor   = errScale[blkPos];       /* (1 << scaleBits) / (quantCoef * quantCoef) */
-            int levelScaled      = scaledCoeff[blkPos];    /* abs(coef) * quantCoef */
-            uint32_t maxAbsLevel = abs(dstCoeff[blkPos]);  /* abs(coef) */
-            int signCoef         = m_resiDctCoeff[blkPos];
-            int predictedCoef    = m_fencDctCoeff[blkPos] - signCoef;
+            uint32_t maxAbsLevel = abs(dstCoeff[blkPos]);             /* abs(quantized coeff) */
+            int signCoef         = m_resiDctCoeff[blkPos];            /* pre-quantization DCT coeff */
+            int predictedCoef    = m_fencDctCoeff[blkPos] - signCoef; /* predicted DCT = source DCT - residual DCT*/
 
-            /* RDOQ measures distortion as the scaled level squared times a
-             * scale factor which tries to remove the quantCoef back out, but
-             * adds scaleBits to account for IEP_RATE which is 32k (1 << SCALE_BITS) */
+            /* RDOQ measures distortion as the squared difference between the unquantized coded level
+             * and the original DCT coefficient. The result is shifted scaleBits to account for the
+             * FIX15 nature of the CABAC cost tables minus the forward transform scale */
 
-            /* cost of not coding this coefficient (no signal bits) */
-            costUncoded[scanPos] = ((uint64_t)levelScaled * levelScaled) * scaleFactor;
+            /* cost of not coding this coefficient (all distortion, no signal bits) */
+            costUncoded[scanPos] = (double)((uint64_t)(signCoef * signCoef) << scaleBits);
             if (usePsy && blkPos)
                 /* when no coefficient is coded, predicted coef == recon coef */
                 costUncoded[scanPos] -= (int)(((m_psyRdoqScale * predictedCoef) << scaleBits) >> 8);
@@ -600,7 +613,7 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
                 costCoeff[scanPos] = 0;
                 baseCost += costUncoded[scanPos];
 
-                /* coeff in unsignaled coeff groups have no signal cost */
+                /* coefficients after lastNZ have no signal cost */
                 costSig[scanPos] = 0;
             }
             else
@@ -629,8 +642,7 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
                     const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext);
                     if (maxAbsLevel < 3)
                     {
-                        /* set default costs to uncoded costs.
-                         * TODO: is there really a need to check maxAbsLevel < 3 here? */
+                        /* set default costs to uncoded costs */
                         costSig[scanPos] = lambda2 * m_estBitsSbac.significantBits[ctxSig][0];
                         costCoeff[scanPos] = costUncoded[scanPos] + costSig[scanPos];
                     }
@@ -639,19 +651,19 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
                 }
                 if (maxAbsLevel)
                 {
-                    const int64_t err1 = levelScaled - ((int64_t)maxAbsLevel << qbits);
-                    double err2 = (double)(err1 * err1);
-
                     uint32_t minAbsLevel = X265_MAX(maxAbsLevel - 1, 1);
                     for (uint32_t lvl = maxAbsLevel; lvl >= minAbsLevel; lvl--)
                     {
                         uint32_t rateCost = getICRateCost(lvl, lvl - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx);
-                        double curCost = err2 * scaleFactor + lambda2 * (codedSigBits + rateCost + IEP_RATE);
+
+                        int unquantAbsLevel = ((lvl << unquantPreshift) * (unquantScale[blkPos] << per) + unquantRound) >> unquantShift;
+                        int d = unquantAbsLevel - abs(signCoef);
+                        uint64_t distortion = ((uint64_t)(d * d)) << scaleBits;
+                        double curCost = distortion + lambda2 * (codedSigBits + rateCost + IEP_RATE);
 
                         // Psy RDOQ: bias in favor of higher AC coefficients in the reconstructed frame
                         if (usePsy && blkPos)
                         {
-                            int unquantAbsLevel = (lvl * (unquantScale[blkPos] << per) + unquantRound) >> unquantShift;
                             int reconCoef = abs(unquantAbsLevel + SIGN(predictedCoef, signCoef));
                             curCost -= (int)(((m_psyRdoqScale * reconCoef) << scaleBits) >> 8);
                         }
@@ -662,18 +674,10 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
                             costCoeff[scanPos] = curCost;
                             costSig[scanPos] = lambda2 * codedSigBits;
                         }
-
-                        if (lvl > minAbsLevel)
-                        {
-                            // add deltas to get squared distortion at minAbsLevel
-                            int64_t err3 = (int64_t)2 * err1 * ((int64_t)1 << qbits);
-                            int64_t err4 = ((int64_t)1 << qbits) * ((int64_t)1 << qbits);
-                            err2 += err3 + err4;
-                        }
                     }
                 }
 
-                deltaU[blkPos] = (levelScaled - ((int)level << qbits)) >> (qbits - 8);
+                deltaU[blkPos] = (scaledCoeff[blkPos] - ((int)level << qbits)) >> (qbits - 8);
                 dstCoeff[blkPos] = level;
                 baseCost += costCoeff[scanPos];
 
diff -r c5f2a20e6f4c -r 0d4723a0080c source/common/scalinglist.cpp
--- a/source/common/scalinglist.cpp	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/common/scalinglist.cpp	Tue Aug 05 01:05:47 2014 -0500
@@ -128,7 +128,6 @@ ScalingList::ScalingList()
 {
     memset(m_quantCoef, 0, sizeof(m_quantCoef));
     memset(m_dequantCoef, 0, sizeof(m_dequantCoef));
-    memset(m_errScale, 0, sizeof(m_errScale));
     memset(m_scalingListCoef, 0, sizeof(m_scalingListCoef));
 }
 
@@ -145,8 +144,7 @@ bool ScalingList::init()
             {
                 m_quantCoef[sizeId][listId][rem] = X265_MALLOC(int32_t, s_numCoefPerSize[sizeId]);
                 m_dequantCoef[sizeId][listId][rem] = X265_MALLOC(int32_t, s_numCoefPerSize[sizeId]);
-                m_errScale[sizeId][listId][rem] = X265_MALLOC(double, s_numCoefPerSize[sizeId]);
-                ok &= m_quantCoef[sizeId][listId][rem] && m_dequantCoef[sizeId][listId][rem] && m_errScale[sizeId][listId][rem];
+                ok &= m_quantCoef[sizeId][listId][rem] && m_dequantCoef[sizeId][listId][rem];
             }
         }
     }
@@ -164,7 +162,6 @@ ScalingList::~ScalingList()
             {
                 X265_FREE(m_quantCoef[sizeId][listId][rem]);
                 X265_FREE(m_dequantCoef[sizeId][listId][rem]);
-                X265_FREE(m_errScale[sizeId][listId][rem]);
             }
         }
     }
@@ -331,11 +328,6 @@ void ScalingList::setupQuantMatrices()
         int stride = X265_MIN(MAX_MATRIX_SIZE_NUM, width);
         int count = s_numCoefPerSize[size];
 
-        // Error scale constants
-        int log2TrSize = size + 2;
-        int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize; // Represents scaling through forward transform
-        int scalingBits = 1 << (SCALE_BITS - 2 * transformShift);            // Compensate for scaling of bitcount in Lagrange cost function
-
         for (int list = 0; list < s_numListsAtSize[size]; list++)
         {
             int32_t *coeff = m_scalingListCoef[size][list];
@@ -345,7 +337,6 @@ void ScalingList::setupQuantMatrices()
             {
                 int32_t *quantCoeff   = m_quantCoef[size][list][rem];
                 int32_t *dequantCoeff = m_dequantCoef[size][list][rem];
-                double *errScale      = m_errScale[size][list][rem];
 
                 if (m_bEnabled)
                 {
@@ -361,9 +352,6 @@ void ScalingList::setupQuantMatrices()
                         dequantCoeff[i] = s_invQuantScales[rem] << 4;
                     }
                 }
-
-                for (int i = 0; i < count; i++)
-                    errScale[i] = (double)scalingBits / (quantCoeff[i] * quantCoeff[i]);
             }
         }
     }
diff -r c5f2a20e6f4c -r 0d4723a0080c source/common/scalinglist.h
--- a/source/common/scalinglist.h	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/common/scalinglist.h	Tue Aug 05 01:05:47 2014 -0500
@@ -49,7 +49,6 @@ public:
 
     int32_t* m_quantCoef[NUM_SIZES][NUM_LISTS][NUM_REM];   // array of quantization matrix coefficient 4x4
     int32_t* m_dequantCoef[NUM_SIZES][NUM_LISTS][NUM_REM]; // array of dequantization matrix coefficient 4x4
-    double*  m_errScale[NUM_SIZES][NUM_LISTS][NUM_REM];
 
     bool     m_bEnabled;
     bool     m_bDataPresent; // non-default scaling lists must be signaled
diff -r c5f2a20e6f4c -r 0d4723a0080c source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/common/x86/asm-primitives.cpp	Tue Aug 05 01:05:47 2014 -0500
@@ -1232,6 +1232,7 @@ void Setup_Assembly_Primitives(EncoderPr
         // TODO: check POPCNT flag!
         p.cvt16to32_cnt[BLOCK_4x4] = x265_cvt16to32_cnt_4_sse4;
         p.cvt16to32_cnt[BLOCK_8x8] = x265_cvt16to32_cnt_8_sse4;
+        p.cvt16to32_cnt[BLOCK_16x16] = x265_cvt16to32_cnt_16_sse4;
 
         HEVC_SATD(sse4);
         SA8D_INTER_FROM_BLOCK(sse4);
@@ -1333,6 +1334,7 @@ void Setup_Assembly_Primitives(EncoderPr
         p.ssd_s[BLOCK_32x32] = x265_pixel_ssd_s_32_avx2;
         p.cvt16to32_cnt[BLOCK_4x4] = x265_cvt16to32_cnt_4_avx2;
         p.cvt16to32_cnt[BLOCK_8x8] = x265_cvt16to32_cnt_8_avx2;
+        p.cvt16to32_cnt[BLOCK_16x16] = x265_cvt16to32_cnt_16_avx2;
     }
 #endif // if HIGH_BIT_DEPTH
 }
diff -r c5f2a20e6f4c -r 0d4723a0080c source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm	Fri Aug 01 18:47:42 2014 +0530
+++ b/source/common/x86/blockcopy8.asm	Tue Aug 05 01:05:47 2014 -0500
@@ -31,6 +31,7 @@ tab_Vm:    db 0, 2, 4, 6, 8, 10, 12, 14,
 
 cextern pw_4
 cextern pb_8
+cextern pb_32
 
 SECTION .text
 
@@ -3329,4 +3330,125 @@ cglobal cvt16to32_cnt_8, 3,5,6
     add         r0d, tmpd
 %endif
     RET
-;IACA_END
+
+
+;--------------------------------------------------------------------------------------
+; uint32_t cvt16to32_cnt(int32_t *dst, int16_t *src, intptr_t stride);
+;--------------------------------------------------------------------------------------
+INIT_XMM sse4
+cglobal cvt16to32_cnt_16, 3,4,7
+    add         r2d, r2d
+    mov         r3d, 16/2
+    pxor        m5, m5
+    pxor        m6, m6
+
+.loop
+    ; row 0
+    movu        m0, [r1]
+    movu        m1, [r1 + mmsize]
+    packsswb    m4, m0, m1
+    pcmpeqb     m4, m6
+    paddb       m5, m4
+    pmovsxwd    m2, m0
+    pmovsxwd    m0, [r1 + 8]
+    pmovsxwd    m3, m1
+    pmovsxwd    m1, [r1 + mmsize + 8]
+    movu        [r0 + 0 * mmsize], m2
+    movu        [r0 + 1 * mmsize], m0
+    movu        [r0 + 2 * mmsize], m3
+    movu        [r0 + 3 * mmsize], m1
+
+    ; row 1
+    movu        m0, [r1 + r2]
+    movu        m1, [r1 + r2 + mmsize]
+    packsswb    m4, m0, m1
+    pcmpeqb     m4, m6
+    paddb       m5, m4
+    pmovsxwd    m2, m0
+    pmovsxwd    m0, [r1 + r2 + 8]
+    pmovsxwd    m3, m1
+    pmovsxwd    m1, [r1 + r2 + mmsize + 8]