[x265-commits] [x265] predict: whitespace nits

Sun Aug 3 19:13:23 CEST 2014

details:   http://hg.videolan.org/x265/rev/3db5fda6abf0
branches:  
changeset: 7665:3db5fda6abf0
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Aug 01 16:31:20 2014 +0530
description:
predict: whitespace nits
Subject: [x265] cleanup:  move m_predYuv and m_predTempYuv from predict to TEncSearch

details:   http://hg.videolan.org/x265/rev/a74b24444ae8
branches:  
changeset: 7666:a74b24444ae8
user:      Santhoshini Sekar <santhoshini at multicorewareinc.com>
date:      Fri Aug 01 15:04:36 2014 +0530
description:
cleanup:  move m_predYuv and m_predTempYuv from predict to TEncSearch
Subject: [x265] rc: enable abr reset in the first pass of two pass encode.

details:   http://hg.videolan.org/x265/rev/a9a7f0933ecc
branches:  
changeset: 7667:a9a7f0933ecc
user:      Aarthi Thirumalai
date:      Fri Aug 01 18:45:57 2014 +0530
description:
rc: enable abr reset in the first pass of two pass encode.

This was observed to improve second-pass results with the ultrafast preset for some videos.
Subject: [x265] dpb: remove redundant call to getNalUnitType(), output will not change

details:   http://hg.videolan.org/x265/rev/fb24f965eade
branches:  
changeset: 7668:fb24f965eade
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 12:12:43 2014 -0500
description:
dpb: remove redundant call to getNalUnitType(), output will not change
Subject: [x265] dpb: getNalUnitType() cannot return NAL_UNIT_CODED_SLICE_IDR_N_LP

details:   http://hg.videolan.org/x265/rev/b911b02737c8
branches:  
changeset: 7669:b911b02737c8
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 12:13:25 2014 -0500
description:
dpb: getNalUnitType() cannot return NAL_UNIT_CODED_SLICE_IDR_N_LP
Subject: [x265] dpb: style nits

details:   http://hg.videolan.org/x265/rev/5d1bd6097113
branches:  
changeset: 7670:5d1bd6097113
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 12:17:11 2014 -0500
description:
dpb: style nits
Subject: [x265] dpb: remove checks for slice types we do not emit

details:   http://hg.videolan.org/x265/rev/963b8e7b1dff
branches:  
changeset: 7671:963b8e7b1dff
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 12:26:34 2014 -0500
description:
dpb: remove checks for slice types we do not emit
Subject: [x265] dpb: cleanup decodingRefreshMarking()

details:   http://hg.videolan.org/x265/rev/6b1753638790
branches:  
changeset: 7672:6b1753638790
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 12:28:59 2014 -0500
description:
dpb: cleanup decodingRefreshMarking()
Subject: [x265] quant: apply scale factor in just one place

details:   http://hg.videolan.org/x265/rev/2a7315a37d67
branches:  
changeset: 7673:2a7315a37d67
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 13:13:15 2014 -0500
description:
quant: apply scale factor in just one place
Subject: [x265] quant: delay err3, err4 calculation until/if necessary

details:   http://hg.videolan.org/x265/rev/244ba5fa80d4
branches:  
changeset: 7674:244ba5fa80d4
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 13:15:30 2014 -0500
description:
quant: delay err3, err4 calculation until/if necessary
Subject: [x265] quant: hoist some calculations out of the loop

details:   http://hg.videolan.org/x265/rev/32b4aa0eb4fb
branches:  
changeset: 7675:32b4aa0eb4fb
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 13:19:18 2014 -0500
description:
quant: hoist some calculations out of the loop
Subject: [x265] quant: simplify minAbsLevel

details:   http://hg.videolan.org/x265/rev/db62272d284c
branches:  
changeset: 7676:db62272d284c
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 13:28:34 2014 -0500
description:
quant: simplify minAbsLevel
Subject: [x265] quant: convert getCodedLevel() into a macro, remove m_transformShift hack

details:   http://hg.videolan.org/x265/rev/ae8c153ee91d
branches:  
changeset: 7677:ae8c153ee91d
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 14:22:16 2014 -0500
description:
quant: convert getCodedLevel() into a macro, remove m_transformShift hack
Subject: [x265] quant: m_lambda2 no longer needs to be a member variable

details:   http://hg.videolan.org/x265/rev/287d37822825
branches:  
changeset: 7678:287d37822825
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 15:36:34 2014 -0500
description:
quant: m_lambda2 no longer needs to be a member variable

it is only used in rdoQuant() and can be declared on the stack
Subject: [x265] quant: make IEP_RATE an anonymous enum, it doesn't need storage

details:   http://hg.videolan.org/x265/rev/be69e059808a
branches:  
changeset: 7679:be69e059808a
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 15:37:06 2014 -0500
description:
quant: make IEP_RATE an anonymous enum, it doesn't need storage
Subject: [x265] quant: support scaling lists in psy-rdoq

details:   http://hg.videolan.org/x265/rev/8767ddb686af
branches:  
changeset: 7680:8767ddb686af
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 15:44:35 2014 -0500
description:
quant: support scaling lists in psy-rdoq
Subject: [x265] quant: rename costCoeff0 to costUncoded, add docs

details:   http://hg.videolan.org/x265/rev/1c9a6a976e5d
branches:  
changeset: 7681:1c9a6a976e5d
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 17:00:43 2014 -0500
description:
quant: rename costCoeff0 to costUncoded, add docs
Subject: [x265] quant: clarify last-nz optimization loop

details:   http://hg.videolan.org/x265/rev/11a3a69d3e29
branches:  
changeset: 7682:11a3a69d3e29
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 17:37:44 2014 -0500
description:
quant: clarify last-nz optimization loop
Subject: [x265] quant: correct rounding factor for unquant

details:   http://hg.videolan.org/x265/rev/253ad3eafaa2
branches:  
changeset: 7683:253ad3eafaa2
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 18:07:21 2014 -0500
description:
quant: correct rounding factor for unquant
Subject: [x265] quant: blockUncodedCost -> totalUncodedCost, improve comments

details:   http://hg.videolan.org/x265/rev/3b8853b12d9c
branches:  
changeset: 7684:3b8853b12d9c
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 18:08:04 2014 -0500
description:
quant: blockUncodedCost -> totalUncodedCost, improve comments
Subject: [x265] quant: remove redundant level initialization

details:   http://hg.videolan.org/x265/rev/d341acd13af2
branches:  
changeset: 7685:d341acd13af2
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 18:09:07 2014 -0500
description:
quant: remove redundant level initialization
Subject: [x265] quant: improve comments for trailing zero coeff

details:   http://hg.videolan.org/x265/rev/f14d233107d4
branches:  
changeset: 7686:f14d233107d4
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 18:13:40 2014 -0500
description:
quant: improve comments for trailing zero coeff
Subject: [x265] quant: more readability nits - no output changes

details:   http://hg.videolan.org/x265/rev/ed49f875ab20
branches:  
changeset: 7687:ed49f875ab20
user:      Steve Borho <steve at borho.org>
date:      Fri Aug 01 18:28:08 2014 -0500
description:
quant: more readability nits - no output changes
Subject: [x265] quant: re-order rdoq logic so only one RDO_CODED_LEVEL() call is required

details:   http://hg.videolan.org/x265/rev/30f1f1d739db
branches:  
changeset: 7688:30f1f1d739db
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 02 08:58:46 2014 -0500
description:
quant: re-order rdoq logic so only one RDO_CODED_LEVEL() call is required
Subject: [x265] quant: RDO_CODED_LEVEL macro can now be inlined for easier debugging

details:   http://hg.videolan.org/x265/rev/9bb93a267300
branches:  
changeset: 7689:9bb93a267300
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 02 09:09:14 2014 -0500
description:
quant: RDO_CODED_LEVEL macro can now be inlined for easier debugging
Subject: [x265] quant: rename sigCost to codedSigBits, comment nit

details:   http://hg.videolan.org/x265/rev/a28d5ae1b52a
branches:  
changeset: 7690:a28d5ae1b52a
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 02 09:09:22 2014 -0500
description:
quant: rename sigCost to codedSigBits, comment nit
Subject: [x265] quant: levelDouble -> levelScaled

details:   http://hg.videolan.org/x265/rev/28c35f8e4f43
branches:  
changeset: 7691:28c35f8e4f43
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 02 09:17:50 2014 -0500
description:
quant: levelDouble -> levelScaled

This always confused the heck out of me. The level was not doubled, it was not
a double, and it wasn't squared. It was just the level scaled by the quant
scale factor.
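
A rough illustration of what the renamed variable holds, written as a standalone sketch rather than the x265 code. The helper name `uncodedCost`, its parameters and the sample values are assumptions made for illustration; only the names `levelScaled`, `quantCoef` and `scaleFactor` follow the comments in the quant.cpp hunk of this changeset.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: distortion cost of leaving one coefficient uncoded.
// levelScaled is simply abs(coef) * quantCoef. It is not doubled, it is not
// a double, and it is not squared until the cost is computed.
static double uncodedCost(int coef, int quantCoef, double scaleFactor)
{
    int levelScaled = std::abs(coef) * quantCoef;

    // Widen to 64 bits before squaring, as the diff does, then apply the
    // scale factor that tries to take quantCoef back out.
    return (double)((uint64_t)levelScaled * levelScaled) * scaleFactor;
}
```

With `scaleFactor` set to `1.0 / (quantCoef * quantCoef)` and no extra scale bits, the cost reduces to `coef * coef`, which is the plain squared-error reading of the same quantity.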
Subject: [x265] quant: consistent comment style, improve comments

details:   http://hg.videolan.org/x265/rev/b12ac8919761
branches:  
changeset: 7692:b12ac8919761
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 02 10:14:59 2014 -0500
description:
quant: consistent comment style, improve comments
Subject: [x265] quant: change lastCG into a bool, use isOne flag to avoid abs() calls

details:   http://hg.videolan.org/x265/rev/69beab744475
branches:  
changeset: 7693:69beab744475
user:      Steve Borho <steve at borho.org>
date:      Sat Aug 02 10:41:30 2014 -0500
description:
quant: change lastCG into a bool, use isOne flag to avoid abs() calls
Subject: [x265] update header and support Intel IACA marker

details:   http://hg.videolan.org/x265/rev/e6184896aa7b
branches:  
changeset: 7694:e6184896aa7b
user:      Min Chen <chenm003 at 163.com>
date:      Fri Aug 01 17:56:11 2014 -0700
description:
update header and support Intel IACA marker
Subject: [x265] asm: cvt16to32_cnt[4x4] for TSkip

details:   http://hg.videolan.org/x265/rev/6f502ab94357
branches:  
changeset: 7695:6f502ab94357
user:      Min Chen <chenm003 at 163.com>
date:      Fri Aug 01 17:56:27 2014 -0700
description:
asm: cvt16to32_cnt[4x4] for TSkip
Subject: [x265] asm: cvt16to32_cnt[8x8] for TSkip

details:   http://hg.videolan.org/x265/rev/49bab9bdf2a3
branches:  
changeset: 7696:49bab9bdf2a3
user:      Min Chen <chenm003 at 163.com>
date:      Fri Aug 01 17:56:37 2014 -0700
description:
asm: cvt16to32_cnt[8x8] for TSkip

diffstat:

 source/Lib/TLibEncoder/TEncSearch.cpp |   13 +-
 source/Lib/TLibEncoder/TEncSearch.h   |    1 +
 source/common/dct.cpp                 |   21 +
 source/common/primitives.h            |    2 +
 source/common/quant.cpp               |  368 +++++++++++++++------------------
 source/common/quant.h                 |   13 +-
 source/common/slice.h                 |    7 +-
 source/common/x86/asm-primitives.cpp  |    6 +
 source/common/x86/blockcopy8.asm      |  232 +++++++++++++++++++++-
 source/common/x86/blockcopy8.h        |    8 +
 source/common/x86/const-a.asm         |    1 +
 source/common/x86/x86inc.asm          |   12 +
 source/encoder/dpb.cpp                |   54 +---
 source/encoder/predict.cpp            |    6 -
 source/encoder/predict.h              |    8 +-
 source/encoder/ratecontrol.cpp        |    4 +-
 source/test/pixelharness.cpp          |   44 ++++
 source/test/pixelharness.h            |    1 +
 18 files changed, 527 insertions(+), 274 deletions(-)

diffs (truncated from 1450 to 300 lines):

diff -r e85b0aaa64e4 -r 49bab9bdf2a3 source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp	Thu Jul 31 11:08:02 2014 +0530
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp	Fri Aug 01 17:56:37 2014 -0700
@@ -77,6 +77,7 @@ TEncSearch::~TEncSearch()
     X265_FREE(m_qtTempTrIdx);
     X265_FREE(m_qtTempCbf[0]);
     X265_FREE(m_qtTempTransformSkipFlag[0]);
+    m_predTempYuv.destroy();
 
     delete[] m_qtTempShortYuv;
 }
@@ -92,6 +93,7 @@ bool TEncSearch::initSearch(Encoder& top
     m_numLayers = top.m_quadtreeTULog2MaxSize - 2 + 1;
 
     initTempBuff(m_param->internalCsp);
+    ok &= m_predTempYuv.create(MAX_CU_SIZE, MAX_CU_SIZE, m_param->internalCsp);
     m_me.setSearchMethod(m_param->searchMethod);
     m_me.setSubpelRefine(m_param->subpelRefine);
 
@@ -107,7 +109,7 @@ bool TEncSearch::initSearch(Encoder& top
         m_qtTempCoeff[0][i] = X265_MALLOC(coeff_t, sizeL + sizeC * 2);
         m_qtTempCoeff[1][i] = m_qtTempCoeff[0][i] + sizeL;
         m_qtTempCoeff[2][i] = m_qtTempCoeff[0][i] + sizeL + sizeC;
-        m_qtTempShortYuv[i].create(MAX_CU_SIZE, MAX_CU_SIZE, m_param->internalCsp);
+        ok &= m_qtTempShortYuv[i].create(MAX_CU_SIZE, MAX_CU_SIZE, m_param->internalCsp);
     }
 
     const uint32_t numPartitions = 1 << (g_maxCUDepth << 1);
@@ -1894,6 +1896,7 @@ bool TEncSearch::predInterSearch(TComDat
     int      numPredDir = cu->m_slice->isInterP() ? 1 : 2;
     uint32_t lastMode = 0;
     int      totalmebits = 0;
+    TComYuv   m_predYuv[2];
 
     const int* numRefIdx = cu->m_slice->m_numRefIdx;
 
@@ -1901,6 +1904,9 @@ bool TEncSearch::predInterSearch(TComDat
 
     memset(&merge, 0, sizeof(merge));
 
+    m_predYuv[0].create(MAX_CU_SIZE, MAX_CU_SIZE, m_param->internalCsp);
+    m_predYuv[1].create(MAX_CU_SIZE, MAX_CU_SIZE, m_param->internalCsp);
+
     for (int partIdx = 0; partIdx < numPart; partIdx++)
     {
         uint32_t partAddr;
@@ -1936,7 +1942,7 @@ bool TEncSearch::predInterSearch(TComDat
                 cu->getCUMvField(REF_PIC_LIST_1)->setAllMvField(merge.mvField[1], partSize, partAddr, 0, partIdx);
                 totalmebits += merge.bits;
 
-                prepMotionCompensation(cu, partIdx);     
+                prepMotionCompensation(cu, partIdx);
                 motionCompensation(cu, predYuv, REF_PIC_LIST_X, true, bChroma);
                 continue;
             }
@@ -2159,6 +2165,9 @@ bool TEncSearch::predInterSearch(TComDat
         motionCompensation(cu, predYuv, REF_PIC_LIST_X, true, bChroma);
     }
 
+    m_predYuv[0].destroy();
+    m_predYuv[1].destroy();
+
     x265_emms();
     cu->m_totalBits = totalmebits;
     return true;
diff -r e85b0aaa64e4 -r 49bab9bdf2a3 source/Lib/TLibEncoder/TEncSearch.h
--- a/source/Lib/TLibEncoder/TEncSearch.h	Thu Jul 31 11:08:02 2014 +0530
+++ b/source/Lib/TLibEncoder/TEncSearch.h	Fri Aug 01 17:56:37 2014 -0700
@@ -106,6 +106,7 @@ public:
     MotionReference (*m_mref)[MAX_NUM_REF + 1];
 
     ShortYuv*       m_qtTempShortYuv;
+    TComYuv         m_predTempYuv;
 
     coeff_t*        m_qtTempCoeff[3][NUM_LAYERS];
     uint8_t*        m_qtTempTrIdx;
diff -r e85b0aaa64e4 -r 49bab9bdf2a3 source/common/dct.cpp
--- a/source/common/dct.cpp	Thu Jul 31 11:08:02 2014 +0530
+++ b/source/common/dct.cpp	Fri Aug 01 17:56:37 2014 -0700
@@ -830,6 +830,22 @@ int  count_nonzero_c(const int32_t *quan
 
     return count;
 }
+
+template<int trSize>
+uint32_t conv16to32_count(coeff_t* coeff, int16_t* residual, intptr_t stride)
+{
+    uint32_t numSig = 0;
+    for (int k = 0; k < trSize; k++)
+    {
+        for (int j = 0; j < trSize; j++)
+        {
+            coeff[k * trSize + j] = ((int16_t)residual[k * stride + j]);
+            numSig += (residual[k * stride + j] != 0);
+        }
+    }
+
+    return numSig;
+}
 }  // closing - anonymous file-static namespace
 
 namespace x265 {
@@ -852,5 +868,10 @@ void Setup_C_DCTPrimitives(EncoderPrimit
     p.idct[IDCT_16x16] = idct16_c;
     p.idct[IDCT_32x32] = idct32_c;
     p.count_nonzero = count_nonzero_c;
+
+    p.cvt16to32_cnt[BLOCK_4x4] = conv16to32_count<4>;
+    p.cvt16to32_cnt[BLOCK_8x8] = conv16to32_count<8>;
+    p.cvt16to32_cnt[BLOCK_16x16] = conv16to32_count<16>;
+    p.cvt16to32_cnt[BLOCK_32x32] = conv16to32_count<32>;
 }
 }
diff -r e85b0aaa64e4 -r 49bab9bdf2a3 source/common/primitives.h
--- a/source/common/primitives.h	Thu Jul 31 11:08:02 2014 +0530
+++ b/source/common/primitives.h	Fri Aug 01 17:56:37 2014 -0700
@@ -150,6 +150,7 @@ typedef void (*intra_allangs_t)(pixel *d
 
 typedef void (*cvt16to32_shl_t)(int32_t *dst, int16_t *src, intptr_t, int, int);
 typedef void (*cvt32to16_shr_t)(int16_t *dst, int32_t *src, intptr_t, int, int);
+typedef uint32_t (*cvt16to32_cnt_t)(coeff_t* coeff, int16_t* residual, intptr_t stride);
 
 typedef void (*dct_t)(int16_t *src, int32_t *dst, intptr_t stride);
 typedef void (*idct_t)(int32_t *src, int16_t *dst, intptr_t stride);
@@ -218,6 +219,7 @@ struct EncoderPrimitives
     blockcpy_ps_t   blockcpy_ps;                     // block copy pixel from short
     cvt16to32_shl_t cvt16to32_shl;
     cvt32to16_shr_t cvt32to16_shr;
+    cvt16to32_cnt_t cvt16to32_cnt[NUM_SQUARE_BLOCKS - 1];
 
     copy_pp_t       luma_copy_pp[NUM_LUMA_PARTITIONS];
     copy_sp_t       luma_copy_sp[NUM_LUMA_PARTITIONS];
diff -r e85b0aaa64e4 -r 49bab9bdf2a3 source/common/quant.cpp
--- a/source/common/quant.cpp	Thu Jul 31 11:08:02 2014 +0530
+++ b/source/common/quant.cpp	Fri Aug 01 17:56:37 2014 -0700
@@ -219,8 +219,7 @@ void Quant::setQPforQuant(int qpy, TextT
 uint32_t Quant::signBitHidingHDQ(coeff_t* qCoef, coeff_t* coef, int32_t* deltaU, uint32_t numSig, const TUEntropyCodingParameters &codingParameters)
 {
     const uint32_t log2TrSizeCG = codingParameters.log2TrSizeCG;
-
-    int lastCG = 1;
+    bool lastCG = true;
 
     for (int subSet = (1 << log2TrSizeCG * 2) - 1; subSet >= 0; subSet--)
     {
@@ -253,7 +252,7 @@ uint32_t Quant::signBitHidingHDQ(coeff_t
             {
                 int minCostInc = MAX_INT,  minPos = -1, finalChange = 0, curCost = MAX_INT, curChange = 0;
 
-                for (n = (lastCG == 1 ? lastNZPosInCG : SCAN_SET_SIZE - 1); n >= 0; --n)
+                for (n = (lastCG ? lastNZPosInCG : SCAN_SET_SIZE - 1); n >= 0; --n)
                 {
                     uint32_t blkPos = codingParameters.scan[n + subPos];
                     if (qCoef[blkPos])
@@ -317,7 +316,7 @@ uint32_t Quant::signBitHidingHDQ(coeff_t
             }
         }
 
-        lastCG = 0;
+        lastCG = false;
     }
 
     return numSig;
@@ -365,17 +364,8 @@ uint32_t Quant::transformNxN(TComDataCU*
     int trSize = 1 << log2TrSize;
     if (cu->getCUTransquantBypass(absPartIdx))
     {
-        uint32_t numSig = 0;
-        for (int k = 0; k < trSize; k++)
-        {
-            for (int j = 0; j < trSize; j++)
-            {
-                coeff[k * trSize + j] = ((int16_t)residual[k * stride + j]);
-                numSig += (residual[k * stride + j] != 0);
-            }
-        }
-
-        return numSig;
+        X265_CHECK(log2TrSize >= 2 && log2TrSize <= 5, "Block size mistake!\n");
+        return primitives.cvt16to32_cnt[log2TrSize - 2](coeff, residual, stride);
     }
 
     X265_CHECK((cu->m_slice->m_sps->quadtreeTULog2MaxSize >= log2TrSize), "transform size too large\n");
@@ -502,7 +492,6 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
     uint32_t trSize = 1 << log2TrSize;
     int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize; // Represents scaling through forward transform
     int scalingListType = (cu->isIntra(absPartIdx) ? 0 : 3) + ttype;
-    m_transformShift = transformShift;
 
     X265_CHECK(scalingListType < 6, "scaling list type out of range\n");
 
@@ -521,25 +510,31 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
         return 0;
 
     x265_emms();
-    selectLambda(ttype);
 
+    /* unquant constants for psy-rdoq */
+    int32_t *unquantScale = m_scalingList->m_dequantCoef[log2TrSize - 2][scalingListType][rem];
+    int unquantShift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
+    int unquantRound = 1 << (unquantShift - 1);
+    int scaleBits = SCALE_BITS - 2 * transformShift;
+
+    double lambda2 = m_lambdas[ttype];
     double *errScale = m_scalingList->m_errScale[log2TrSize - 2][scalingListType][rem];
     bool bIsLuma = ttype == TEXT_LUMA;
     bool usePsy = m_psyRdoqScale && bIsLuma;
 
-    double blockUncodedCost = 0;
-    double costCoeff[32 * 32];
-    double costSig[32 * 32];
-    double costCoeff0[32 * 32];
+    double totalUncodedCost = 0;
+    double costCoeff[32 * 32];   /* d*d + lambda * bits */
+    double costUncoded[32 * 32]; /* d*d + lambda * 0    */
+    double costSig[32 * 32];     /* lambda * bits       */
 
-    int rateIncUp[32 * 32];
-    int rateIncDown[32 * 32];
-    int sigRateDelta[32 * 32];
+    int rateIncUp[32 * 32];      /* signal overhead of increasing level */
+    int rateIncDown[32 * 32];    /* signal overhead of decreasing level */
+    int sigRateDelta[32 * 32];   /* signal difference between zero and non-zero */
     int deltaU[32 * 32];
 
-    const uint32_t cgSize = (1 << MLS_CG_SIZE); // 4x4 coef = 16
-    double costCoeffGroupSig[MLS_GRP_NUM];      // 32x32 has 64 4x4 coding groups
+    double   costCoeffGroupSig[MLS_GRP_NUM]; /* lambda * bits of group coding cost */
     uint64_t sigCoeffGroupFlag64 = 0;
+
     uint32_t ctxSet      = 0;
     int    c1            = 1;
     int    c2            = 0;
@@ -549,6 +544,7 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
     uint32_t c1Idx       = 0;
     uint32_t c2Idx       = 0;
     int cgLastScanPos    = -1;
+    const uint32_t cgSize = (1 << MLS_CG_SIZE); /* 4x4 num coef = 16 */
 
     TUEntropyCodingParameters codingParameters;
     cu->getTUEntropyCodingParameters(codingParameters, absPartIdx, log2TrSize, bIsLuma);
@@ -557,6 +553,7 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
     uint32_t scanPos;
     coeffGroupRDStats rdStats;
 
+    /* iterate over coding groups in reverse scan order */
     for (int cgScanPos = cgNum - 1; cgScanPos >= 0; cgScanPos--)
     {
         const uint32_t cgBlkPos = codingParameters.scanCG[cgScanPos];
@@ -567,24 +564,24 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
 
         const int patternSigCtx = calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codingParameters.log2TrSizeCG);
 
+        /* iterate over coefficients in each group in reverse scan order */
         for (int scanPosinCG = cgSize - 1; scanPosinCG >= 0; scanPosinCG--)
         {
             scanPos              = (cgScanPos << MLS_CG_SIZE) + scanPosinCG;
             uint32_t blkPos      = codingParameters.scan[scanPos];
-            double scaleFactor   = errScale[blkPos];
-            int levelDouble      = scaledCoeff[blkPos];    /* abs(coef) * quantCoef */
+            double scaleFactor   = errScale[blkPos];       /* (1 << scaleBits) / (quantCoef * quantCoef) */
+            int levelScaled      = scaledCoeff[blkPos];    /* abs(coef) * quantCoef */
             uint32_t maxAbsLevel = abs(dstCoeff[blkPos]);  /* abs(coef) */
 
-            /* initial cost of each coefficient. This works out to be:
-             *   abs(coef) * quantCoef * abs(coef) * quantCoef * (scalingBits / (quantCoef * quantCoef))
-             *   which reduces to abs(coef) * abs(coef) * scalingBits, which should be reduced
-             *   even further to abs(coef) * abs(coef) << scalingBits in the future */
-            costCoeff0[scanPos] = ((uint64_t)levelDouble * levelDouble) * scaleFactor;
+            /* RDOQ measures distortion as the scaled level squared times a
+             * scale factor which tries to remove the quantCoef back out, but
+             * adds scaleBits to account for IEP_RATE which is 32k (1 << SCALE_BITS) */
 
-            /* running total of initial coeff L2 cost without accounting for lambda */
-            blockUncodedCost   += costCoeff0[scanPos];
+            /* cost of not coding this coefficient (no signal bits) */
+            costUncoded[scanPos] = ((uint64_t)levelScaled * levelScaled) * scaleFactor;
+            totalUncodedCost += costUncoded[scanPos];
 
-            if (maxAbsLevel > 0 && lastScanPos < 0)
+            if (maxAbsLevel && lastScanPos < 0)
             {
                 /* remember the first non-zero coef found in this reverse scan as the last pos */
                 lastScanPos   = scanPos;
@@ -592,7 +589,15 @@ uint32_t Quant::rdoQuant(TComDataCU* cu,
                 cgLastScanPos = cgScanPos;
             }
 
-            if (lastScanPos >= 0)
+            if (lastScanPos < 0)
+            {
+                /* No non-zero coefficient yet found, but this does not mean
+                 * there is no uncoded-cost for this coefficient. Pre-
+                 * quantization the coefficient may have been non-zero */
+                costCoeff[scanPos] = 0;
+                baseCost += costUncoded[scanPos];
+            }
+            else
             {
                 const uint32_t c1c2Idx = ((c1Idx - 8) >> (sizeof(int) * CHAR_BIT - 1)) + (((-(int)c2Idx) >> (sizeof(int) * CHAR_BIT - 1)) + 1) * 2;
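
For readers who want to try the new cvt16to32_cnt primitive outside the encoder, here is a minimal standalone sketch of the C reference from the dct.cpp hunk above, plus a lookup table indexed by log2TrSize - 2 as in transformNxN(). The `coeff_t` typedef (assumed 32-bit, as the primitive name suggests), the free-standing `cvt16to32_cnt` array (a stand-in for the EncoderPrimitives member) and the `demoCount4x4` helper are assumptions made for illustration, not part of the changeset.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t coeff_t; // assumption: coefficients are 32-bit

// Same loop as the C reference: widen a strided 16-bit residual block into
// a packed trSize x trSize coefficient block and count non-zero entries.
template<int trSize>
uint32_t conv16to32_count(coeff_t* coeff, int16_t* residual, intptr_t stride)
{
    uint32_t numSig = 0;
    for (int k = 0; k < trSize; k++)
    {
        for (int j = 0; j < trSize; j++)
        {
            coeff[k * trSize + j] = residual[k * stride + j];
            numSig += (residual[k * stride + j] != 0);
        }
    }
    return numSig;
}

typedef uint32_t (*cvt16to32_cnt_t)(coeff_t* coeff, int16_t* residual, intptr_t stride);

// Hypothetical stand-in for the primitives table: index is log2TrSize - 2
static const cvt16to32_cnt_t cvt16to32_cnt[4] =
{
    conv16to32_count<4>, conv16to32_count<8>,
    conv16to32_count<16>, conv16to32_count<32>
};

// Hypothetical demo: a 4x4 block stored with stride 8, three non-zero values
static uint32_t demoCount4x4(coeff_t out[16])
{
    int16_t residual[4 * 8] = { 0 };
    residual[0] = 5;
    residual[1 * 8 + 2] = -7;
    residual[3 * 8 + 3] = 1;
    residual[4] = 9; // column 4 lies outside the 4x4 block and is skipped

    const int log2TrSize = 2;
    return cvt16to32_cnt[log2TrSize - 2](out, residual, 8);
}
```

The table lookup mirrors the bounds that the X265_CHECK in transformNxN() asserts (log2TrSize between 2 and 5), so an out-of-range size would index past the four entries.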