[x265-commits] [x265] analysis: RDO based BIDIR decisions

Tue Nov 11 20:31:26 CET 2014

details:   http://hg.videolan.org/x265/rev/4c6c28cc93d9
branches:  
changeset: 8817:4c6c28cc93d9
user:      Steve Borho <steve at borho.org>
date:      Sat Nov 08 16:10:53 2014 -0600
description:
analysis: RDO based BIDIR decisions

At RD 0, 1, and 2, this changes 2Nx2N bidir from a SATD decision to an SA8D
decision.

At RD 3 and 4, if the bidir SA8D cost is no more than 17/16 of the best inter
cost, it makes an RDO decision between bestInter and Bidir. This allows psy-rd
to influence the decision, which is the whole point.

At RD 5 and 6, 2Nx2N BIDIR is yet another RD choice at the same level as 2Nx2N
inter, rect, and AMP. (psy) RDO picks the best mode for each block.
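
The RD 3/4 gate described above can be sketched in isolation as follows. This
is only an illustration of the integer comparison visible in the diff
(sa8dCost * 16 <= bestInter sa8dCost * 17); the helper name bidirWorthRdo and
the constant COST_UNAVAILABLE are hypothetical and do not exist in x265, where
the sentinel is MAX_INT64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the "no valid bidir candidate" sentinel.
const uint64_t COST_UNAVAILABLE = UINT64_MAX;

// Returns true when the bidir SA8D cost is no more than 17/16 of the best
// inter SA8D cost, i.e. close enough that a full RDO comparison is worthwhile.
inline bool bidirWorthRdo(uint64_t bidirSa8d, uint64_t bestInterSa8d)
{
    if (bidirSa8d == COST_UNAVAILABLE)
        return false;

    // Assumption: costs are small enough that the scaled values cannot wrap.
    assert(bidirSa8d < UINT64_MAX / 16 && bestInterSa8d < UINT64_MAX / 17);

    // Cross-multiplying avoids a division and any rounding error.
    return bidirSa8d * 16 <= bestInterSa8d * 17;
}
```

Cross-multiplying keeps the test in integer arithmetic, so a bidir cost of
exactly 17/16 of the inter cost still qualifies.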
Subject: [x265] rdcost: experimental slice-type based psy-rd scale factor

details:   http://hg.videolan.org/x265/rev/4f3fd7ab8868
branches:  
changeset: 8818:4f3fd7ab8868
user:      Steve Borho <steve at borho.org>
date:      Sun Nov 09 19:34:01 2014 -0600
description:
rdcost: experimental slice-type based psy-rd scale factor
Subject: [x265] param: remove --b-intra from --tune grain, document rdoq restriction

details:   http://hg.videolan.org/x265/rev/64ccc616be33
branches:  
changeset: 8819:64ccc616be33
user:      Steve Borho <steve at borho.org>
date:      Mon Nov 10 12:27:56 2014 -0600
description:
param: remove --b-intra from --tune grain, document rdoq restriction
Subject: [x265] param: raise --nr limit to 2000

details:   http://hg.videolan.org/x265/rev/27f293dd9eee
branches:  
changeset: 8820:27f293dd9eee
user:      Steve Borho <steve at borho.org>
date:      Mon Nov 10 12:57:00 2014 -0600
description:
param: raise --nr limit to 2000
Subject: [x265] noiseReduction: apply only for I and P, move NoiseReduction to quant.h

details:   http://hg.videolan.org/x265/rev/ed89e58b44e8
branches:  
changeset: 8821:ed89e58b44e8
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Nov 06 14:06:20 2014 +0530
description:
noiseReduction: apply only for I and P, move NoiseReduction to quant.h

This doubles the number of quant nr categories; intra blocks now use the lower
half.
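
As a rough sketch of the doubled category layout, the index computation from
the quant.cpp hunk (sizeIdx + 4 * !isLuma + 8 * !isIntra) can be written as a
standalone function. The function name nrCategory is hypothetical; only the
arithmetic is taken from the patch.

```cpp
#include <cassert>

// Maps a transform block to one of 16 noise-reduction categories:
//   sizeIdx 0..3 selects 4x4, 8x8, 16x16, 32x32
//   +4 for chroma, +8 for inter
// so intra blocks occupy 0..7 and inter blocks occupy 8..15.
inline int nrCategory(int sizeIdx, bool isLuma, bool isIntra)
{
    assert(sizeIdx >= 0 && sizeIdx < 4);
    return sizeIdx + 4 * !isLuma + 8 * !isIntra;
}
```

For example, an intra luma 4x4 block maps to category 0 and an inter chroma
32x32 block maps to category 15.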
Subject: [x265] quant: allow --nr in all slice types evenly

details:   http://hg.videolan.org/x265/rev/38fa64a5c51c
branches:  
changeset: 8822:38fa64a5c51c
user:      Steve Borho <steve at borho.org>
date:      Mon Nov 10 14:07:51 2014 -0600
description:
quant: allow --nr in all slice types evenly
Subject: [x265] analysis: Dump best MV statistics and re-use this for analysis load mode

details:   http://hg.videolan.org/x265/rev/c8004323493e
branches:  
changeset: 8823:c8004323493e
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Tue Nov 11 11:27:02 2014 +0530
description:
analysis: Dump best MV statistics and re-use this for analysis load mode

This patch fixes a bug in inter slices in analysis=load|save mode. Inter data
for all partitions is now saved correctly.
Subject: [x265] x265: remove redundant variables from intra and inter analysis structure

details:   http://hg.videolan.org/x265/rev/ad5177c86756
branches:  
changeset: 8824:ad5177c86756
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Tue Nov 11 11:51:24 2014 +0530
description:
x265: remove redundant variables from intra and inter analysis structure
Subject: [x265] param: add default value to analysis mode

details:   http://hg.videolan.org/x265/rev/5c397e744cfd
branches:  
changeset: 8825:5c397e744cfd
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Nov 11 14:10:22 2014 +0530
description:
param: add default value to analysis mode
Subject: [x265] x265: create and initialise recon object if analysis mode is enabled

details:   http://hg.videolan.org/x265/rev/47b290236ca3
branches:  
changeset: 8826:47b290236ca3
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Nov 11 14:10:48 2014 +0530
description:
x265: create and initialise recon object if analysis mode is enabled
Subject: [x265] api: replace analysis data with pre defined constant

details:   http://hg.videolan.org/x265/rev/b4effa4dd53b
branches:  
changeset: 8827:b4effa4dd53b
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Nov 11 14:11:02 2014 +0530
description:
api: replace analysis data with pre defined constant
Subject: [x265] api: cleanup

details:   http://hg.videolan.org/x265/rev/3c01e8881946
branches:  
changeset: 8828:3c01e8881946
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Nov 11 14:13:27 2014 +0530
description:
api: cleanup
Subject: [x265] x265: more meaningful error messages in analysis

details:   http://hg.videolan.org/x265/rev/838e41fb256b
branches:  
changeset: 8829:838e41fb256b
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Tue Nov 11 14:40:47 2014 +0530
description:
x265: more meaningful error messages in analysis
Subject: [x265] Merge

details:   http://hg.videolan.org/x265/rev/fa2fedd97ff2
branches:  
changeset: 8830:fa2fedd97ff2
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 11 12:34:29 2014 -0600
description:
Merge
Subject: [x265] analysis: fix bidir non-determinism in --pmode --rd 5

details:   http://hg.videolan.org/x265/rev/306ef9782a30
branches:  
changeset: 8831:306ef9782a30
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 11 13:29:36 2014 -0600
description:
analysis: fix bidir non-determinism in --pmode --rd 5

diffstat:

 doc/reST/cli.rst            |    2 +-
 doc/reST/presets.rst        |    6 +-
 source/common/common.h      |   13 --
 source/common/param.cpp     |    4 +-
 source/common/quant.cpp     |    4 +-
 source/common/quant.h       |   14 ++
 source/encoder/analysis.cpp |  245 ++++++++++++++++++++++++++++++++++++-------
 source/encoder/analysis.h   |    3 +
 source/encoder/api.cpp      |   12 +-
 source/encoder/rdcost.h     |    7 +-
 source/encoder/search.cpp   |  150 +++++++++++++-------------
 source/encoder/search.h     |    4 +-
 source/x265.cpp             |   33 ++---
 source/x265.h               |    6 +-
 14 files changed, 334 insertions(+), 169 deletions(-)

diffs (truncated from 1047 to 300 lines):

diff -r 32513a4c3bd4 -r 306ef9782a30 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Mon Nov 10 12:39:54 2014 +0900
+++ b/doc/reST/cli.rst	Tue Nov 11 13:29:36 2014 -0600
@@ -255,7 +255,7 @@ Input Options
 	numbers of frame threads. Outputs will be deterministic but the
 	outputs of -F2 will no longer match the outputs of -F3, etc.
 
-	**Values:** any value in range of 100 to 1000. Default disabled.
+	**Values:** any value in range of 100 to 2000. Default disabled.
 
 .. option:: --input-res <wxh>
 
diff -r 32513a4c3bd4 -r 306ef9782a30 doc/reST/presets.rst
--- a/doc/reST/presets.rst	Mon Nov 10 12:39:54 2014 +0900
+++ b/doc/reST/presets.rst	Tue Nov 11 13:29:36 2014 -0600
@@ -114,7 +114,11 @@ select modes which preserve high frequen
 
     * :option:`--psy-rd` 0.5
     * :option:`--psy-rdoq` 30
-    * :option:`--b-intra`
+
+.. Note::
+
+    --psy-rdoq is only effective when RDOQuant is enabled, which is at
+    RD levels 4, 5, and 6 (presets slow and below).
 
 It lowers the strength of adaptive quantization, so residual energy can
 be more evenly distributed across the (noisy) picture:
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/common.h
--- a/source/common/common.h	Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/common.h	Tue Nov 11 13:29:36 2014 -0600
@@ -245,9 +245,6 @@ typedef int16_t  coeff_t;      // transf
 #define MAX_TR_SIZE (1 << MAX_LOG2_TR_SIZE)
 #define MAX_TS_SIZE (1 << MAX_LOG2_TS_SIZE)
 
-#define MAX_NUM_TR_COEFFS        MAX_TR_SIZE * MAX_TR_SIZE /* Maximum number of transform coefficients, for a 32x32 transform */
-#define MAX_NUM_TR_CATEGORIES    8                         /* 32, 16, 8, 4 transform categories each for luma and chroma */
-
 #define COEF_REMAIN_BIN_REDUCTION   3 // indicates the level at which the VLC
                                       // transitions from Golomb-Rice to TU+EG(k)
 
@@ -302,16 +299,6 @@ namespace x265 {
 
 enum { SAO_NUM_OFFSET = 4 };
 
-// NOTE: MUST be alignment to 16 or 32 bytes for asm code
-struct NoiseReduction
-{
-    /* 0 = luma 4x4, 1 = luma 8x8, 2 = luma 16x16, 3 = luma 32x32
-     * 4 = chroma 4x4, 5 = chroma 8x8, 6 = chroma 16x16, 7 = chroma 32x32 */
-    uint16_t offsetDenoise[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
-    uint32_t residualSum[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
-    uint32_t count[MAX_NUM_TR_CATEGORIES];
-};
-
 enum SaoMergeMode
 {
     SAO_MERGE_NONE,
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/param.cpp
--- a/source/common/param.cpp	Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/param.cpp	Tue Nov 11 13:29:36 2014 -0600
@@ -176,6 +176,7 @@ void x265_param_default(x265_param *para
     param->rdPenalty = 0;
     param->psyRd = 0.0;
     param->psyRdoq = 0.0;
+    param->analysisMode = 0;
     param->bIntraInBFrames = 0;
     param->bLossless = 0;
     param->bCULossless = 0;
@@ -412,7 +413,6 @@ int x265_param_default_preset(x265_param
             param->deblockingFilterTCOffset = -2;
             param->psyRdoq = 30;
             param->psyRd = 0.5;
-            param->bIntraInBFrames = true;
             param->rc.ipFactor = 1.1;
             param->rc.pbFactor = 1.1;
             param->rc.aqMode = X265_AQ_VARIANCE;
@@ -1071,7 +1071,7 @@ int x265_check_params(x265_param *param)
     CHECK(param->rc.qCompress < 0.5 || param->rc.qCompress > 1.0,
           "qCompress must be between 0.5 and 1.0");
     if (param->noiseReduction)
-        CHECK(100 > param->noiseReduction || param->noiseReduction > 1000, "Valid noise reduction range 100 - 1000");
+        CHECK(100 > param->noiseReduction || param->noiseReduction > 2000, "Valid noise reduction range 100 - 1000");
     CHECK(param->rc.rateControlMode == X265_RC_CRF && param->rc.bStatRead,
           "Constant rate-factor is incompatible with 2pass");
     CHECK(param->rc.rateControlMode == X265_RC_CQP && param->rc.bStatRead,
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/quant.cpp
--- a/source/common/quant.cpp	Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/quant.cpp	Tue Nov 11 13:29:36 2014 -0600
@@ -370,10 +370,10 @@ uint32_t Quant::transformNxN(CUData& cu,
             primitives.dct[index](m_fencShortBuf, m_fencDctCoeff, trSize);
         }
 
-        if (m_nr && !isIntra)
+        if (m_nr)
         {
             /* denoise is not applied to intra residual, so DST can be ignored */
-            int cat = sizeIdx + 4 * !isLuma;
+            int cat = sizeIdx + 4 * !isLuma + 8 * !isIntra;
             int numCoeff = 1 << (log2TrSize * 2);
             primitives.denoiseDct(m_resiDctCoeff, m_nr->residualSum[cat], m_nr->offsetDenoise[cat], numCoeff);
             m_nr->count[cat]++;
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/quant.h
--- a/source/common/quant.h	Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/quant.h	Tue Nov 11 13:29:36 2014 -0600
@@ -58,6 +58,20 @@ struct QpParam
     }
 };
 
+#define MAX_NUM_TR_COEFFS        MAX_TR_SIZE * MAX_TR_SIZE /* Maximum number of transform coefficients, for a 32x32 transform */
+#define MAX_NUM_TR_CATEGORIES    16                        /* 32, 16, 8, 4 transform categories each for luma and chroma */
+
+// NOTE: MUST be 16-byte aligned for asm code
+struct NoiseReduction
+{
+    /* 0 = luma 4x4,   1 = luma 8x8,   2 = luma 16x16,   3 = luma 32x32
+     * 4 = chroma 4x4, 5 = chroma 8x8, 6 = chroma 16x16, 7 = chroma 32x32
+     * Intra 0..7 - Inter 8..15 */
+    uint16_t offsetDenoise[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
+    uint32_t residualSum[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
+    uint32_t count[MAX_NUM_TR_CATEGORIES];
+};
+
 class Quant
 {
 protected:
diff -r 32513a4c3bd4 -r 306ef9782a30 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp	Mon Nov 10 12:39:54 2014 +0900
+++ b/source/encoder/analysis.cpp	Tue Nov 11 13:29:36 2014 -0600
@@ -142,8 +142,6 @@ Mode& Analysis::compressCTU(CUData& ctu,
                 memcpy(&m_frame->m_intraData->depth[ctu.m_cuAddr * numPartition], bestCU->m_cuDepth, sizeof(uint8_t) * numPartition);
                 memcpy(&m_frame->m_intraData->modes[ctu.m_cuAddr * numPartition], bestCU->m_lumaIntraDir, sizeof(uint8_t) * numPartition);
                 memcpy(&m_frame->m_intraData->partSizes[ctu.m_cuAddr * numPartition], bestCU->m_partSize, sizeof(uint8_t) * numPartition);
-                m_frame->m_intraData->cuAddr[ctu.m_cuAddr] = ctu.m_cuAddr;
-                m_frame->m_intraData->poc[ctu.m_cuAddr] = m_frame->m_poc;
             }
         }
     }
@@ -399,6 +397,8 @@ void Analysis::parallelModeAnalysis(int 
 
         case 1:
             slave->checkInter_rd0_4(md.pred[PRED_2Nx2N], *m_curGeom, SIZE_2Nx2N);
+            if (m_slice->m_sliceType == B_SLICE)
+                slave->checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], *m_curGeom);
             break;
 
         case 2:
@@ -449,6 +449,13 @@ void Analysis::parallelModeAnalysis(int 
 
         case 1:
             slave->checkInter_rd5_6(md.pred[PRED_2Nx2N], *m_curGeom, SIZE_2Nx2N, false);
+            md.pred[PRED_BIDIR].rdCost = MAX_INT64;
+            if (m_slice->m_sliceType == B_SLICE)
+            {
+                slave->checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], *m_curGeom);
+                if (md.pred[PRED_BIDIR].sa8dCost < MAX_INT64)
+                    slave->encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], *m_curGeom);
+            }
             break;
 
         case 2:
@@ -504,6 +511,7 @@ void Analysis::compressInterCU_dist(cons
 
         /* Initialize all prediction CUs based on parentCTU */
         md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom);
+        md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom);
         md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom);
         md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom);
         if (m_param->bEnableRectInter)
@@ -595,16 +603,22 @@ void Analysis::compressInterCU_dist(cons
 
             if (m_param->rdLevel > 2)
             {
-                /* encode best inter */
+                /* RD selection between merge, inter, bidir and intra */
                 for (uint32_t puIdx = 0; puIdx < bestInter->cu.getNumPartInter(); puIdx++)
                 {
                     prepMotionCompensation(bestInter->cu, cuGeom, puIdx);
                     motionCompensation(bestInter->predYuv, false, true);
                 }
                 encodeResAndCalcRdInterCU(*bestInter, cuGeom);
+                checkBestMode(*bestInter, depth);
 
-                /* RD selection between merge, inter and intra */
-                checkBestMode(*bestInter, depth);
+                /* If BIDIR is available and within 17/16 of best inter option, choose by RDO */
+                if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost != MAX_INT64 &&
+                    md.pred[PRED_BIDIR].sa8dCost * 16 <= bestInter->sa8dCost * 17)
+                {
+                    encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
+                    checkBestMode(md.pred[PRED_BIDIR], depth);
+                }
 
 #if MATCH_NON_PMODE
                 if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) || md.bestMode->sa8dCost == MAX_INT64)
@@ -618,6 +632,9 @@ void Analysis::compressInterCU_dist(cons
                 if (!md.bestMode || bestInter->sa8dCost < md.bestMode->sa8dCost)
                     md.bestMode = bestInter;
 
+                if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost < md.bestMode->sa8dCost)
+                    md.bestMode = &md.pred[PRED_BIDIR];
+
                 if (bTryIntra && md.pred[PRED_INTRA].sa8dCost < md.bestMode->sa8dCost)
                 {
                     md.bestMode = &md.pred[PRED_INTRA];
@@ -641,6 +658,7 @@ void Analysis::compressInterCU_dist(cons
             m_modeCompletionEvent.wait();
 
             checkBestMode(md.pred[PRED_2Nx2N], depth);
+            checkBestMode(md.pred[PRED_BIDIR], depth);
 
             if (m_param->bEnableRectInter)
             {
@@ -790,8 +808,14 @@ void Analysis::compressInterCU_rd0_4(con
         {
             md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom);
             checkInter_rd0_4(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N);
+
+            if (m_slice->m_sliceType == B_SLICE)
+            {
+                md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom);
+                checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], cuGeom);
+            }
+
             Mode *bestInter = &md.pred[PRED_2Nx2N];
-
             if (m_param->bEnableRectInter)
             {
                 md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom);
@@ -853,11 +877,16 @@ void Analysis::compressInterCU_rd0_4(con
                     prepMotionCompensation(bestInter->cu, cuGeom, puIdx);
                     motionCompensation(bestInter->predYuv, false, true);
                 }
+                encodeResAndCalcRdInterCU(*bestInter, cuGeom);
+                checkBestMode(*bestInter, depth);
 
-                encodeResAndCalcRdInterCU(*bestInter, cuGeom);
-
-                if (!md.bestMode || bestInter->rdCost < md.bestMode->rdCost)
-                    md.bestMode = bestInter;
+                /* If BIDIR is available and within 17/16 of best inter option, choose by RDO */
+                if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost != MAX_INT64 &&
+                    md.pred[PRED_BIDIR].sa8dCost * 16 <= bestInter->sa8dCost * 17)
+                {
+                    encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
+                    checkBestMode(md.pred[PRED_BIDIR], depth);
+                }
 
                 if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) ||
                     md.bestMode->sa8dCost == MAX_INT64)
@@ -865,16 +894,19 @@ void Analysis::compressInterCU_rd0_4(con
                     md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom);
                     checkIntraInInter(md.pred[PRED_INTRA], cuGeom);
                     encodeIntraInInter(md.pred[PRED_INTRA], cuGeom);
-                    if (md.pred[PRED_INTRA].rdCost < md.bestMode->rdCost)
-                        md.bestMode = &md.pred[PRED_INTRA];
+                    checkBestMode(md.pred[PRED_INTRA], depth);
                 }
             }
             else
             {
-                /* SA8D choice between merge/skip, inter, and intra */
+                /* SA8D choice between merge/skip, inter, bidir, and intra */
                 if (!md.bestMode || bestInter->sa8dCost < md.bestMode->sa8dCost)
                     md.bestMode = bestInter;
 
+                if (m_slice->m_sliceType == B_SLICE &&
+                    md.pred[PRED_BIDIR].sa8dCost < md.bestMode->sa8dCost)
+                    md.bestMode = &md.pred[PRED_BIDIR];
+
                 if (bTryIntra || md.bestMode->sa8dCost == MAX_INT64)
                 {
                     md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom);
@@ -1052,9 +1084,19 @@ void Analysis::compressInterCU_rd5_6(con
             checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, false);
             checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
 
+            if (m_slice->m_sliceType == B_SLICE)
+            {
+                md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom);
+                checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], cuGeom);
+                if (md.pred[PRED_BIDIR].sa8dCost < MAX_INT64)
+                {
+                    encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
+                    checkBestMode(md.pred[PRED_BIDIR], cuGeom.depth);
+                }
+            }
+
             if (m_param->bEnableRectInter)
             {
-                // Nx2N rect
                 if (!m_param->bEnableCbfFastMode || md.bestMode->cu.getQtRootCbf(0))
                 {
                     md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom);
@@ -1407,12 +1449,17 @@ void Analysis::checkInter_rd0_4(Mode& in
 
     if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_interAnalysisData)
     {
-        for (int32_t i = 0; i < numPredDir; i++)
+        for (uint32_t part = 0; part < interMode.cu.getNumPartInter(); part++)