[x265-commits] [x265] analysis: RDO based BIDIR decisions
Steve Borho
steve at borho.org
Tue Nov 11 20:31:26 CET 2014
details: http://hg.videolan.org/x265/rev/4c6c28cc93d9
branches:
changeset: 8817:4c6c28cc93d9
user: Steve Borho <steve at borho.org>
date: Sat Nov 08 16:10:53 2014 -0600
description:
analysis: RDO based BIDIR decisions
At RD 0, 1, and 2, this changes 2Nx2N bidir from a SATD decision to an SA8D
decision.
At RD 3 and 4, if the bidir SA8D cost is within 17/16 of the best inter cost,
then an RDO decision is made between bestInter and bidir, which allows psy-rd
to influence the choice (the whole point of this change).
At RD 5 and 6, 2Nx2N bidir is simply another RD candidate alongside 2Nx2N
inter, rect, and AMP; (psy-)RDO picks the best mode for each block.
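The three RD-level tiers described above, and the 17/16 gate that appears in compressInterCU_dist() and compressInterCU_rd0_4() in the diff below, can be sketched roughly as follows. This is a simplified illustration, not x265 code: BidirPolicy, bidirPolicy(), bidirWorthRdo() and kNoCost are hypothetical names, and kNoCost stands in for the MAX_INT64 "not evaluated" sentinel. The diff writes the test as `bidir * 16 <= bestInter * 17`; the sketch uses an equivalent integer form that cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for x265's MAX_INT64 "mode not evaluated" sentinel.
const uint64_t kNoCost = 0x7FFFFFFFFFFFFFFFULL;

// How a 2Nx2N bidir candidate is weighed at each RD level (illustrative only).
enum class BidirPolicy
{
    NotEvaluated, // not a B slice: there is no bidir candidate
    Sa8dOnly,     // RD 0-2: competes on SA8D cost with merge/skip, inter, intra
    RdoIfClose,   // RD 3-4: full RDO only if within 17/16 of best inter SA8D
    FullRdo       // RD 5-6: always encoded and compared by (psy-)RDO
};

BidirPolicy bidirPolicy(int rdLevel, bool isBSlice)
{
    if (!isBSlice)
        return BidirPolicy::NotEvaluated;
    if (rdLevel <= 2)
        return BidirPolicy::Sa8dOnly;
    if (rdLevel <= 4)
        return BidirPolicy::RdoIfClose;
    return BidirPolicy::FullRdo;
}

// For non-negative integers, "bidir * 16 <= best * 17" is equivalent to
// "bidir <= best + best / 16" (floor of 17*best/16); the latter cannot
// overflow even when best equals the sentinel value.
bool bidirWorthRdo(uint64_t bidirSa8d, uint64_t bestInterSa8d)
{
    if (bidirSa8d == kNoCost)
        return false; // bidir was never evaluated for this CU
    return bidirSa8d <= bestInterSa8d + bestInterSa8d / 16;
}
```

The RD 3-4 gate presumably exists so that the comparatively expensive encodeResAndCalcRdInterCU() call is only paid for bidir candidates that are plausibly competitive.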
Subject: [x265] rdcost: experimental slice-type based psy-rd scale factor
details: http://hg.videolan.org/x265/rev/4f3fd7ab8868
branches:
changeset: 8818:4f3fd7ab8868
user: Steve Borho <steve at borho.org>
date: Sun Nov 09 19:34:01 2014 -0600
description:
rdcost: experimental slice-type based psy-rd scale factor
Subject: [x265] param: remove --b-intra from --tune grain, document rdoq restriction
details: http://hg.videolan.org/x265/rev/64ccc616be33
branches:
changeset: 8819:64ccc616be33
user: Steve Borho <steve at borho.org>
date: Mon Nov 10 12:27:56 2014 -0600
description:
param: remove --b-intra from --tune grain, document rdoq restriction
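The presets.rst change in the diff below documents that --psy-rdoq only takes effect when RDOQuant is enabled, i.e. at RD levels 4, 5 and 6, and that --tune grain no longer turns on --b-intra. A minimal sketch of both rules follows. GrainTune, applyGrainTune() and psyRdoqEffective() are hypothetical names; the real settings are applied inside x265_param_default_preset(), and only the psy-rd/psy-rdoq values listed in presets.rst are reproduced here.

```cpp
#include <cassert>

// Hypothetical subset of the encoder settings touched by --tune grain.
struct GrainTune
{
    double psyRd = 0.0;
    double psyRdoq = 0.0;
    bool bIntraInBFrames = false; // default; --tune grain no longer enables it
};

void applyGrainTune(GrainTune& p)
{
    p.psyRd = 0.5;
    p.psyRdoq = 30;
    // bIntraInBFrames is intentionally left at its default after this change
}

// RDOQuant is only enabled at RD levels 4, 5 and 6, so psy-rdoq has no
// effect at lower RD levels even when it is set.
bool psyRdoqEffective(int rdLevel, double psyRdoq)
{
    return psyRdoq > 0.0 && rdLevel >= 4;
}
```

Per the documentation note, presets faster than slow therefore get psy-rd from --tune grain but not psy-rdoq.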
Subject: [x265] param: raise --nr limit to 2000
details: http://hg.videolan.org/x265/rev/27f293dd9eee
branches:
changeset: 8820:27f293dd9eee
user: Steve Borho <steve at borho.org>
date: Mon Nov 10 12:57:00 2014 -0600
description:
param: raise --nr limit to 2000
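The new bound is enforced in x265_check_params() (param.cpp hunk below): zero means noise reduction is disabled (the default), and any nonzero value must lie in 100..2000. A standalone sketch of that check follows; validateNoiseReduction() is a hypothetical helper, not part of x265. Note that the error string in the hunk below still reads "100 - 1000"; the sketch keeps its message in line with the 2000 limit.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone version of the --nr range check.
// Returns an empty string when the value is acceptable.
std::string validateNoiseReduction(int noiseReduction)
{
    if (noiseReduction == 0)
        return ""; // 0 = noise reduction disabled
    if (noiseReduction < 100 || noiseReduction > 2000)
        return "Valid noise reduction range 100 - 2000";
    return "";
}
```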
Subject: [x265] noiseReduction: apply only for I and P, move NoiseReduction to quant.h
details: http://hg.videolan.org/x265/rev/ed89e58b44e8
branches:
changeset: 8821:ed89e58b44e8
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Thu Nov 06 14:06:20 2014 +0530
description:
noiseReduction: apply only for I and P, move NoiseReduction to quant.h
This doubles the number of quant nr categories; intra blocks now use the lower
half.
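With intra and inter residuals tracked separately, the category index computed in Quant::transformNxN() (quant.cpp hunk below) becomes sizeIdx + 4 * !isLuma + 8 * !isIntra, which yields the 16-entry layout documented in quant.h. The sketch below reproduces that mapping and the resulting table sizes. nrCategory() and NoiseReductionSketch are hypothetical names; MAX_NUM_TR_COEFFS is taken as 32 * 32 = 1024, and the 16-byte alignment required for the asm code is omitted.

```cpp
#include <cassert>
#include <cstdint>

const int kMaxTrCoeffs = 32 * 32; // MAX_NUM_TR_COEFFS for a 32x32 transform
const int kNumCategories = 16;    // MAX_NUM_TR_CATEGORIES after this change

// sizeIdx: 0 = 4x4, 1 = 8x8, 2 = 16x16, 3 = 32x32
// Intra luma 0..3, intra chroma 4..7, inter luma 8..11, inter chroma 12..15.
int nrCategory(int sizeIdx, bool isLuma, bool isIntra)
{
    return sizeIdx + 4 * !isLuma + 8 * !isIntra;
}

// Same member layout as NoiseReduction in quant.h (alignment omitted).
struct NoiseReductionSketch
{
    uint16_t offsetDenoise[kNumCategories][kMaxTrCoeffs]; // 16 * 1024 * 2 = 32768 bytes
    uint32_t residualSum[kNumCategories][kMaxTrCoeffs];   // 16 * 1024 * 4 = 65536 bytes
    uint32_t count[kNumCategories];                       // 16 * 4       =    64 bytes
};
```

Assuming no padding, doubling the category count grows each NoiseReduction instance from 49184 bytes (8 categories) to 98368 bytes.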
Subject: [x265] quant: allow --nr in all slice types evenly
details: http://hg.videolan.org/x265/rev/38fa64a5c51c
branches:
changeset: 8822:38fa64a5c51c
user: Steve Borho <steve at borho.org>
date: Mon Nov 10 14:07:51 2014 -0600
description:
quant: allow --nr in all slice types evenly
Subject: [x265] analysis: Dump best MV statistics and re-use this for analysis load mode
details: http://hg.videolan.org/x265/rev/c8004323493e
branches:
changeset: 8823:c8004323493e
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Tue Nov 11 11:27:02 2014 +0530
description:
analysis: Dump best MV statistics and re-use this for analysis load mode
This patch fixes a bug in inter slices in analysis=load|save mode. Inter data
for all partitions is now saved correctly.
Subject: [x265] x265: remove redundant variables from intra and inter analysis structure
details: http://hg.videolan.org/x265/rev/ad5177c86756
branches:
changeset: 8824:ad5177c86756
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Tue Nov 11 11:51:24 2014 +0530
description:
x265: remove redundant variables from intra and inter analysis structure
Subject: [x265] param: add default value to analysis mode
details: http://hg.videolan.org/x265/rev/5c397e744cfd
branches:
changeset: 8825:5c397e744cfd
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Nov 11 14:10:22 2014 +0530
description:
param: add default value to analysis mode
Subject: [x265] x265: create and initialise recon object if analysis mode is enabled
details: http://hg.videolan.org/x265/rev/47b290236ca3
branches:
changeset: 8826:47b290236ca3
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Nov 11 14:10:48 2014 +0530
description:
x265: create and initialise recon object if analysis mode is enabled
Subject: [x265] api: replace analysis data with predefined constant
details: http://hg.videolan.org/x265/rev/b4effa4dd53b
branches:
changeset: 8827:b4effa4dd53b
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Nov 11 14:11:02 2014 +0530
description:
api: replace analysis data with predefined constant
Subject: [x265] api: cleanup
details: http://hg.videolan.org/x265/rev/3c01e8881946
branches:
changeset: 8828:3c01e8881946
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Nov 11 14:13:27 2014 +0530
description:
api: cleanup
Subject: [x265] x265: more meaningful error messages in analysis
details: http://hg.videolan.org/x265/rev/838e41fb256b
branches:
changeset: 8829:838e41fb256b
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Nov 11 14:40:47 2014 +0530
description:
x265: more meaningful error messages in analysis
Subject: [x265] Merge
details: http://hg.videolan.org/x265/rev/fa2fedd97ff2
branches:
changeset: 8830:fa2fedd97ff2
user: Steve Borho <steve at borho.org>
date: Tue Nov 11 12:34:29 2014 -0600
description:
Merge
Subject: [x265] analysis: fix bidir non-determinism in --pmode --rd 5
details: http://hg.videolan.org/x265/rev/306ef9782a30
branches:
changeset: 8831:306ef9782a30
user: Steve Borho <steve at borho.org>
date: Tue Nov 11 13:29:36 2014 -0600
description:
analysis: fix bidir non-determinism in --pmode --rd 5
diffstat:
doc/reST/cli.rst | 2 +-
doc/reST/presets.rst | 6 +-
source/common/common.h | 13 --
source/common/param.cpp | 4 +-
source/common/quant.cpp | 4 +-
source/common/quant.h | 14 ++
source/encoder/analysis.cpp | 245 ++++++++++++++++++++++++++++++++++++-------
source/encoder/analysis.h | 3 +
source/encoder/api.cpp | 12 +-
source/encoder/rdcost.h | 7 +-
source/encoder/search.cpp | 150 +++++++++++++-------------
source/encoder/search.h | 4 +-
source/x265.cpp | 33 ++---
source/x265.h | 6 +-
14 files changed, 334 insertions(+), 169 deletions(-)
diffs (truncated from 1047 to 300 lines):
diff -r 32513a4c3bd4 -r 306ef9782a30 doc/reST/cli.rst
--- a/doc/reST/cli.rst Mon Nov 10 12:39:54 2014 +0900
+++ b/doc/reST/cli.rst Tue Nov 11 13:29:36 2014 -0600
@@ -255,7 +255,7 @@ Input Options
numbers of frame threads. Outputs will be deterministic but the
outputs of -F2 will no longer match the outputs of -F3, etc.
- **Values:** any value in range of 100 to 1000. Default disabled.
+ **Values:** any value in range of 100 to 2000. Default disabled.
.. option:: --input-res <wxh>
diff -r 32513a4c3bd4 -r 306ef9782a30 doc/reST/presets.rst
--- a/doc/reST/presets.rst Mon Nov 10 12:39:54 2014 +0900
+++ b/doc/reST/presets.rst Tue Nov 11 13:29:36 2014 -0600
@@ -114,7 +114,11 @@ select modes which preserve high frequen
* :option:`--psy-rd` 0.5
* :option:`--psy-rdoq` 30
- * :option:`--b-intra`
+
+.. Note::
+
+ --psy-rdoq is only effective when RDOQuant is enabled, which is at
+ RD levels 4, 5, and 6 (presets slow and below).
It lowers the strength of adaptive quantization, so residual energy can
be more evenly distributed across the (noisy) picture:
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/common.h
--- a/source/common/common.h Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/common.h Tue Nov 11 13:29:36 2014 -0600
@@ -245,9 +245,6 @@ typedef int16_t coeff_t; // transf
#define MAX_TR_SIZE (1 << MAX_LOG2_TR_SIZE)
#define MAX_TS_SIZE (1 << MAX_LOG2_TS_SIZE)
-#define MAX_NUM_TR_COEFFS MAX_TR_SIZE * MAX_TR_SIZE /* Maximum number of transform coefficients, for a 32x32 transform */
-#define MAX_NUM_TR_CATEGORIES 8 /* 32, 16, 8, 4 transform categories each for luma and chroma */
-
#define COEF_REMAIN_BIN_REDUCTION 3 // indicates the level at which the VLC
// transitions from Golomb-Rice to TU+EG(k)
@@ -302,16 +299,6 @@ namespace x265 {
enum { SAO_NUM_OFFSET = 4 };
-// NOTE: MUST be alignment to 16 or 32 bytes for asm code
-struct NoiseReduction
-{
- /* 0 = luma 4x4, 1 = luma 8x8, 2 = luma 16x16, 3 = luma 32x32
- * 4 = chroma 4x4, 5 = chroma 8x8, 6 = chroma 16x16, 7 = chroma 32x32 */
- uint16_t offsetDenoise[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
- uint32_t residualSum[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
- uint32_t count[MAX_NUM_TR_CATEGORIES];
-};
-
enum SaoMergeMode
{
SAO_MERGE_NONE,
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/param.cpp
--- a/source/common/param.cpp Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/param.cpp Tue Nov 11 13:29:36 2014 -0600
@@ -176,6 +176,7 @@ void x265_param_default(x265_param *para
param->rdPenalty = 0;
param->psyRd = 0.0;
param->psyRdoq = 0.0;
+ param->analysisMode = 0;
param->bIntraInBFrames = 0;
param->bLossless = 0;
param->bCULossless = 0;
@@ -412,7 +413,6 @@ int x265_param_default_preset(x265_param
param->deblockingFilterTCOffset = -2;
param->psyRdoq = 30;
param->psyRd = 0.5;
- param->bIntraInBFrames = true;
param->rc.ipFactor = 1.1;
param->rc.pbFactor = 1.1;
param->rc.aqMode = X265_AQ_VARIANCE;
@@ -1071,7 +1071,7 @@ int x265_check_params(x265_param *param)
CHECK(param->rc.qCompress < 0.5 || param->rc.qCompress > 1.0,
"qCompress must be between 0.5 and 1.0");
if (param->noiseReduction)
- CHECK(100 > param->noiseReduction || param->noiseReduction > 1000, "Valid noise reduction range 100 - 1000");
+ CHECK(100 > param->noiseReduction || param->noiseReduction > 2000, "Valid noise reduction range 100 - 1000");
CHECK(param->rc.rateControlMode == X265_RC_CRF && param->rc.bStatRead,
"Constant rate-factor is incompatible with 2pass");
CHECK(param->rc.rateControlMode == X265_RC_CQP && param->rc.bStatRead,
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/quant.cpp
--- a/source/common/quant.cpp Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/quant.cpp Tue Nov 11 13:29:36 2014 -0600
@@ -370,10 +370,10 @@ uint32_t Quant::transformNxN(CUData& cu,
primitives.dct[index](m_fencShortBuf, m_fencDctCoeff, trSize);
}
- if (m_nr && !isIntra)
+ if (m_nr)
{
/* denoise is not applied to intra residual, so DST can be ignored */
- int cat = sizeIdx + 4 * !isLuma;
+ int cat = sizeIdx + 4 * !isLuma + 8 * !isIntra;
int numCoeff = 1 << (log2TrSize * 2);
primitives.denoiseDct(m_resiDctCoeff, m_nr->residualSum[cat], m_nr->offsetDenoise[cat], numCoeff);
m_nr->count[cat]++;
diff -r 32513a4c3bd4 -r 306ef9782a30 source/common/quant.h
--- a/source/common/quant.h Mon Nov 10 12:39:54 2014 +0900
+++ b/source/common/quant.h Tue Nov 11 13:29:36 2014 -0600
@@ -58,6 +58,20 @@ struct QpParam
}
};
+#define MAX_NUM_TR_COEFFS MAX_TR_SIZE * MAX_TR_SIZE /* Maximum number of transform coefficients, for a 32x32 transform */
+#define MAX_NUM_TR_CATEGORIES 16 /* 32, 16, 8, 4 transform categories each for luma and chroma */
+
+// NOTE: MUST be 16-byte aligned for asm code
+struct NoiseReduction
+{
+ /* 0 = luma 4x4, 1 = luma 8x8, 2 = luma 16x16, 3 = luma 32x32
+ * 4 = chroma 4x4, 5 = chroma 8x8, 6 = chroma 16x16, 7 = chroma 32x32
+ * Intra 0..7 - Inter 8..15 */
+ uint16_t offsetDenoise[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
+ uint32_t residualSum[MAX_NUM_TR_CATEGORIES][MAX_NUM_TR_COEFFS];
+ uint32_t count[MAX_NUM_TR_CATEGORIES];
+};
+
class Quant
{
protected:
diff -r 32513a4c3bd4 -r 306ef9782a30 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp Mon Nov 10 12:39:54 2014 +0900
+++ b/source/encoder/analysis.cpp Tue Nov 11 13:29:36 2014 -0600
@@ -142,8 +142,6 @@ Mode& Analysis::compressCTU(CUData& ctu,
memcpy(&m_frame->m_intraData->depth[ctu.m_cuAddr * numPartition], bestCU->m_cuDepth, sizeof(uint8_t) * numPartition);
memcpy(&m_frame->m_intraData->modes[ctu.m_cuAddr * numPartition], bestCU->m_lumaIntraDir, sizeof(uint8_t) * numPartition);
memcpy(&m_frame->m_intraData->partSizes[ctu.m_cuAddr * numPartition], bestCU->m_partSize, sizeof(uint8_t) * numPartition);
- m_frame->m_intraData->cuAddr[ctu.m_cuAddr] = ctu.m_cuAddr;
- m_frame->m_intraData->poc[ctu.m_cuAddr] = m_frame->m_poc;
}
}
}
@@ -399,6 +397,8 @@ void Analysis::parallelModeAnalysis(int
case 1:
slave->checkInter_rd0_4(md.pred[PRED_2Nx2N], *m_curGeom, SIZE_2Nx2N);
+ if (m_slice->m_sliceType == B_SLICE)
+ slave->checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], *m_curGeom);
break;
case 2:
@@ -449,6 +449,13 @@ void Analysis::parallelModeAnalysis(int
case 1:
slave->checkInter_rd5_6(md.pred[PRED_2Nx2N], *m_curGeom, SIZE_2Nx2N, false);
+ md.pred[PRED_BIDIR].rdCost = MAX_INT64;
+ if (m_slice->m_sliceType == B_SLICE)
+ {
+ slave->checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], *m_curGeom);
+ if (md.pred[PRED_BIDIR].sa8dCost < MAX_INT64)
+ slave->encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], *m_curGeom);
+ }
break;
case 2:
@@ -504,6 +511,7 @@ void Analysis::compressInterCU_dist(cons
/* Initialize all prediction CUs based on parentCTU */
md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom);
+ md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom);
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom);
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom);
if (m_param->bEnableRectInter)
@@ -595,16 +603,22 @@ void Analysis::compressInterCU_dist(cons
if (m_param->rdLevel > 2)
{
- /* encode best inter */
+ /* RD selection between merge, inter, bidir and intra */
for (uint32_t puIdx = 0; puIdx < bestInter->cu.getNumPartInter(); puIdx++)
{
prepMotionCompensation(bestInter->cu, cuGeom, puIdx);
motionCompensation(bestInter->predYuv, false, true);
}
encodeResAndCalcRdInterCU(*bestInter, cuGeom);
+ checkBestMode(*bestInter, depth);
- /* RD selection between merge, inter and intra */
- checkBestMode(*bestInter, depth);
+ /* If BIDIR is available and within 17/16 of best inter option, choose by RDO */
+ if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost != MAX_INT64 &&
+ md.pred[PRED_BIDIR].sa8dCost * 16 <= bestInter->sa8dCost * 17)
+ {
+ encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
+ checkBestMode(md.pred[PRED_BIDIR], depth);
+ }
#if MATCH_NON_PMODE
if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) || md.bestMode->sa8dCost == MAX_INT64)
@@ -618,6 +632,9 @@ void Analysis::compressInterCU_dist(cons
if (!md.bestMode || bestInter->sa8dCost < md.bestMode->sa8dCost)
md.bestMode = bestInter;
+ if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost < md.bestMode->sa8dCost)
+ md.bestMode = &md.pred[PRED_BIDIR];
+
if (bTryIntra && md.pred[PRED_INTRA].sa8dCost < md.bestMode->sa8dCost)
{
md.bestMode = &md.pred[PRED_INTRA];
@@ -641,6 +658,7 @@ void Analysis::compressInterCU_dist(cons
m_modeCompletionEvent.wait();
checkBestMode(md.pred[PRED_2Nx2N], depth);
+ checkBestMode(md.pred[PRED_BIDIR], depth);
if (m_param->bEnableRectInter)
{
@@ -790,8 +808,14 @@ void Analysis::compressInterCU_rd0_4(con
{
md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom);
checkInter_rd0_4(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N);
+
+ if (m_slice->m_sliceType == B_SLICE)
+ {
+ md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom);
+ checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], cuGeom);
+ }
+
Mode *bestInter = &md.pred[PRED_2Nx2N];
-
if (m_param->bEnableRectInter)
{
md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom);
@@ -853,11 +877,16 @@ void Analysis::compressInterCU_rd0_4(con
prepMotionCompensation(bestInter->cu, cuGeom, puIdx);
motionCompensation(bestInter->predYuv, false, true);
}
+ encodeResAndCalcRdInterCU(*bestInter, cuGeom);
+ checkBestMode(*bestInter, depth);
- encodeResAndCalcRdInterCU(*bestInter, cuGeom);
-
- if (!md.bestMode || bestInter->rdCost < md.bestMode->rdCost)
- md.bestMode = bestInter;
+ /* If BIDIR is available and within 17/16 of best inter option, choose by RDO */
+ if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost != MAX_INT64 &&
+ md.pred[PRED_BIDIR].sa8dCost * 16 <= bestInter->sa8dCost * 17)
+ {
+ encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
+ checkBestMode(md.pred[PRED_BIDIR], depth);
+ }
if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) ||
md.bestMode->sa8dCost == MAX_INT64)
@@ -865,16 +894,19 @@ void Analysis::compressInterCU_rd0_4(con
md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom);
checkIntraInInter(md.pred[PRED_INTRA], cuGeom);
encodeIntraInInter(md.pred[PRED_INTRA], cuGeom);
- if (md.pred[PRED_INTRA].rdCost < md.bestMode->rdCost)
- md.bestMode = &md.pred[PRED_INTRA];
+ checkBestMode(md.pred[PRED_INTRA], depth);
}
}
else
{
- /* SA8D choice between merge/skip, inter, and intra */
+ /* SA8D choice between merge/skip, inter, bidir, and intra */
if (!md.bestMode || bestInter->sa8dCost < md.bestMode->sa8dCost)
md.bestMode = bestInter;
+ if (m_slice->m_sliceType == B_SLICE &&
+ md.pred[PRED_BIDIR].sa8dCost < md.bestMode->sa8dCost)
+ md.bestMode = &md.pred[PRED_BIDIR];
+
if (bTryIntra || md.bestMode->sa8dCost == MAX_INT64)
{
md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom);
@@ -1052,9 +1084,19 @@ void Analysis::compressInterCU_rd5_6(con
checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, false);
checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
+ if (m_slice->m_sliceType == B_SLICE)
+ {
+ md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom);
+ checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], cuGeom);
+ if (md.pred[PRED_BIDIR].sa8dCost < MAX_INT64)
+ {
+ encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
+ checkBestMode(md.pred[PRED_BIDIR], cuGeom.depth);
+ }
+ }
+
if (m_param->bEnableRectInter)
{
- // Nx2N rect
if (!m_param->bEnableCbfFastMode || md.bestMode->cu.getQtRootCbf(0))
{
md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom);
@@ -1407,12 +1449,17 @@ void Analysis::checkInter_rd0_4(Mode& in
if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_interAnalysisData)
{
- for (int32_t i = 0; i < numPredDir; i++)
+ for (uint32_t part = 0; part < interMode.cu.getNumPartInter(); part++)