[x265-commits] [x265] param: remove --aq-mode 1 from --tune grain

Thu Mar 5 18:55:07 CET 2015

details:   http://hg.videolan.org/x265/rev/9f12387b9cd8
branches:  
changeset: 9621:9f12387b9cd8
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 27 15:00:54 2015 -0600
description:
param: remove --aq-mode 1 from --tune grain

--aq-mode 1 is now our default, so setting it to 1 here can only override a
user's explicit request for aq-mode 0 or 2.
Subject: [x265] intra: pull the simple 1:2:1 pixel filtering into a performance primitive

details:   http://hg.videolan.org/x265/rev/cc5b9d4abddb
branches:  
changeset: 9622:cc5b9d4abddb
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 04 09:34:19 2015 -0600
description:
intra: pull the simple 1:2:1 pixel filtering into a performance primitive

Only C-refs at this point, but at least it is templated so the compiler can
optimize and unroll loops cleanly.

This commit also changes the lookahead to use the full available reference
samples. Since we're using padded downscaled source images, the top-left and
bottom right reference samples are always available. There is no good reason
not to use them.
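
As a rough illustration of the 1:2:1 filtering described above, here is a minimal standalone sketch modelled on the intraFilter<> template in the diff below. The name intraFilterSketch, the 8-bit pixel typedef, and the requirement that the two buffers do not alias are assumptions made for this example; the sample layout (top-left at index 0, then 2N top samples, then 2N left samples) is inferred from the patch.

```cpp
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit build; x265 defines pixel per bit depth

// Hypothetical standalone version of the 1:2:1 reference sample filter.
// Assumed layout: samples[0] = top-left, samples[1..2N] = top row,
// samples[2N+1..4N] = left column. 'filtered' must not alias 'samples'.
template<int tuSize>
void intraFilterSketch(const pixel* samples, pixel* filtered)
{
    const int tuSize2 = tuSize << 1;

    // top-left: its neighbours are the first top and the first left sample
    filtered[0] = ((samples[0] << 1) + samples[1] + samples[tuSize2 + 1] + 2) >> 2;

    // top row: (left + 2 * centre + right + 2) >> 2, last sample copied unfiltered
    for (int i = 1; i < tuSize2; i++)
        filtered[i] = ((samples[i] << 1) + samples[i - 1] + samples[i + 1] + 2) >> 2;
    filtered[tuSize2] = samples[tuSize2];

    // left column: the first left sample uses top-left as its upper neighbour
    filtered[tuSize2 + 1] = ((samples[tuSize2 + 1] << 1) + samples[0] + samples[tuSize2 + 2] + 2) >> 2;
    for (int i = tuSize2 + 2; i < tuSize2 + tuSize2; i++)
        filtered[i] = ((samples[i] << 1) + samples[i - 1] + samples[i + 1] + 2) >> 2;
    filtered[tuSize2 + tuSize2] = samples[tuSize2 + tuSize2];
}
```

Because tuSize is a template parameter, both loop bounds are compile-time constants, which is what lets the compiler unroll them. A caller would instantiate one version per block size, for example intraFilterSketch<4>(in, out) with 17-entry buffers.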
Subject: [x265] predict: move intra functions together in predict.cpp (no change)

details:   http://hg.videolan.org/x265/rev/fe98ea2aac5b
branches:  
changeset: 9623:fe98ea2aac5b
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 27 13:27:40 2015 -0600
description:
predict: move intra functions together in predict.cpp (no change)
Subject: [x265] predict: move 4:4:4 chroma sample filtering into initAdiPatternChroma()

details:   http://hg.videolan.org/x265/rev/39992a904cdd
branches:  
changeset: 9624:39992a904cdd
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 27 13:33:27 2015 -0600
description:
predict: move 4:4:4 chroma sample filtering into initAdiPatternChroma()
Subject: [x265] predict: don't pass a Predict member variable to a Predict method

details:   http://hg.videolan.org/x265/rev/45a5fbe7b549
branches:  
changeset: 9625:45a5fbe7b549
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 27 13:37:21 2015 -0600
description:
predict: don't pass a Predict member variable to a Predict method
Subject: [x265] api: make RDOQ level externally configurable, make two levels visible

details:   http://hg.videolan.org/x265/rev/79d9a1489616
branches:  
changeset: 9626:79d9a1489616
user:      Steve Borho <steve at borho.org>
date:      Thu Feb 26 16:37:50 2015 -0600
description:
api: make RDOQ level externally configurable, make two levels visible

This commit doesn't change any preset defaults, but it does allow much more
flexibility - a user can choose to enable rdoq and psy-rdoq at fast presets if
they so choose. And in noisy situations a user may decide to lower rdoq level to
1 to reduce decimation (psy-rdoq is a blunt hammer for this purpose).

This commit does change --tune grain to use --rdoq-level 1 so --tune grain will
be effective at all RD levels and avoids the coding group decimation logic
within RDOQ.
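
For readers who want to see how the new option value might be interpreted, the following is a small sketch that mirrors the parsing and range check added to param.cpp in the diff below. The helper names (atoboolSketch, parseRdoqLevelSketch, rdoqLevelInRange) are hypothetical, and atoboolSketch is only a stand-in for the encoder's own boolean parser, whose exact behaviour is assumed here.

```cpp
#include <cstdlib>
#include <cstring>

// Stand-in for the encoder's boolean parser (assumed behaviour): recognises
// a few true/false spellings and flags anything else as an error.
static int atoboolSketch(const char* str, bool& bError)
{
    if (!strcmp(str, "1") || !strcmp(str, "true") || !strcmp(str, "yes"))
        return 1;
    if (!strcmp(str, "0") || !strcmp(str, "false") || !strcmp(str, "no"))
        return 0;
    bError = true;
    return 0;
}

// Mirrors the OPT2("rdoq", "rdoq-level") branch in the patch: a value that
// is not a plain boolean, or that is boolean-true, is read as an integer
// level; a boolean-false value (as from --no-rdoq-level) selects level 0.
int parseRdoqLevelSketch(const char* value)
{
    bool bError = false;
    int bval = atoboolSketch(value, bError);
    if (bError || bval)
        return atoi(value);
    return 0;
}

// Same bounds as the CHECK() added to x265_check_params() in the patch.
bool rdoqLevelInRange(int level)
{
    return level >= 0 && level <= 2;
}
```

Under these assumptions, "2" fails the boolean parse and is therefore read as the integer 2, while "false" maps to level 0. An out-of-range value such as "3" is parsed as 3 and then rejected by the range check.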
Subject: [x265] sao.cpp: init additional pixels for SAO

details:   http://hg.videolan.org/x265/rev/e6b519dfbf81
branches:  
changeset: 9627:e6b519dfbf81
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Mar 05 16:06:04 2015 +0530
description:
sao.cpp: init additional pixels for SAO

Prevents uninitialized-read warnings from valgrind.

diffstat:

 doc/reST/cli.rst             |   33 ++++++++++++--
 doc/reST/presets.rst         |    9 +--
 source/CMakeLists.txt        |    2 +-
 source/common/intrapred.cpp  |   28 ++++++++++++
 source/common/param.cpp      |   21 ++++++++-
 source/common/predict.cpp    |  100 ++++++++++++++----------------------------
 source/common/predict.h      |    8 +-
 source/common/primitives.h   |    2 +
 source/common/quant.cpp      |   12 ++--
 source/common/quant.h        |    4 +-
 source/encoder/encoder.cpp   |    6 +-
 source/encoder/sao.cpp       |    2 +
 source/encoder/search.cpp    |   14 +++---
 source/encoder/slicetype.cpp |   38 +++++-----------
 source/x265.h                |    9 +++
 source/x265cli.h             |    7 ++-
 16 files changed, 164 insertions(+), 131 deletions(-)

diffs (truncated from 710 to 300 lines):

diff -r ea9bdb10353f -r e6b519dfbf81 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Wed Mar 04 13:20:55 2015 +0530
+++ b/doc/reST/cli.rst	Thu Mar 05 16:06:04 2015 +0530
@@ -613,6 +613,30 @@ not match.
 Options which affect the transform unit quad-tree, sometimes referred to
 as the residual quad-tree (RQT).
 
+.. option:: --rdoq-level <0|1|2>, --no-rdoq-level
+
+	Specify the amount of rate-distortion analysis to use within
+	quantization::
+
+	At level 0 rate-distortion cost is not considered in quant
+	
+	At level 1 rate-distortion cost is used to find optimal rounding
+	values for each level (and allows psy-rdoq to be effective). It
+	trades-off the signaling cost of the coefficient vs its post-inverse
+	quant distortion from the pre-quant coefficient. When
+	:option:`--psy-rdoq` is enabled, this formula is biased in favor of
+	more energy in the residual (larger coefficient absolute levels)
+	
+	At level 2 rate-distortion cost is used to make decimate decisions
+	on each 4x4 coding group, including the cost of signaling the group
+	within the group bitmap. If the total distortion of not signaling
+	the entire coding group is less than the rate cost, the block is
+	decimated. Next, it applies rate-distortion cost analysis to the
+	last non-zero coefficient, which can result in many (or all) of the
+	coding groups being decimated. Psy-rdoq is less effective at
+	preserving energy when RDOQ is at level 2, since it only has
+	influence over the level distortion costs.
+
 .. option:: --tu-intra-depth <1..4>
 
 	The transform unit (residual) quad-tree begins with the same depth
@@ -829,8 +853,8 @@ of blurred prediction modes, like DC and
 inter prediction.
 
 :option:`--psy-rdoq` will adjust the distortion cost used in
-rate-distortion optimized quantization (RDO quant), enabled in
-:option:`--rd` 4 and above, favoring the preservation of energy in the
+rate-distortion optimized quantization (RDO quant), enabled by
+:option:`--rdoq-level` 1 or 2, favoring the preservation of energy in the
 reconstructed image.  :option:`--psy-rdoq` prevents RDOQ from blurring
 all of the encoding options which psy-rd has to chose from.  At low
 strength levels, psy-rdoq will influence the quantization level
@@ -878,9 +902,8 @@ areas of high motion.
 	Influence rate distortion optimized quantization by favoring higher
 	energy in the reconstructed image. This generally improves perceived
 	visual quality at the cost of lower quality metric scores.  It only
-	has effect on slower presets which use RDO Quantization
-	(:option:`--rd` 4, 5 and 6). 1.0 is a typical value. High values can 
-	be beneficial in preserving high-frequency detail like film grain. 
+	has effect when :option:`--rdoq-level` is 1 or 2. High values can
+	be beneficial in preserving high-frequency detail like film grain.
 	Default: 1.0
 
 	**Range of values:** 0 .. 50.0
diff -r ea9bdb10353f -r e6b519dfbf81 doc/reST/presets.rst
--- a/doc/reST/presets.rst	Wed Mar 04 13:20:55 2015 +0530
+++ b/doc/reST/presets.rst	Thu Mar 05 16:06:04 2015 +0530
@@ -66,6 +66,8 @@ The presets adjust encoder parameters to
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
 | rdLevel      |    2      |     2     |    2     |   2    |  2   |    3   |  4   |   6    |    6     |    6    |
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdoq-level   |    0      |     0     |    0     |   0    |  0   |    0   |  2   |   2    |    2     |    2    |
++--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
 | tu-intra     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
 | tu-inter     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
@@ -114,17 +116,12 @@ the reconstructed output. It helps rate 
 modes which preserve high frequency noise:
 
     * :option:`--psy-rd` 0.5
+    * :option:`--rdoq-level` 1
     * :option:`--psy-rdoq` 30
 
-.. Note::
-
-    --psy-rdoq is only effective when RDOQuant is enabled, which is at
-    RD levels 4, 5, and 6 (presets slow and below).
-
 It lowers the strength of adaptive quantization, so residual energy can
 be more evenly distributed across the (noisy) picture:
 
-    * :option:`--aq-mode` 1
     * :option:`--aq-strength` 0.3
 
 And it similarly tunes rate control to prevent the slice QP from
diff -r ea9bdb10353f -r e6b519dfbf81 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Wed Mar 04 13:20:55 2015 +0530
+++ b/source/CMakeLists.txt	Thu Mar 05 16:06:04 2015 +0530
@@ -21,7 +21,7 @@ include(CheckSymbolExists)
 include(CheckCXXCompilerFlag)
 
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 47)
+set(X265_BUILD 48)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r ea9bdb10353f -r e6b519dfbf81 source/common/intrapred.cpp
--- a/source/common/intrapred.cpp	Wed Mar 04 13:20:55 2015 +0530
+++ b/source/common/intrapred.cpp	Thu Mar 05 16:06:04 2015 +0530
@@ -27,6 +27,29 @@
 using namespace x265;
 
 namespace {
+
+template<int tuSize>
+void intraFilter(const pixel* samples, pixel* filtered) /* 1:2:1 filtering of left and top reference samples */
+{
+    const int tuSize2 = tuSize << 1;
+
+    pixel topLeft = samples[0], topLast = samples[tuSize2], leftLast = samples[tuSize2 + tuSize2];
+
+    // filtering top
+    for (int i = 1; i < tuSize2; i++)
+        filtered[i] = ((samples[i] << 1) + samples[i - 1] + samples[i + 1] + 2) >> 2;
+    filtered[tuSize2] = topLast;
+    
+    // filtering top-left
+    filtered[0] = ((topLeft << 1) + samples[1] + samples[tuSize2 + 1] + 2) >> 2;
+
+    // filtering left
+    filtered[tuSize2 + 1] = ((samples[tuSize2 + 1] << 1) + topLeft + samples[tuSize2 + 2] + 2) >> 2;
+    for (int i = tuSize2 + 2; i < tuSize2 + tuSize2; i++)
+        filtered[i] = ((samples[i] << 1) + samples[i - 1] + samples[i + 1] + 2) >> 2;
+    filtered[tuSize2 + tuSize2] = leftLast;
+}
+
 void dcPredFilter(const pixel* above, const pixel* left, pixel* dst, intptr_t dststride, int size)
 {
     // boundary pixels processing
@@ -216,6 +239,11 @@ namespace x265 {
 
 void setupIntraPrimitives_c(EncoderPrimitives& p)
 {
+    p.cu[BLOCK_4x4].intra_filter = intraFilter<4>;
+    p.cu[BLOCK_8x8].intra_filter = intraFilter<8>;
+    p.cu[BLOCK_16x16].intra_filter = intraFilter<16>;
+    p.cu[BLOCK_32x32].intra_filter = intraFilter<32>;
+
     p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = planar_pred_c<2>;
     p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = planar_pred_c<3>;
     p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = planar_pred_c<4>;
diff -r ea9bdb10353f -r e6b519dfbf81 source/common/param.cpp
--- a/source/common/param.cpp	Wed Mar 04 13:20:55 2015 +0530
+++ b/source/common/param.cpp	Thu Mar 05 16:06:04 2015 +0530
@@ -320,6 +320,7 @@ int x265_param_default_preset(x265_param
             param->bEnableRectInter = 1;
             param->lookaheadDepth = 25;
             param->rdLevel = 4;
+            param->rdoqLevel = 2;
             param->subpelRefine = 3;
             param->maxNumMergeCand = 3;
             param->searchMethod = X265_STAR_SEARCH;
@@ -334,6 +335,7 @@ int x265_param_default_preset(x265_param
             param->tuQTMaxInterDepth = 2;
             param->tuQTMaxIntraDepth = 2;
             param->rdLevel = 6;
+            param->rdoqLevel = 2;
             param->subpelRefine = 3;
             param->maxNumMergeCand = 3;
             param->searchMethod = X265_STAR_SEARCH;
@@ -349,6 +351,7 @@ int x265_param_default_preset(x265_param
             param->tuQTMaxInterDepth = 3;
             param->tuQTMaxIntraDepth = 3;
             param->rdLevel = 6;
+            param->rdoqLevel = 2;
             param->subpelRefine = 4;
             param->maxNumMergeCand = 4;
             param->searchMethod = X265_STAR_SEARCH;
@@ -366,6 +369,7 @@ int x265_param_default_preset(x265_param
             param->tuQTMaxInterDepth = 4;
             param->tuQTMaxIntraDepth = 4;
             param->rdLevel = 6;
+            param->rdoqLevel = 2;
             param->subpelRefine = 5;
             param->maxNumMergeCand = 5;
             param->searchMethod = X265_STAR_SEARCH;
@@ -416,11 +420,11 @@ int x265_param_default_preset(x265_param
             param->deblockingFilterBetaOffset = -2;
             param->deblockingFilterTCOffset = -2;
             param->bIntraInBFrames = 0;
+            param->rdoqLevel = 1;
             param->psyRdoq = 30;
             param->psyRd = 0.5;
             param->rc.ipFactor = 1.1;
             param->rc.pbFactor = 1.1;
-            param->rc.aqMode = X265_AQ_VARIANCE;
             param->rc.aqStrength = 0.3;
             param->rc.qCompress = 0.8;
         }
@@ -629,6 +633,17 @@ int x265_param_parse(x265_param *p, cons
     OPT("cbqpoffs") p->cbQpOffset = atoi(value);
     OPT("crqpoffs") p->crQpOffset = atoi(value);
     OPT("rd") p->rdLevel = atoi(value);
+    OPT2("rdoq", "rdoq-level")
+    {
+        int bval = atobool(value);
+        if (bError || bval)
+        {
+            bError = false;
+            p->rdoqLevel = atoi(value);
+        }
+        else
+            p->rdoqLevel = 0;
+    }
     OPT("psy-rd")
     {
         int bval = atobool(value);
@@ -1034,6 +1049,8 @@ int x265_check_params(x265_param *param)
           "Rate control mode is out of range");
     CHECK(param->rdLevel < 0 || param->rdLevel > 6,
           "RD Level is out of range");
+    CHECK(param->rdoqLevel < 0 || param->rdoqLevel > 2,
+        "RDOQ Level is out of range");
     CHECK(param->bframes > param->lookaheadDepth && !param->rc.bStatRead,
           "Lookahead depth must be greater than the max consecutive bframe count");
     CHECK(param->bframes < 0,
@@ -1251,7 +1268,7 @@ void x265_print_params(x265_param *param
 #define TOOLOPT(FLAG, STR) if (FLAG) fprintf(stderr, "%s ", STR)
     TOOLOPT(param->bEnableRectInter, "rect");
     TOOLOPT(param->bEnableAMP, "amp");
-    fprintf(stderr, "rd=%d ", param->rdLevel);
+    fprintf(stderr, "rd=%d rdoq=%d ", param->rdLevel, param->rdoqLevel);
     if (param->psyRd > 0.)
         fprintf(stderr, "psy-rd=%.2lf ", param->psyRd);
     if (param->psyRdoq > 0.)
diff -r ea9bdb10353f -r e6b519dfbf81 source/common/predict.cpp
--- a/source/common/predict.cpp	Wed Mar 04 13:20:55 2015 +0530
+++ b/source/common/predict.cpp	Thu Mar 05 16:06:04 2015 +0530
@@ -79,52 +79,6 @@ fail:
     return false;
 }
 
-void Predict::predIntraLumaAng(uint32_t dirMode, pixel* dst, intptr_t stride, uint32_t log2TrSize)
-{
-    int sizeIdx = log2TrSize - 2;
-    int tuSize = 1 << log2TrSize;
-    int filter = !!(g_intraFilterFlags[dirMode] & tuSize);
-    X265_CHECK(sizeIdx >= 0 && sizeIdx < 4, "intra block size is out of range\n");
-
-    bool bFilter = log2TrSize <= 4;
-    primitives.cu[sizeIdx].intra_pred[dirMode](dst, stride, intraNeighbourBuf[filter], dirMode, bFilter);
-}
-
-void Predict::predIntraChromaAng(uint32_t dirMode, pixel* dst, intptr_t stride, uint32_t log2TrSizeC, int chFmt)
-{
-    int tuSize = 1 << log2TrSizeC;
-    int tuSize2 = tuSize << 1;
-
-    pixel* srcBuf = intraNeighbourBuf[0];
-
-    if (chFmt == X265_CSP_I444 && (g_intraFilterFlags[dirMode] & tuSize))
-    {
-        pixel* fltBuf = intraNeighbourBuf[1];
-        pixel topLeft = srcBuf[0], topLast = srcBuf[tuSize2], leftLast = srcBuf[tuSize2 + tuSize2];
-
-        // filtering top
-        for (int i = 1; i < tuSize2; i++)
-            fltBuf[i] = ((srcBuf[i] << 1) + srcBuf[i - 1] + srcBuf[i + 1] + 2) >> 2;
-        fltBuf[tuSize2] = topLast;
-
-        // filtering top-left
-        fltBuf[0] = ((srcBuf[0] << 1) + srcBuf[1] + srcBuf[tuSize2 + 1] + 2) >> 2;
-
-        // filtering left
-        fltBuf[tuSize2 + 1] = ((srcBuf[tuSize2 + 1] << 1) + topLeft + srcBuf[tuSize2 + 2] + 2) >> 2;
-        for (int i = tuSize2 + 2; i < tuSize2 + tuSize2; i++)
-            fltBuf[i] = ((srcBuf[i] << 1) + srcBuf[i - 1] + srcBuf[i + 1] + 2) >> 2;
-        fltBuf[tuSize2 + tuSize2] = leftLast;
-
-        srcBuf = intraNeighbourBuf[1];
-    }
-
-    int sizeIdx = log2TrSizeC - 2;
-    X265_CHECK(sizeIdx >= 0 && sizeIdx < 4, "intra block size is out of range\n");
-    primitives.cu[sizeIdx].intra_pred[dirMode](dst, stride, srcBuf, dirMode, 0);
-}
-
-
 void Predict::motionCompensation(const CUData& cu, const PredictionUnit& pu, Yuv& predYuv, bool bLuma, bool bChroma)
 {
     int refIdx0 = cu.m_refIdx[0][pu.puAbsPartIdx];
@@ -626,12 +580,33 @@ void Predict::addWeightUni(const Predict
     }
 }
 
-void Predict::initAdiPattern(const CUData& cu, const CUGeom& cuGeom, uint32_t absPartIdx, const IntraNeighbors& intraNeighbors, int dirMode)
+void Predict::predIntraLumaAng(uint32_t dirMode, pixel* dst, intptr_t stride, uint32_t log2TrSize)
 {
-    int tuSize = intraNeighbors.tuSize;
+    int tuSize = 1 << log2TrSize;
+    int sizeIdx = log2TrSize - 2;
+    X265_CHECK(sizeIdx >= 0 && sizeIdx < 4, "intra block size is out of range\n");
+
+    int filter = !!(g_intraFilterFlags[dirMode] & tuSize);
+    bool bFilter = log2TrSize <= 4;
+    primitives.cu[sizeIdx].intra_pred[dirMode](dst, stride, intraNeighbourBuf[filter], dirMode, bFilter);