[x265-commits] [x265] docs: improve --pmode documentation, the feature is fully...

Thu Oct 30 19:15:19 CET 2014

details:   http://hg.videolan.org/x265/rev/9b73a4d2210a
branches:  stable
changeset: 8755:9b73a4d2210a
user:      Steve Borho <steve at borho.org>
date:      Wed Oct 29 23:12:27 2014 -0500
description:
docs: improve --pmode documentation, the feature is fully functional
Subject: [x265] search: move m_bestME[] from search to Mode structure

details:   http://hg.videolan.org/x265/rev/a147b3b6c2f7
branches:  
changeset: 8756:a147b3b6c2f7
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Wed Oct 29 12:54:03 2014 +0530
description:
search: move m_bestME[] from search to Mode structure
Subject: [x265] analysis: remove TODO comment, I've given up on the idea

details:   http://hg.videolan.org/x265/rev/31d648740464
branches:  
changeset: 8757:31d648740464
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 30 11:08:41 2014 -0500
description:
analysis: remove TODO comment, I've given up on the idea
Subject: [x265] api: allow --psy-rdoq values up to 50; it can be beneficial for film grain

details:   http://hg.videolan.org/x265/rev/73c243602b07
branches:  stable
changeset: 8758:73c243602b07
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 30 12:38:40 2014 -0500
description:
api: allow --psy-rdoq values up to 50; it can be beneficial for film grain
Subject: [x265] encoder: give more warnings when features are automatically disabled

details:   http://hg.videolan.org/x265/rev/ba3193adff60
branches:  stable
changeset: 8759:ba3193adff60
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 30 12:39:28 2014 -0500
description:
encoder: give more warnings when features are automatically disabled

and add comments describing why the combinations are prevented.  Some of them
are simply impossible: the option would have no effect, so it is best not to
pretend it is enabled. Some would not be useful (they have a negative impact on
performance with no compression improvement). And others are just currently
broken and not typically used.
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/0f14e29eceb1
branches:  
changeset: 8760:0f14e29eceb1
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 30 13:08:50 2014 -0500
description:
Merge with stable
Subject: [x265] encoder: fix some obviously incorrect comments

details:   http://hg.videolan.org/x265/rev/de28d1b07e6f
branches:  
changeset: 8761:de28d1b07e6f
user:      Steve Borho <steve at borho.org>
date:      Thu Oct 30 13:09:57 2014 -0500
description:
encoder: fix some obviously incorrect comments

diffstat:

 doc/reST/cli.rst                |   54 +++-
 doc/reST/presets.rst            |    2 +-
 doc/reST/threading.rst          |    7 +-
 source/CMakeLists.txt           |    2 +-
 source/common/param.cpp         |   31 ++-
 source/encoder/analysis.cpp     |  360 ++++++++-----------------------
 source/encoder/analysis.h       |   28 +-
 source/encoder/encoder.cpp      |   66 ++--
 source/encoder/frameencoder.cpp |    2 +-
 source/encoder/search.cpp       |  455 ++++++++++++++++++++++++++++-----------
 source/encoder/search.h         |  134 ++++++-----
 source/test/CMakeLists.txt      |    3 -
 source/test/testpool.cpp        |  238 --------------------
 source/x265.cpp                 |   10 +-
 source/x265.h                   |   28 +-
 15 files changed, 640 insertions(+), 780 deletions(-)

diffs (truncated from 2090 to 300 lines):

diff -r 476acb7a4088 -r de28d1b07e6f doc/reST/cli.rst

--- a/doc/reST/cli.rst	Wed Oct 29 22:20:55 2014 -0500
+++ b/doc/reST/cli.rst	Thu Oct 30 13:09:57 2014 -0500
@@ -76,16 +76,18 @@ Standalone Executable Options
 	Parallel mode decision, or distributed mode analysis. When enabled
 	the encoder will distribute the analysis work of each CU (merge,
 	inter, intra) across multiple worker threads. Only recommended if
-	x265 is not already saturating the CPU cores. Currently only
-	supported in RD levels 3 and 4, and is most effective when --rect is
-	enabled. This feature is implicitly disabled when no thread pool is
-	present.
+	x265 is not already saturating the CPU cores. In RD levels 3 and 4
+	it will be most effective if --rect was enabled. At RD levels 5 and
+	6 there is generally always enough work to distribute to warrant the
+	overhead, assuming your CPUs are not already saturated.
+	
+	--pmode will increase utilization without reducing compression
+	efficiency. In fact, since the modes are all measured in parallel it
+	makes certain early-outs impractical and thus you usually get
+	slightly better compression when it is enabled (at the expense of
+	not skipping improbable modes).
 
-	--pmode will increase utilization on many core systems without
-	reducing compression efficiency. In fact, since the modes are all
-	measured in parallel it makes certain early-outs impractical and
-	thus you usually get slightly better compression when it is enabled
-	(at the expense of not skipping improbable modes).
+	This feature is implicitly disabled when no thread pool is present.
 
 	Default disabled
 
@@ -97,12 +99,13 @@ Standalone Executable Options
 	if x265 is not already saturating CPU cores. :option:`--pmode` is
 	much more effective than this option, since the amount of work it
 	distributes is substantially higher. With --pme it is not unusual
-	for the overhead of distributing the work outweighs the parallelism
-	benefits. This feature is implicitly disabled when no thread pool is
-	present.
+	for the overhead of distributing the work to outweigh the
+	parallelism benefits.
+	
+	This feature is implicitly disabled when no thread pool is present.
 
-	--pme will increase utilization on many core systems without any
-	substantial effect om compression efficiency.
+	--pme will increase utilization on many core systems with no effect
+	on the output bitstream.
 	
 	Default disabled
 
@@ -770,9 +773,10 @@ psycho-visual settings.
 	visual quality at the cost of lower quality metric scores.  It only
 	has effect on slower presets which use RDO Quantization
 	(:option:`--rd` 4, 5 and 6). 1.0 is a typical value. Default
-	disabled. Experimental
+	disabled. High values can be beneficial in preserving high-frequency
+	detail like film grain. Experimental
 
-	**Range of values:** 0 .. 10.0
+	**Range of values:** 0 .. 50.0
 
 
 Slice decision options
@@ -1040,9 +1044,23 @@ Quality, rate control and rate distortio
 Loop filters
 ============
 
-.. option:: --lft, --no-lft
+.. option:: --deblock=<int>:<int>, --no-deblock
 
-	Toggle deblocking loop filter, default enabled
+	Toggle deblocking loop filter, optionally specify deblocking
+	strength offsets.
+
+	<int>:<int> - parsed as tC offset and Beta offset
+	<int>,<int> - parsed as tC offset and Beta offset
+	<int>       - both tC and Beta offsets assigned the same value
+
+	If unspecified, the offsets default to 0. The offsets must be in a
+	range of -6 (lowest strength) to 6 (highest strength).
+
+	To disable the deblocking filter entirely, use --no-deblock or
+	--deblock=false. Default enabled, with both offsets defaulting to 0
+
+	If deblocking is disabled, or the offsets are non-zero, these
+	changes from the default configuration are signaled in the PPS.
 
 .. option:: --sao, --no-sao
 
diff -r 476acb7a4088 -r de28d1b07e6f doc/reST/presets.rst
--- a/doc/reST/presets.rst	Wed Oct 29 22:20:55 2014 -0500
+++ b/doc/reST/presets.rst	Thu Oct 30 13:09:57 2014 -0500
@@ -66,7 +66,7 @@ The presets adjust encoder parameters to
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
 | rdLevel      |    2      |     2     |    2     |   2    |  2   |    3   |  4   |   6    |    6     |    6    |
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| lft          |    0      |     1     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
+| deblock      |    0      |     1     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
 | tu-intra     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
 +--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
diff -r 476acb7a4088 -r de28d1b07e6f doc/reST/threading.rst
--- a/doc/reST/threading.rst	Wed Oct 29 22:20:55 2014 -0500
+++ b/doc/reST/threading.rst	Thu Oct 30 13:09:57 2014 -0500
@@ -86,9 +86,12 @@ Parallel Mode Analysis
 ======================
 
 When :option:`--pmode` is enabled, each CU (at all depths from 64x64 to
-8x8) will distribute the analysis work to the thread pool. Each analysis
+8x8) will distribute its analysis work to the thread pool. Each analysis
 job will measure the cost of one prediction for the CU: merge, skip,
-intra, inter (2Nx2N, Nx2N, 2NxN, and AMP)
+intra, inter (2Nx2N, Nx2N, 2NxN, and AMP). At slower presets, the amount
+of increased parallelism is often enough to be able to reduce frame
+parallelism while achieving the same overall CPU utilization. Reducing
+frame threads is often beneficial to ABR and VBV rate control.
 
 Parallel Motion Estimation
 ==========================
diff -r 476acb7a4088 -r de28d1b07e6f source/CMakeLists.txt
--- a/source/CMakeLists.txt	Wed Oct 29 22:20:55 2014 -0500
+++ b/source/CMakeLists.txt	Thu Oct 30 13:09:57 2014 -0500
@@ -21,7 +21,7 @@ include(CheckSymbolExists)
 include(CheckCXXCompilerFlag)
 
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 35)
+set(X265_BUILD 36)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r 476acb7a4088 -r de28d1b07e6f source/common/param.cpp
--- a/source/common/param.cpp	Wed Oct 29 22:20:55 2014 -0500
+++ b/source/common/param.cpp	Thu Oct 30 13:09:57 2014 -0500
@@ -623,7 +623,22 @@ int x265_param_parse(x265_param *p, cons
     OPT("psy-rdoq") p->psyRdoq = atof(value);
     OPT("signhide") p->bEnableSignHiding = atobool(value);
     OPT("b-intra") p->bIntraInBFrames = atobool(value);
-    OPT("lft") p->bEnableLoopFilter = atobool(value);
+    OPT("lft") p->bEnableLoopFilter = atobool(value); /* DEPRECATED */
+    OPT("deblock")
+    {
+        if (2 == sscanf(value, "%d:%d", &p->deblockingFilterTCOffset, &p->deblockingFilterBetaOffset) ||
+            2 == sscanf(value, "%d,%d", &p->deblockingFilterTCOffset, &p->deblockingFilterBetaOffset))
+        {
+            p->bEnableLoopFilter = true;
+        }
+        else if (sscanf(value, "%d", &p->deblockingFilterTCOffset))
+        {
+            p->bEnableLoopFilter = 1;
+            p->deblockingFilterBetaOffset = p->deblockingFilterTCOffset;
+        }
+        else
+            p->bEnableLoopFilter = atobool(value);
+    }
     OPT("sao") p->bEnableSAO = atobool(value);
     OPT("sao-non-deblock") p->bSaoNonDeblocked = atobool(value);
     OPT("ssim") p->bEnableSsim = atobool(value);
@@ -960,8 +975,12 @@ int x265_check_params(x265_param *param)
           "Aq-Mode is out of range");
     CHECK(param->rc.aqStrength < 0 || param->rc.aqStrength > 3,
           "Aq-Strength is out of range");
+    CHECK(param->deblockingFilterTCOffset < -6 || param->deblockingFilterTCOffset > 6,
+          "deblocking filter tC offset must be in the range of -6 to +6");
+    CHECK(param->deblockingFilterBetaOffset < -6 || param->deblockingFilterBetaOffset > 6,
+          "deblocking filter Beta offset must be in the range of -6 to +6");
     CHECK(param->psyRd < 0 || 2.0 < param->psyRd, "Psy-rd strength must be between 0 and 2.0");
-    CHECK(param->psyRdoq < 0 || 10.0 < param->psyRdoq, "Psy-rdoq strength must be between 0 and 10.0");
+    CHECK(param->psyRdoq < 0 || 50.0 < param->psyRdoq, "Psy-rdoq strength must be between 0 and 50.0");
     CHECK(param->bEnableWavefront < 0, "WaveFrontSynchro cannot be negative");
     CHECK(!param->bEnableWavefront && param->rc.vbvBufferSize, "VBV requires wave-front parallelism (--wpp)");
     CHECK((param->vui.aspectRatioIdc < 0
@@ -1156,7 +1175,13 @@ void x265_print_params(x265_param *param
     TOOLOPT(param->bEnableCbfFastMode, "cfm");
     if (param->noiseReduction)
         fprintf(stderr, "nr=%d ", param->noiseReduction);
-    TOOLOPT(param->bEnableLoopFilter, "lft");
+    if (param->bEnableLoopFilter)
+    {
+        if (param->deblockingFilterBetaOffset || param->deblockingFilterTCOffset)
+            fprintf(stderr, "deblock(tC=%d:B=%d) ", param->deblockingFilterTCOffset, param->deblockingFilterBetaOffset);
+        else
+            TOOLOPT(param->bEnableLoopFilter, "deblock");
+    }
     if (param->bEnableSAO)
         fprintf(stderr, "sao%s ", param->bSaoNonDeblocked ? "-non-deblock" : "");
     TOOLOPT(param->bEnableSignHiding, "signhide");
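
The --deblock handling in the param.cpp hunks above (two offsets separated by
':' or ',', a single offset applied to both, otherwise a boolean, then a -6..6
range check) can be sketched in isolation as follows. This is only an
illustrative sketch, not the x265 code: DeblockOpts and parseDeblock are
hypothetical names, the boolean fallback is a simplified stand-in for atobool(),
and the range check is folded into the parser rather than left to
x265_check_params(). It also compares the single-offset sscanf() result against
1, since sscanf() returns EOF (non-zero) on empty input.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for the three x265_param fields touched by --deblock.
struct DeblockOpts
{
    bool enabled = true;
    int  tcOffset = 0;
    int  betaOffset = 0;
};

// Parse a --deblock value. Returns true on success, false if the value is
// neither a recognised offset form nor "true"/"false", or is out of range.
bool parseDeblock(const char* value, DeblockOpts& out)
{
    assert(value != nullptr);
    int tc = 0, beta = 0;

    if (std::sscanf(value, "%d:%d", &tc, &beta) == 2 ||
        std::sscanf(value, "%d,%d", &tc, &beta) == 2)
    {
        // Two offsets given: tC first, Beta second.
    }
    else if (std::sscanf(value, "%d", &tc) == 1)
    {
        // Single offset: both tC and Beta take the same value.
        beta = tc;
    }
    else
    {
        // Simplified boolean fallback (assumption: only "true"/"false").
        if (std::strcmp(value, "true") == 0)  { out.enabled = true;  return true; }
        if (std::strcmp(value, "false") == 0) { out.enabled = false; return true; }
        return false;
    }

    // Same limits as the CHECK() lines added to x265_check_params().
    if (tc < -6 || tc > 6 || beta < -6 || beta > 6)
        return false;

    out.enabled = true;
    out.tcOffset = tc;
    out.betaOffset = beta;
    return true;
}
```

With this sketch, "-2:1" and "1,-1" set the two offsets separately, "3" sets
both offsets to 3, "false" disables the filter, and "7" or an empty string is
rejected.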
diff -r 476acb7a4088 -r de28d1b07e6f source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp	Wed Oct 29 22:20:55 2014 -0500
+++ b/source/encoder/analysis.cpp	Thu Oct 30 13:09:57 2014 -0500
@@ -116,7 +116,7 @@ void Analysis::destroy()
     }
 }
 
-Search::Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, const Entropy& initialContext)
+Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, const Entropy& initialContext)
 {
     m_slice = ctu.m_slice;
     m_frame = &frame;
@@ -155,7 +155,7 @@ Search::Mode& Analysis::compressCTU(CUDa
              * they are available for intra predictions */
             m_modeDepth[0].fencYuv.copyToPicYuv(*m_frame->m_reconPicYuv, ctu.m_cuAddr, 0);
             
-            compressInterCU_rd0_4(ctu, cuGeom); // TODO: this really wants to be compressInterCU_rd0_1
+            compressInterCU_rd0_4(ctu, cuGeom);
 
             /* generate residual for entire CTU at once and copy to reconPic */
             encodeResidue(ctu, cuGeom);
@@ -350,17 +350,17 @@ void Analysis::parallelME(int threadId, 
         slave->m_frame = m_frame;
 
         PicYuv* fencPic = m_frame->m_origPicYuv;
-        pixel* pu = fencPic->getLumaAddr(m_curMECu->m_cuAddr, m_curGeom->encodeIdx + m_puAbsPartIdx);
+        pixel* pu = fencPic->getLumaAddr(m_curInterMode->cu.m_cuAddr, m_curGeom->encodeIdx + m_puAbsPartIdx);
         slave->m_me.setSourcePlane(fencPic->m_picOrg[0], fencPic->m_stride);
         slave->m_me.setSourcePU(pu - fencPic->m_picOrg[0], m_puWidth, m_puHeight);
 
-        slave->prepMotionCompensation(*m_curMECu, *m_curGeom, m_curPart);
+        slave->prepMotionCompensation(m_curInterMode->cu, *m_curGeom, m_curPart);
     }
 
     if (meId < m_slice->m_numRefIdx[0])
-        slave->singleMotionEstimation(*this, *m_curMECu, *m_curGeom, m_curPart, 0, meId);
+        slave->singleMotionEstimation(*this, *m_curInterMode, *m_curGeom, m_curPart, 0, meId);
     else
-        slave->singleMotionEstimation(*this, *m_curMECu, *m_curGeom, m_curPart, 1, meId - m_slice->m_numRefIdx[0]);
+        slave->singleMotionEstimation(*this, *m_curInterMode, *m_curGeom, m_curPart, 1, meId - m_slice->m_numRefIdx[0]);
 }
 
 void Analysis::parallelModeAnalysis(int threadId, int jobId)
@@ -389,7 +389,7 @@ void Analysis::parallelModeAnalysis(int 
         case 0:
             if (slave != this)
                 slave->m_rqt[m_curGeom->depth].cur.load(m_rqt[m_curGeom->depth].cur);
-            slave->checkIntraInInter_rd0_4(md.pred[PRED_INTRA], *m_curGeom);
+            slave->checkIntraInInter(md.pred[PRED_INTRA], *m_curGeom);
             if (m_param->rdLevel > 2)
                 slave->encodeIntraInInter(md.pred[PRED_INTRA], *m_curGeom);
             break;
@@ -479,6 +479,8 @@ void Analysis::parallelModeAnalysis(int 
     }
 }
 
+#define MATCH_NON_PMODE 0
+
 void Analysis::compressInterCU_dist(const CUData& parentCTU, const CUGeom& cuGeom)
 {
     uint32_t depth = cuGeom.depth;
@@ -560,13 +562,31 @@ void Analysis::compressInterCU_dist(cons
 
             if (bTryAmp)
             {
-                if (md.pred[PRED_2NxnU].sa8dCost < bestInter->sa8dCost)
+#if MATCH_NON_PMODE
+                bool bHor = false, bVer = false;
+                if (bestInter->cu.m_partSize[0] == SIZE_2NxN)
+                    bHor = true;
+                else if (bestInter->cu.m_partSize[0] == SIZE_Nx2N)
+                    bVer = true;
+                else if (bestInter->cu.m_partSize[0] == SIZE_2Nx2N &&
+                         md.bestMode && md.bestMode->cu.getQtRootCbf(0))
+                {
+                    bHor = true;
+                    bVer = true;
+                }
+#define HOR && bHor
+#define VER && bVer
+#else
+#define HOR
+#define VER
+#endif
+                if (md.pred[PRED_2NxnU].sa8dCost < bestInter->sa8dCost HOR)
                     bestInter = &md.pred[PRED_2NxnU];
-                if (md.pred[PRED_2NxnD].sa8dCost < bestInter->sa8dCost)
+                if (md.pred[PRED_2NxnD].sa8dCost < bestInter->sa8dCost HOR)
                     bestInter = &md.pred[PRED_2NxnD];
-                if (md.pred[PRED_nLx2N].sa8dCost < bestInter->sa8dCost)
+                if (md.pred[PRED_nLx2N].sa8dCost < bestInter->sa8dCost VER)
                     bestInter = &md.pred[PRED_nLx2N];
-                if (md.pred[PRED_nRx2N].sa8dCost < bestInter->sa8dCost)
+                if (md.pred[PRED_nRx2N].sa8dCost < bestInter->sa8dCost VER)
                     bestInter = &md.pred[PRED_nRx2N];
             }
 
@@ -583,7 +603,11 @@ void Analysis::compressInterCU_dist(cons
                 /* RD selection between merge, inter and intra */
                 checkBestMode(*bestInter, depth);
 
+#if MATCH_NON_PMODE
+                if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) || md.bestMode->sa8dCost == MAX_INT64)
+#else
                 if (bTryIntra)
+#endif
                     checkBestMode(md.pred[PRED_INTRA], depth);
             }
             else /* m_param->rdLevel == 2 */
@@ -623,10 +647,26 @@ void Analysis::compressInterCU_dist(cons
 
             if (bTryAmp)
             {