[x265-commits] [x265] improve codeCoeffNxN by inline getSigCtxInc()

Min Chen chenm003 at 163.com
Thu Mar 26 19:50:21 CET 2015


details:   http://hg.videolan.org/x265/rev/50b2e56c5bbb
branches:  
changeset: 9904:50b2e56c5bbb
user:      Min Chen <chenm003 at 163.com>
date:      Tue Mar 24 18:38:51 2015 -0700
description:
improve codeCoeffNxN by inline getSigCtxInc()
Subject: [x265] improve codeCoeffNxN by merge loop of encodeBin

details:   http://hg.videolan.org/x265/rev/f5a1388cb6d7
branches:  
changeset: 9905:f5a1388cb6d7
user:      Min Chen <chenm003 at 163.com>
date:      Tue Mar 24 18:38:55 2015 -0700
description:
improve codeCoeffNxN by merge loop of encodeBin
Subject: [x265] improve codeCoeffNxN by new fast RD path

details:   http://hg.videolan.org/x265/rev/b8ad3d0ebc5a
branches:  
changeset: 9906:b8ad3d0ebc5a
user:      Min Chen <chenm003 at 163.com>
date:      Tue Mar 24 18:38:59 2015 -0700
description:
improve codeCoeffNxN by new fast RD path
Subject: [x265] slicetype: add a check for cost estimate failure

details:   http://hg.videolan.org/x265/rev/9014ec48369e
branches:  
changeset: 9907:9014ec48369e
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 13:43:06 2015 -0500
description:
slicetype: add a check for cost estimate failure
Subject: [x265] slicetype: cleanup runtime stat-collection, fewer ifdefs within the functions

details:   http://hg.videolan.org/x265/rev/3ef5953f23d5
branches:  
changeset: 9908:3ef5953f23d5
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 13:54:54 2015 -0500
description:
slicetype: cleanup runtime stat-collection, fewer ifdefs within the functions
Subject: [x265] slicetype: use auto-vars to catch race hazards

details:   http://hg.videolan.org/x265/rev/d5228e33bf07
branches:  
changeset: 9909:d5228e33bf07
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 16:47:22 2015 -0400
description:
slicetype: use auto-vars to catch race hazards

No output change
Subject: [x265] slicetype: nit

details:   http://hg.videolan.org/x265/rev/29dd2cad1c5c
branches:  
changeset: 9910:29dd2cad1c5c
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 16:47:33 2015 -0400
description:
slicetype: nit
Subject: [x265] slicetype: disable b-adapt 2 work batching, temporarily

details:   http://hg.videolan.org/x265/rev/7105fb807224
branches:  
changeset: 9911:7105fb807224
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 16:50:12 2015 -0400
description:
slicetype: disable b-adapt 2 work batching, temporarily
Subject: [x265] check sched failures

details:   http://hg.videolan.org/x265/rev/552ad1ab31ca
branches:  
changeset: 9912:552ad1ab31ca
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 23:34:31 2015 -0400
description:
check sched failures
Subject: [x265] slicetype: re-enable batching, keep coop slices disabled

details:   http://hg.videolan.org/x265/rev/90ca0cc6a427
branches:  
changeset: 9913:90ca0cc6a427
user:      Steve Borho <steve at borho.org>
date:      Wed Mar 25 23:31:27 2015 -0500
description:
slicetype: re-enable batching, keep coop slices disabled
Subject: [x265] asm: chroma_hps[6x8] avx2 - improved 670c->602c

details:   http://hg.videolan.org/x265/rev/f49d74a6f176
branches:  
changeset: 9914:f49d74a6f176
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Thu Mar 26 09:43:49 2015 +0530
description:
asm: chroma_hps[6x8] avx2 - improved 670c->602c
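
Reading `670c->602c` as cycle counts for the old and new kernel, as the subject line suggests, the gain is simple arithmetic on the two quoted numbers:

```latex
\frac{670 - 602}{670} = \frac{68}{670} \approx 0.101 \;(\approx 10.1\%\ \text{fewer cycles}),
\qquad
\frac{670}{602} \approx 1.113 \;(\approx 1.11\times\ \text{throughput})
```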
Subject: [x265] asm: fix alignment issue in ssd

details:   http://hg.videolan.org/x265/rev/4d0478e547d2
branches:  
changeset: 9915:4d0478e547d2
user:      Sumalatha Polureddy
date:      Thu Mar 26 17:01:40 2015 +0530
description:
asm: fix alignment issue in ssd

Replace aligned mov with unaligned mov in the SSD function. For historical reasons,
we haven't forced alignment of fenc for SSD (probably something to be revisited).
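
In x86inc terms, `mova` is an aligned load (for example `movdqa`), which faults unless the address is 16-byte (xmm) or 32-byte (ymm) aligned, while `movu` is the unaligned form. Below is a minimal C++ sketch of the same SSD pattern using SSE2 intrinsics. It assumes 8-bit pixels and a 16-pixel-wide block. The function names are hypothetical, and this is not the `SSD_LOAD_FULL` macro itself.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// Hypothetical: SSD of a 16-wide block of 8-bit pixels.
// _mm_loadu_si128 is the unaligned load (movdqu), so neither pointer needs
// 16-byte alignment; _mm_load_si128 (movdqa) could fault here instead.
uint64_t ssd16xN_unaligned(const uint8_t* fenc, intptr_t fencStride,
                           const uint8_t* rec, intptr_t recStride, int height)
{
    assert(height >= 0);
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();  // four 32-bit partial sums
    for (int y = 0; y < height; y++)
    {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(fenc + y * fencStride));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rec + y * recStride));
        // Widen to 16 bits, subtract, then square and pairwise-add (pmaddwd).
        __m128i dlo = _mm_sub_epi16(_mm_unpacklo_epi8(a, zero), _mm_unpacklo_epi8(b, zero));
        __m128i dhi = _mm_sub_epi16(_mm_unpackhi_epi8(a, zero), _mm_unpackhi_epi8(b, zero));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(dlo, dlo));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(dhi, dhi));
    }
    alignas(16) uint32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    return (uint64_t)lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Scalar reference used to check the SIMD version.
uint64_t ssd16xN_ref(const uint8_t* fenc, intptr_t fencStride,
                     const uint8_t* rec, intptr_t recStride, int height)
{
    uint64_t sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < 16; x++)
        {
            int d = fenc[y * fencStride + x] - rec[y * recStride + x];
            sum += (uint64_t)(d * d);
        }
    return sum;
}
```

A quick way to confirm that the unaligned path matches a scalar loop is to run both versions on a buffer offset by one byte.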
Subject: [x265] api: comment nits

details:   http://hg.videolan.org/x265/rev/1974406ff5fd
branches:  
changeset: 9916:1974406ff5fd
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 10:49:41 2015 -0500
description:
api: comment nits
Subject: [x265] api: add param.lookaheadSlices

details:   http://hg.videolan.org/x265/rev/32b79736f128
branches:  
changeset: 9917:32b79736f128
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 10:56:11 2015 -0500
description:
api: add param.lookaheadSlices

This threading feature used to be auto-enabled when the thread pool was a
certain size. But this makes the encoder output vary with the number of CPUs,
which is not particularly helpful, especially since lookahead slices often do
little for performance. So I'm putting the choice in the hands of the user.
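
Because the value now comes only from the user, the knob reduces to a range check plus one rule from the `--lookahead-slices` docs: a value of 1 behaves like 0. The sketch below models that logic in isolation. It uses a hypothetical `DemoParam` struct in place of the real `x265_param`. In the actual code, parsing and checking happen in `x265_param_parse` and `x265_check_params`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the single x265_param field this change adds.
struct DemoParam
{
    int lookaheadSlices;  // 0 = disabled (the default)
};

// Same range as the new check: 0..16 inclusive.
bool lookaheadSlicesValid(const DemoParam& p)
{
    return p.lookaheadSlices >= 0 && p.lookaheadSlices <= 16;
}

// A single slice is no slicing at all, so 1 behaves like 0.
bool lookaheadSlicesEnabled(const DemoParam& p)
{
    assert(lookaheadSlicesValid(p));
    return p.lookaheadSlices > 1;
}

// Parses the value with atoi, like the new OPT("lookahead-slices") line;
// non-numeric input therefore silently becomes 0.
DemoParam parseLookaheadSlices(const char* value)
{
    DemoParam p;
    p.lookaheadSlices = std::atoi(value);
    return p;
}
```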
Subject: [x265] slicetype: respect new m_param->lookaheadSlices param

details:   http://hg.videolan.org/x265/rev/75e5803b6fa3
branches:  
changeset: 9918:75e5803b6fa3
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 11:12:39 2015 -0500
description:
slicetype: respect new m_param->lookaheadSlices param
Subject: [x265] param: expose 'lookahead-slices' for param.lookaheadSlices (help hidden at H1)

details:   http://hg.videolan.org/x265/rev/9178c211a0fe
branches:  
changeset: 9919:9178c211a0fe
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 11:06:57 2015 -0500
description:
param: expose 'lookahead-slices' for param.lookaheadSlices (help hidden at H1)

Most users will not want to use this; it is only helpful on fairly large
machines, and even then only for some use-cases.
Subject: [x265] docs: add documentation for --lookahead-slices

details:   http://hg.videolan.org/x265/rev/a5d8c3dba996
branches:  
changeset: 9920:a5d8c3dba996
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 11:18:55 2015 -0500
description:
docs: add documentation for --lookahead-slices
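
According to the new documentation, the encoder may reduce the slice count so that each slice still covers at least 10 rows of lowres blocks. The sketch below expresses that clamp and assumes the caller already knows how many lowres block rows the frame has. `effectiveLookaheadSlices` is a hypothetical helper, not the slicetype.cpp implementation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: lower the requested slice count so every slice gets
// at least minRowsPerSlice rows of lowres blocks. Returns 0 when slicing
// ends up disabled (0 or 1 requested, or too few rows to split).
int effectiveLookaheadSlices(int requested, int lowresRows, int minRowsPerSlice = 10)
{
    assert(requested >= 0 && requested <= 16);
    assert(lowresRows > 0 && minRowsPerSlice > 0);
    if (requested <= 1)
        return 0;                                  // 0 and 1 both mean "no slices"
    int maxSlices = lowresRows / minRowsPerSlice;  // floor keeps each slice >= min rows
    int slices = std::min(requested, maxSlices);
    return slices > 1 ? slices : 0;
}
```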
Subject: [x265] docs: make the --b-adapt docs less embarrassingly bad

details:   http://hg.videolan.org/x265/rev/6c9e3e8f8ec6
branches:  
changeset: 9921:6c9e3e8f8ec6
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 11:22:58 2015 -0500
description:
docs: make the --b-adapt docs less embarrassingly bad
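
The rewritten `--b-adapt` text says that with b-adapt 0, the GOP structure is fixed by `--keyint` and `--bframes`. The sketch below illustrates such a fixed pattern in display order under several assumptions: closed GOPs, no scene-cut detection, no B-pyramid, and the frame before each keyframe (and the last frame) forced to P. `fixedGopTypes` is hypothetical and ignores the details of x265's real slice-type logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical: frame types ('I', 'P', 'B') for a fixed, non-adaptive GOP.
std::string fixedGopTypes(int numFrames, int keyint, int bframes)
{
    assert(numFrames >= 0 && keyint >= 1 && bframes >= 0);
    std::string types;
    for (int f = 0; f < numFrames; f++)
    {
        int p = f % keyint;  // position inside the current GOP
        bool lastInGop = (p == keyint - 1) || (f == numFrames - 1);
        if (p == 0)
            types += 'I';                                  // keyframe every keyint frames
        else if (p % (bframes + 1) == 0 || lastInGop)
            types += 'P';                                  // anchor after each run of B frames
        else
            types += 'B';
    }
    return types;
}
```

For example, `fixedGopTypes(10, 10, 2)` yields `"IBBPBBPBBP"`. With b-adapt 1 or 2, the encoder instead chooses where the P anchors go based on lookahead cost estimates.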
Subject: [x265] docs: update threading details of lookahead

details:   http://hg.videolan.org/x265/rev/dc7a6c5fbda1
branches:  
changeset: 9922:dc7a6c5fbda1
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 26 11:24:23 2015 -0500
description:
docs: update threading details of lookahead
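
The threading docs now mention that single-frame cost estimates may optionally be split into slices processed by a bonded task group. The sketch below shows that split. It uses `std::thread` as a stand-in for x265's thread pool and assumes a hypothetical array of precomputed per-row costs. Real lookahead costs come from lowres motion and intra analysis, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical: a frame cost estimate is the sum of per-row costs, split into
// contiguous row slices that are summed by separate workers. In real motion
// estimation, neighbours across a slice border are unavailable, which is why
// more slices mean less accurate costs; this toy sum has no such dependency.
int64_t sliceParallelFrameCost(const std::vector<int64_t>& rowCost, int numSlices)
{
    assert(numSlices >= 1);
    const int rows = (int)rowCost.size();
    std::vector<int64_t> partial(numSlices, 0);  // one result slot per slice
    std::vector<std::thread> workers;
    for (int s = 0; s < numSlices; s++)
    {
        int begin = rows * s / numSlices;  // near-equal contiguous row ranges
        int end = rows * (s + 1) / numSlices;
        workers.emplace_back([&rowCost, &partial, s, begin, end] {
            for (int r = begin; r < end; r++)
                partial[s] += rowCost[r];
        });
    }
    for (auto& w : workers)
        w.join();  // wait for every slice before combining
    int64_t total = 0;
    for (int64_t c : partial)
        total += c;
    return total;
}
```

Each worker writes only its own `partial` slot, so no locking is needed. Joining every thread before summing plays the role of waiting for the whole task group.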

diffstat:

 doc/reST/cli.rst                     |   29 +++++-
 doc/reST/threading.rst               |    3 +-
 source/CMakeLists.txt                |    2 +-
 source/common/param.cpp              |    6 +
 source/common/x86/asm-primitives.cpp |    1 +
 source/common/x86/ipfilter8.asm      |   72 ++++++++++++++
 source/common/x86/ssd-a.asm          |    8 +-
 source/encoder/entropy.cpp           |  173 ++++++++++++++++++++++++++++++++--
 source/encoder/slicetype.cpp         |   97 +++++++++++++-----
 source/x265.h                        |   60 +++++++-----
 source/x265cli.h                     |    2 +
 11 files changed, 381 insertions(+), 72 deletions(-)

diffs (truncated from 808 to 300 lines):

diff -r 24fdb661bb57 -r dc7a6c5fbda1 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Wed Mar 25 12:49:01 2015 -0500
+++ b/doc/reST/cli.rst	Thu Mar 26 11:24:23 2015 -0500
@@ -950,11 +950,36 @@ Slice decision options
 
 	**Range of values:** Between the maximum consecutive bframe count (:option:`--bframes`) and 250
 
+.. option:: --lookahead-slices <0..16>
+
+	Use multiple worker threads to measure the estimated cost of each
+	frame within the lookahead. When :option:`--b-adapt` is 2, most
+	frame cost estimates will be performed in batch mode (many cost
+	estimates at the same time), and lookahead-slices is ignored for
+	batched estimates. The performance gain from lookahead slices can be quite small.
+	The higher this parameter, the less accurate the frame costs will be
+	(since context is lost across slice boundaries), which will result in
+	less accurate B-frame and scene-cut decisions.
+
+	The encoder may internally lower the number of slices to ensure
+	each slice codes at least 10 16x16 rows of lowres blocks. If slices
+	are used in lookahead, they are logged in the list of tools as
+	*lslices*.
+	
+	**Values:** 0 - disabled (default). 1 is the same as 0. Max 16
+
 .. option:: --b-adapt <integer>
 
-	Adaptive B frame scheduling. Default 2
+	Set the level of effort in determining B frame placement.
 
-	**Values:** 0:none; 1:fast; 2:full(trellis)
+	With b-adapt 0, the GOP structure is fixed based on the values of
+	:option:`--keyint` and :option:`--bframes`.
+	
+	With b-adapt 1, a light lookahead is used to choose B frame placement.
+
+	With b-adapt 2 (trellis), a Viterbi B path selection is performed.
+
+	**Values:** 0:none; 1:fast; 2:full(trellis) **default**
 
 .. option:: --bframes, -b <0..16>
 
diff -r 24fdb661bb57 -r dc7a6c5fbda1 doc/reST/threading.rst
--- a/doc/reST/threading.rst	Wed Mar 25 12:49:01 2015 -0500
+++ b/doc/reST/threading.rst	Thu Mar 26 11:24:23 2015 -0500
@@ -223,7 +223,8 @@ Lookahead
 The lookahead module of x265 (the lowres pre-encode which determines
 scene cuts and slice types) uses the thread pool to distribute the
 lowres cost analysis to worker threads. It will use bonded task groups
-to perform batches of frame cost estimates.
+to perform batches of frame cost estimates, and it may optionally use
+bonded task groups to measure single frame cost estimates using slices.
 
 The function slicetypeDecide() itself is also performed by a worker
 thread if your encoder has a thread pool, else it runs within the
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Wed Mar 25 12:49:01 2015 -0500
+++ b/source/CMakeLists.txt	Thu Mar 26 11:24:23 2015 -0500
@@ -24,7 +24,7 @@ include(CheckSymbolExists)
 include(CheckCXXCompilerFlag)
 
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 49)
+set(X265_BUILD 50)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/param.cpp
--- a/source/common/param.cpp	Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/param.cpp	Thu Mar 26 11:24:23 2015 -0500
@@ -138,6 +138,7 @@ void x265_param_default(x265_param* para
     param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
     param->bBPyramid = 1;
     param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
+    param->lookaheadSlices = 0;
 
     /* Intra Coding Tools */
     param->bEnableConstrainedIntra = 0;
@@ -598,6 +599,7 @@ int x265_param_parse(x265_param* p, cons
     OPT2("constrained-intra", "cip") p->bEnableConstrainedIntra = atobool(value);
     OPT("fast-intra") p->bEnableFastIntra = atobool(value);
     OPT("open-gop") p->bOpenGOP = atobool(value);
+    OPT("lookahead-slices") p->lookaheadSlices = atoi(value);
     OPT("scenecut")
     {
         p->scenecutThreshold = atobool(value);
@@ -1063,6 +1065,8 @@ int x265_check_params(x265_param* param)
           "max consecutive bframe count must be 16 or smaller");
     CHECK(param->lookaheadDepth > X265_LOOKAHEAD_MAX,
           "Lookahead depth must be less than 256");
+    CHECK(param->lookaheadSlices > 16 || param->lookaheadSlices < 0,
+          "Lookahead slices must be between 0 and 16");
     CHECK(param->rc.aqMode < X265_AQ_NONE || X265_AQ_AUTO_VARIANCE < param->rc.aqMode,
           "Aq-Mode is out of range");
     CHECK(param->rc.aqStrength < 0 || param->rc.aqStrength > 3,
@@ -1304,6 +1308,7 @@ void x265_print_params(x265_param* param
     TOOLOPT(param->bIntraInBFrames, "b-intra");
     TOOLOPT(param->bEnableFastIntra, "fast-intra");
     TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
+    TOOLVAL(param->lookaheadSlices, "lslices=%d");
     if (param->bEnableLoopFilter)
     {
         if (param->deblockingFilterBetaOffset || param->deblockingFilterTCOffset)
@@ -1365,6 +1370,7 @@ char *x265_param2string(x265_param* p)
     s += sprintf(s, " min-keyint=%d", p->keyframeMin);
     s += sprintf(s, " scenecut=%d", p->scenecutThreshold);
     s += sprintf(s, " rc-lookahead=%d", p->lookaheadDepth);
+    s += sprintf(s, " lookahead-slices=%d", p->lookaheadSlices);
     s += sprintf(s, " bframes=%d", p->bframes);
     s += sprintf(s, " bframe-bias=%d", p->bFrameBias);
     s += sprintf(s, " b-adapt=%d", p->bFrameAdaptive);
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp	Thu Mar 26 11:24:23 2015 -0500
@@ -1782,6 +1782,7 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_hps = x265_interp_4tap_horiz_ps_32x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_hps = x265_interp_4tap_horiz_ps_2x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].filter_hps = x265_interp_4tap_horiz_ps_2x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].filter_hps = x265_interp_4tap_horiz_ps_6x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_hpp = x265_interp_4tap_horiz_pp_24x32_avx2;
 
         p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm	Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/x86/ipfilter8.asm	Thu Mar 26 11:24:23 2015 -0500
@@ -20484,3 +20484,75 @@ cglobal interp_4tap_horiz_pp_24x32, 4,6,
     dec               r4d
     jnz               .loop
     RET
+
+;-----------------------------------------------------------------------------------------------------------------------------
+; void interp_4tap_horiz_ps_6x8(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx, int isRowExt)
+;-----------------------------------------------------------------------------------------------------------------------------;
+INIT_YMM avx2 
+cglobal interp_4tap_horiz_ps_6x8, 4,7,6
+    mov                r4d,            r4m
+    mov                r5d,            r5m
+    add                r3d,            r3d
+
+%ifdef PIC
+    lea                r6,             [tab_ChromaCoeff]
+    vpbroadcastd       m0,             [r6 + r4 * 4]
+%else
+    vpbroadcastd       m0,             [tab_ChromaCoeff + r4 * 4]
+%endif
+
+    vbroadcasti128     m2,             [pw_1]
+    vbroadcasti128     m5,             [pw_2000]
+    mova               m1,             [tab_Tm]
+
+    ; register map
+    ; m0 - interpolate coeff
+    ; m1 - shuffle order table
+    ; m2 - constant word 1
+
+    mov               r6d,             8/2
+    dec               r0
+    test              r5d,             r5d
+    jz                .loop
+    sub               r0 ,             r1
+    inc               r6d
+
+.loop
+    ; Row 0
+    vbroadcasti128    m3,              [r0]
+    pshufb            m3,              m1
+    pmaddubsw         m3,              m0
+    pmaddwd           m3,              m2
+
+    ; Row 1
+    vbroadcasti128    m4,              [r0 + r1]
+    pshufb            m4,              m1
+    pmaddubsw         m4,              m0
+    pmaddwd           m4,              m2
+    packssdw          m3,              m4
+    psubw             m3,              m5
+    vpermq            m3,              m3,          11011000b
+    vextracti128      xm4,             m3,          1
+    movq              [r2],            xm3
+    pextrd            [r2 + 8],        xm3,         2
+    movq              [r2 + r3],       xm4
+    pextrd            [r2 + r3 + 8],   xm4,         2
+    lea               r2,              [r2 + r3 * 2]
+    lea               r0,              [r0 + r1 * 2]
+    dec               r6d
+    jnz              .loop
+    test              r5d,             r5d
+    jz               .end
+
+    ; Row 10 (the 11th row, only when isRowExt is set)
+    vbroadcasti128    m3,              [r0]
+    pshufb            m3,              m1
+    pmaddubsw         m3,              m0
+    pmaddwd           m3,              m2
+    packssdw          m3,              m3
+    psubw             m3,              m5
+    vextracti128      xm4,             m3,          1
+    movq              [r2],            xm3
+    movd              [r2+8],          xm4
+.end
+    RET
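
A scalar C++ model of what the AVX2 kernel computes may help when reading it. This sketch assumes 8-bit pixels, the HEVC 4-tap chroma filter coefficients, and x265's 14-bit intermediate format. Under those assumptions, the "ps" output is `sum - 8192` (the `pw_2000` constant) with no shift. As in the asm, `isRowExt` starts one row above `src` and produces 8 + 3 = 11 rows for a later vertical pass. The function and table names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// HEVC 4-tap chroma interpolation filters, indexed by 1/8-pel position.
static const int16_t kChromaFilter[8][4] =
{
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 }
};

// Hypothetical scalar model of interp_4tap_horiz_ps_6x8 for 8-bit input.
// dstStride is in int16_t elements (the asm doubles r3 to get bytes).
void interp4tapHorizPs6x8Ref(const uint8_t* src, intptr_t srcStride,
                             int16_t* dst, intptr_t dstStride,
                             int coeffIdx, int isRowExt)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int16_t* c = kChromaFilter[coeffIdx];
    const int width = 6;
    int height = 8;
    src -= 1;                  // taps cover x-1 .. x+2 ("dec r0")
    if (isRowExt)
    {
        src -= srcStride;      // start one row above ("sub r0, r1")
        height += 3;           // a 4-tap vertical pass needs 3 extra rows
    }
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int sum = c[0] * src[x] + c[1] * src[x + 1] + c[2] * src[x + 2] + c[3] * src[x + 3];
            dst[x] = (int16_t)(sum - 8192);  // subtract the internal offset (pw_2000)
        }
        src += srcStride;
        dst += dstStride;
    }
}
```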
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/x86/ssd-a.asm
--- a/source/common/x86/ssd-a.asm	Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/x86/ssd-a.asm	Thu Mar 26 11:24:23 2015 -0500
@@ -822,10 +822,10 @@ SSD_SS_64xN
 
 %if HIGH_BIT_DEPTH == 0
 %macro SSD_LOAD_FULL 5
-    mova      m1, [t0+%1]
-    mova      m2, [t2+%2]
-    mova      m3, [t0+%3]
-    mova      m4, [t2+%4]
+    movu      m1, [t0+%1]
+    movu      m2, [t2+%2]
+    movu      m3, [t0+%3]
+    movu      m4, [t2+%4]
 %if %5==1
     add       t0, t1
     add       t2, t3
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/encoder/entropy.cpp
--- a/source/encoder/entropy.cpp	Wed Mar 25 12:49:01 2015 -0500
+++ b/source/encoder/entropy.cpp	Thu Mar 26 11:24:23 2015 -0500
@@ -1555,22 +1555,173 @@ void Entropy::codeCoeffNxN(const CUData&
         // encode significant_coeff_flag
         if (sigCoeffGroupFlag64 & cgBlkPosMask)
         {
+            X265_CHECK((log2TrSize != 2) || (log2TrSize == 2 && subSet == 0), "log2TrSize and subSet mistake!\n");
             const int patternSigCtx = Quant::calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codingParameters.log2TrSizeCG);
-            uint32_t blkPos, sig, ctxSig;
-            for (; scanPosSigOff >= 0; scanPosSigOff--)
+
+            static const uint8_t ctxIndMap4x4[16] =
             {
-                blkPos  = codingParameters.scan[subPosBase + scanPosSigOff];
-                sig     = scanFlagMask & 1;
-                scanFlagMask >>= 1;
-                X265_CHECK((uint32_t)(coeff[blkPos] != 0) == sig, "sign bit mistake\n");
-                if (scanPosSigOff != 0 || subSet == 0 || numNonZero)
+                0, 1, 4, 5,
+                2, 3, 4, 5,
+                6, 6, 8, 8,
+                7, 7, 8, 8
+            };
+            // NOTE: [patternSigCtx][posXinSubset][posYinSubset]
+            static const uint8_t table_cnt[4][4][4] =
+            {
+                // patternSigCtx = 0
                 {
-                    ctxSig = Quant::getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext);
-                    encodeBin(sig, baseCtx[ctxSig]);
+                    { 2, 1, 1, 0 },
+                    { 1, 1, 0, 0 },
+                    { 1, 0, 0, 0 },
+                    { 0, 0, 0, 0 },
+                },
+                // patternSigCtx = 1
+                {
+                    { 2, 1, 0, 0 },
+                    { 2, 1, 0, 0 },
+                    { 2, 1, 0, 0 },
+                    { 2, 1, 0, 0 },
+                },
+                // patternSigCtx = 2
+                {
+                    { 2, 2, 2, 2 },
+                    { 1, 1, 1, 1 },
+                    { 0, 0, 0, 0 },
+                    { 0, 0, 0, 0 },
+                },
+                // patternSigCtx = 3
+                {
+                    { 2, 2, 2, 2 },
+                    { 2, 2, 2, 2 },
+                    { 2, 2, 2, 2 },
+                    { 2, 2, 2, 2 },
                 }
-                absCoeff[numNonZero] = int(abs(coeff[blkPos]));
-                numNonZero += sig;
+            };
+            if (m_bitIf)
+            {
+                if (log2TrSize == 2)
+                {
+                    uint32_t blkPos, sig, ctxSig;
+                    for (; scanPosSigOff >= 0; scanPosSigOff--)
+                    {
+                        blkPos  = codingParameters.scan[subPosBase + scanPosSigOff];
+                        sig     = scanFlagMask & 1;
+                        scanFlagMask >>= 1;
+                        X265_CHECK((uint32_t)(coeff[blkPos] != 0) == sig, "sign bit mistake\n");
+                        {
+                            ctxSig = ctxIndMap4x4[blkPos];
+                            X265_CHECK(ctxSig == Quant::getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext), "sigCtx mistake!\n");
+                            encodeBin(sig, baseCtx[ctxSig]);
+                        }
+                        absCoeff[numNonZero] = int(abs(coeff[blkPos]));
+                        numNonZero += sig;
+                    }
+                }
+                else
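
The new code replaces per-coefficient calls to `Quant::getSigCtxInc` with table lookups. The standalone sketch below covers the position-dependent part that those tables encode. It assumes `blkPos` is the raster index `y * trSize + x` inside the transform block, and the two tables are copied from the changeset. For TUs larger than 4x4, it deliberately omits the offsets that the full derivation adds afterwards, such as the luma offset for later coefficient groups and `firstSignificanceMapContext`. It is therefore not a drop-in replacement for `getSigCtxInc`. `sigCtxPositionPart` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

static const uint8_t ctxIndMap4x4[16] =
{
    0, 1, 4, 5,
    2, 3, 4, 5,
    6, 6, 8, 8,
    7, 7, 8, 8
};
// [patternSigCtx][posXinSubset][posYinSubset]
static const uint8_t table_cnt[4][4][4] =
{
    { { 2, 1, 1, 0 }, { 1, 1, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 0, 0 } },
    { { 2, 1, 0, 0 }, { 2, 1, 0, 0 }, { 2, 1, 0, 0 }, { 2, 1, 0, 0 } },
    { { 2, 2, 2, 2 }, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }, { 0, 0, 0, 0 } },
    { { 2, 2, 2, 2 }, { 2, 2, 2, 2 }, { 2, 2, 2, 2 }, { 2, 2, 2, 2 } }
};

// 4x4 TU: the map gives the context index directly.
// Larger TU: returns only the neighbour-pattern count for the coefficient's
// position inside its 4x4 group; the remaining offsets are omitted.
uint32_t sigCtxPositionPart(uint32_t patternSigCtx, uint32_t log2TrSize, uint32_t blkPos)
{
    assert(log2TrSize >= 2 && log2TrSize <= 5 && patternSigCtx < 4);
    if (log2TrSize == 2)
        return ctxIndMap4x4[blkPos];
    const uint32_t trSize = 1u << log2TrSize;
    const uint32_t posX = blkPos & (trSize - 1);  // column within the TU
    const uint32_t posY = blkPos >> log2TrSize;   // row within the TU
    return table_cnt[patternSigCtx][posX & 3][posY & 3];
}
```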

