[x265-commits] [x265] improve codeCoeffNxN by inline getSigCtxInc()
Min Chen
chenm003 at 163.com
Thu Mar 26 19:50:21 CET 2015
details: http://hg.videolan.org/x265/rev/50b2e56c5bbb
branches:
changeset: 9904:50b2e56c5bbb
user: Min Chen <chenm003 at 163.com>
date: Tue Mar 24 18:38:51 2015 -0700
description:
improve codeCoeffNxN by inline getSigCtxInc()
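The entropy.cpp hunk further down in this digest inlines two lookup tables, ctxIndMap4x4 and table_cnt. The sketch below copies both tables from that hunk and checks table_cnt against a closed-form rule it appears to tabulate. The helper names cntClosedForm, tableMatchesClosedForm and sigCtx4x4 are hypothetical and are not x265 functions; any context offset the encoder adds on top of the table value is outside what the truncated diff shows and is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Copied from the entropy.cpp hunk in this changeset.
static const uint8_t ctxIndMap4x4[16] =
{
    0, 1, 4, 5,
    2, 3, 4, 5,
    6, 6, 8, 8,
    7, 7, 8, 8
};

// Copied from the same hunk: [patternSigCtx][posXinSubset][posYinSubset]
static const uint8_t table_cnt[4][4][4] =
{
    { { 2, 1, 1, 0 }, { 1, 1, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 0, 0 } }, // patternSigCtx = 0
    { { 2, 1, 0, 0 }, { 2, 1, 0, 0 }, { 2, 1, 0, 0 }, { 2, 1, 0, 0 } }, // patternSigCtx = 1
    { { 2, 2, 2, 2 }, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }, { 0, 0, 0, 0 } }, // patternSigCtx = 2
    { { 2, 2, 2, 2 }, { 2, 2, 2, 2 }, { 2, 2, 2, 2 }, { 2, 2, 2, 2 } }  // patternSigCtx = 3
};

// Hypothetical helper: the arithmetic rule the table seems to encode.
static int cntClosedForm(int patternSigCtx, int x, int y)
{
    assert(patternSigCtx >= 0 && patternSigCtx < 4);
    switch (patternSigCtx)
    {
    case 0:  return (x + y == 0) ? 2 : (x + y < 3) ? 1 : 0;
    case 1:  return (y == 0) ? 2 : (y == 1) ? 1 : 0;
    case 2:  return (x == 0) ? 2 : (x == 1) ? 1 : 0;
    default: return 2;
    }
}

// Hypothetical helper: true when all 64 table entries agree with the rule.
static bool tableMatchesClosedForm()
{
    for (int p = 0; p < 4; p++)
        for (int x = 0; x < 4; x++)
            for (int y = 0; y < 4; y++)
                if (table_cnt[p][x][y] != cntClosedForm(p, x, y))
                    return false;
    return true;
}

// Hypothetical helper: for a 4x4 transform the context index is a plain
// table lookup on the block position, as in the log2TrSize == 2 branch.
static uint32_t sigCtx4x4(uint32_t blkPos)
{
    assert(blkPos < 16);
    return ctxIndMap4x4[blkPos];
}
```

Replacing the per-coefficient branches with a table read is presumably what lets the loop drop the call to Quant::getSigCtxInc(), which the patch keeps only inside an X265_CHECK.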
Subject: [x265] improve codeCoeffNxN by merge loop of encodeBin
details: http://hg.videolan.org/x265/rev/f5a1388cb6d7
branches:
changeset: 9905:f5a1388cb6d7
user: Min Chen <chenm003 at 163.com>
date: Tue Mar 24 18:38:55 2015 -0700
description:
improve codeCoeffNxN by merge loop of encodeBin
Subject: [x265] improve codeCoeffNxN by new fast RD path
details: http://hg.videolan.org/x265/rev/b8ad3d0ebc5a
branches:
changeset: 9906:b8ad3d0ebc5a
user: Min Chen <chenm003 at 163.com>
date: Tue Mar 24 18:38:59 2015 -0700
description:
improve codeCoeffNxN by new fast RD path
Subject: [x265] slicetype: add a check for cost estimate failure
details: http://hg.videolan.org/x265/rev/9014ec48369e
branches:
changeset: 9907:9014ec48369e
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 13:43:06 2015 -0500
description:
slicetype: add a check for cost estimate failure
Subject: [x265] slicetype: cleanup runtime stat-collection, fewer ifdefs within the functions
details: http://hg.videolan.org/x265/rev/3ef5953f23d5
branches:
changeset: 9908:3ef5953f23d5
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 13:54:54 2015 -0500
description:
slicetype: cleanup runtime stat-collection, fewer ifdefs within the functions
Subject: [x265] slicetype: use auto-vars to catch race hazards
details: http://hg.videolan.org/x265/rev/d5228e33bf07
branches:
changeset: 9909:d5228e33bf07
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 16:47:22 2015 -0400
description:
slicetype: use auto-vars to catch race hazards
No output change
Subject: [x265] slicetype: nit
details: http://hg.videolan.org/x265/rev/29dd2cad1c5c
branches:
changeset: 9910:29dd2cad1c5c
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 16:47:33 2015 -0400
description:
slicetype: nit
Subject: [x265] slicetype: disable b-adapt 2 work batching, temporarily
details: http://hg.videolan.org/x265/rev/7105fb807224
branches:
changeset: 9911:7105fb807224
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 16:50:12 2015 -0400
description:
slicetype: disable b-adapt 2 work batching, temporarily
Subject: [x265] check sched failures
details: http://hg.videolan.org/x265/rev/552ad1ab31ca
branches:
changeset: 9912:552ad1ab31ca
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 23:34:31 2015 -0400
description:
check sched failures
Subject: [x265] slicetype: re-enable batching, keep coop slices disabled
details: http://hg.videolan.org/x265/rev/90ca0cc6a427
branches:
changeset: 9913:90ca0cc6a427
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 23:31:27 2015 -0500
description:
slicetype: re-enable batching, keep coop slices disabled
Subject: [x265] asm: chroma_hps[6x8] avx2 - improved 670c->602c
details: http://hg.videolan.org/x265/rev/f49d74a6f176
branches:
changeset: 9914:f49d74a6f176
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Thu Mar 26 09:43:49 2015 +0530
description:
asm: chroma_hps[6x8] avx2 - improved 670c->602c
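For readers who do not read AVX2 assembly, here is a scalar sketch of what the new interp_4tap_horiz_ps_6x8 routine appears to compute, based on its signature and the instructions in the ipfilter8.asm hunk below. It assumes an 8-bit build, assumes pw_2000 holds 0x2000 (8192), and assumes tab_ChromaCoeff holds the standard HEVC 4-tap chroma coefficients, which are reproduced here from the standard rather than from this diff. The function name interp4tapHorizPs6x8 is hypothetical and this is not the x265 C primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit pixels

// Assumption: standard HEVC chroma interpolation filter taps.
static const int16_t chromaCoeff[8][4] =
{
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 }
};

// Hypothetical scalar model of the 6x8 horizontal "ps" (pixel to short) filter.
// dstStride is in int16_t elements; the asm doubles r3 to get a byte stride.
static void interp4tapHorizPs6x8(const pixel* src, intptr_t srcStride,
                                 int16_t* dst, intptr_t dstStride,
                                 int coeffIdx, int isRowExt)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int16_t* c = chromaCoeff[coeffIdx];
    int rows = 8;

    src -= 1;             // filter window starts one pixel left ("dec r0")
    if (isRowExt)
    {
        src -= srcStride; // start one row above ("sub r0, r1")
        rows += 3;        // 8 rows plus 3 extension rows = 11
    }

    for (int y = 0; y < rows; y++)
    {
        for (int x = 0; x < 6; x++)
        {
            int sum = src[x] * c[0] + src[x + 1] * c[1] +
                      src[x + 2] * c[2] + src[x + 3] * c[3];
            dst[x] = (int16_t)(sum - 8192); // "psubw m3, m5"
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

Because every coefficient row sums to 64, a flat block of value 128 filters to 128 * 64 - 8192 = 0, which is a convenient sanity check for any reimplementation.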
Subject: [x265] asm: fix alignment issue in ssd
details: http://hg.videolan.org/x265/rev/4d0478e547d2
branches:
changeset: 9915:4d0478e547d2
user: Sumalatha Polureddy
date: Thu Mar 26 17:01:40 2015 +0530
description:
asm: fix alignment issue in ssd
replace aligned mov with unaligned mov in SSD function. for historical reasons,
we haven't forced alignment of fenc for SSD (probably something to be revisited)
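A small portable sketch of why the aligned load was unsafe here: a block pointer taken at an arbitrary offset inside an aligned frame buffer is generally not itself aligned. The helpers isAligned16 and ssdUnaligned are hypothetical names for illustration only. The memcpy stands in for the unaligned load (movu) a vector implementation needs; plain byte reads in C++ have no alignment requirement of their own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when p lies on a 16-byte boundary, which is what
// an aligned 16-byte load requires (a 32-byte register needs 32).
static bool isAligned16(const void* p)
{
    return (reinterpret_cast<uintptr_t>(p) & 15) == 0;
}

// Hypothetical scalar sum of squared differences over a width x height block.
// Rows are staged through memcpy, the portable way to read from an address
// whose alignment is unknown.
static uint32_t ssdUnaligned(const uint8_t* a, intptr_t strideA,
                             const uint8_t* b, intptr_t strideB,
                             int width, int height)
{
    assert(width > 0 && width <= 64 && height > 0);
    uint8_t rowA[64], rowB[64];
    uint32_t sum = 0;

    for (int y = 0; y < height; y++)
    {
        std::memcpy(rowA, a, (size_t)width);
        std::memcpy(rowB, b, (size_t)width);
        for (int x = 0; x < width; x++)
        {
            int d = rowA[x] - rowB[x];
            sum += (uint32_t)(d * d);
        }
        a += strideA;
        b += strideB;
    }
    return sum;
}
```

With a 16-byte aligned buffer buf, isAligned16(buf + 16) holds but isAligned16(buf + 4) does not, so a caller that hands the SSD primitive a block at column 4 would fault on an aligned load.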
Subject: [x265] api: comment nits
details: http://hg.videolan.org/x265/rev/1974406ff5fd
branches:
changeset: 9916:1974406ff5fd
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 10:49:41 2015 -0500
description:
api: comment nits
Subject: [x265] api: add param.lookaheadSlices
details: http://hg.videolan.org/x265/rev/32b79736f128
branches:
changeset: 9917:32b79736f128
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 10:56:11 2015 -0500
description:
api: add param.lookaheadSlices
This threading feature used to be auto-enabled when the thread pool was a
certain size. But this makes the encoder output variable depending on the number
of CPUs, which is not particularly helpful, especially when lookahead slices are
often not very helpful with performance. So I'm putting the choice in the hands
of the user.
Subject: [x265] slicetype: respect new m_param->lookaheadSlices param
details: http://hg.videolan.org/x265/rev/75e5803b6fa3
branches:
changeset: 9918:75e5803b6fa3
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 11:12:39 2015 -0500
description:
slicetype: respect new m_param->lookaheadSlices param
Subject: [x265] param: expose 'lookahead-slices' for param.lookaheadSlices (help hidden at H1)
details: http://hg.videolan.org/x265/rev/9178c211a0fe
branches:
changeset: 9919:9178c211a0fe
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 11:06:57 2015 -0500
description:
param: expose 'lookahead-slices' for param.lookaheadSlices (help hidden at H1)
Most users will not want to use this, it is only helpful on fairly large
machines and even then only helpful for some use-cases.
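The param.cpp hunk below parses the new option with atoi() and rejects values outside 0..16. A minimal stand-alone sketch of that behaviour follows; ParamSketch and the three helper functions are hypothetical stand-ins and not part of the x265 API, and the "1 is the same as 0" rule is taken from the cli.rst text added in this changeset.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the single x265_param field discussed here.
struct ParamSketch
{
    int lookaheadSlices = 0; // default set in x265_param_default()
};

// Mirrors the OPT("lookahead-slices") line: the value is parsed with atoi.
static void parseLookaheadSlices(ParamSketch& p, const char* value)
{
    assert(value != nullptr);
    p.lookaheadSlices = std::atoi(value);
}

// Mirrors the CHECK in x265_check_params(): true when the value is accepted.
static bool lookaheadSlicesInRange(const ParamSketch& p)
{
    return p.lookaheadSlices >= 0 && p.lookaheadSlices <= 16;
}

// Per the documentation, 0 and 1 both leave lookahead slicing disabled.
static bool lookaheadSlicesEnabled(const ParamSketch& p)
{
    return p.lookaheadSlices > 1;
}
```

In this sketch a value such as "4" is accepted and enables slicing, "1" is accepted but leaves it off, and "17" fails the range check.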
Subject: [x265] docs: add documentation for --lookahead-slices
details: http://hg.videolan.org/x265/rev/a5d8c3dba996
branches:
changeset: 9920:a5d8c3dba996
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 11:18:55 2015 -0500
description:
docs: add documentation for --lookahead-slices
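The documentation added here says the encoder may lower the slice count so that each slice covers at least 10 rows of 16x16 lowres blocks. The slicetype.cpp code that does this is not visible in the truncated diff, so the following is only one plausible reading of that sentence. The function effectiveLookaheadSlices and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not x265 code: cap the requested slice count so that
// every slice keeps at least minRowsPerSlice rows of 16x16 lowres blocks.
// Returns 0 when slicing ends up disabled.
static int effectiveLookaheadSlices(int requested, int lowresRows16,
                                    int minRowsPerSlice = 10)
{
    assert(requested >= 0 && requested <= 16);
    assert(lowresRows16 >= 0 && minRowsPerSlice > 0);

    if (requested <= 1)
        return 0; // 0 and 1 both mean "no slicing"

    int maxSlices = lowresRows16 / minRowsPerSlice;
    int slices = std::min(requested, maxSlices);
    return slices > 1 ? slices : 0;
}
```

Under this reading, a frame with 34 rows of 16x16 lowres blocks and --lookahead-slices 8 would run with 3 slices, and a frame with only 15 such rows would not be sliced at all.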
Subject: [x265] docs: make the --b-adapt docs less embarrasingly bad
details: http://hg.videolan.org/x265/rev/6c9e3e8f8ec6
branches:
changeset: 9921:6c9e3e8f8ec6
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 11:22:58 2015 -0500
description:
docs: make the --b-adapt docs less embarrasingly bad
Subject: [x265] docs: update threading details of lookahead
details: http://hg.videolan.org/x265/rev/dc7a6c5fbda1
branches:
changeset: 9922:dc7a6c5fbda1
user: Steve Borho <steve at borho.org>
date: Thu Mar 26 11:24:23 2015 -0500
description:
docs: update threading details of lookahead
diffstat:
doc/reST/cli.rst | 29 +++++-
doc/reST/threading.rst | 3 +-
source/CMakeLists.txt | 2 +-
source/common/param.cpp | 6 +
source/common/x86/asm-primitives.cpp | 1 +
source/common/x86/ipfilter8.asm | 72 ++++++++++++++
source/common/x86/ssd-a.asm | 8 +-
source/encoder/entropy.cpp | 173 ++++++++++++++++++++++++++++++++--
source/encoder/slicetype.cpp | 97 +++++++++++++-----
source/x265.h | 60 +++++++-----
source/x265cli.h | 2 +
11 files changed, 381 insertions(+), 72 deletions(-)
diffs (truncated from 808 to 300 lines):
diff -r 24fdb661bb57 -r dc7a6c5fbda1 doc/reST/cli.rst
--- a/doc/reST/cli.rst Wed Mar 25 12:49:01 2015 -0500
+++ b/doc/reST/cli.rst Thu Mar 26 11:24:23 2015 -0500
@@ -950,11 +950,36 @@ Slice decision options
**Range of values:** Between the maximum consecutive bframe count (:option:`--bframes`) and 250
+.. option:: --lookahead-slices <0..16>
+
+ Use multiple worker threads to measure the estimated cost of each
+ frame within the lookahead. When :option:`--b-adapt` is 2, most
+ frame cost estimates will be performed in batch mode, many cost
+ estimates at the same time, and lookahead-slices is ignored for
+ batched estimates. The effect on performance can be quite small.
+ The higher this parameter, the less accurate the frame costs will be
+ (since context is lost across slice boundaries) which will result in
+ less accurate B-frame and scene-cut decisions.
+
+ The encoder may internally lower the number of slices to ensure
+ each slice codes at least 10 16x16 rows of lowres blocks. If slices
+ are used in lookahead, they are logged in the list of tools as
+ *lslices*.
+
+ **Values:** 0 - disabled (default). 1 is the same as 0. Max 16
+
.. option:: --b-adapt <integer>
- Adaptive B frame scheduling. Default 2
+ Set the level of effort in determining B frame placement.
- **Values:** 0:none; 1:fast; 2:full(trellis)
+ With b-adapt 0, the GOP structure is fixed based on the values of
+ :option:`--keyint` and :option:`--bframes`.
+
+ With b-adapt 1 a light lookahead is used to choose B frame placement.
+
+ With b-adapt 2 (trellis) a viterbi B path selection is performed
+
+ **Values:** 0:none; 1:fast; 2:full(trellis) **default**
.. option:: --bframes, -b <0..16>
diff -r 24fdb661bb57 -r dc7a6c5fbda1 doc/reST/threading.rst
--- a/doc/reST/threading.rst Wed Mar 25 12:49:01 2015 -0500
+++ b/doc/reST/threading.rst Thu Mar 26 11:24:23 2015 -0500
@@ -223,7 +223,8 @@ Lookahead
The lookahead module of x265 (the lowres pre-encode which determines
scene cuts and slice types) uses the thread pool to distribute the
lowres cost analysis to worker threads. It will use bonded task groups
-to perform batches of frame cost estimates.
+to perform batches of frame cost estimates, and it may optionally use
+bonded task groups to measure single frame cost estimates using slices.
The function slicetypeDecide() itself is also be performed by a worker
thread if your encoder has a thread pool, else it runs within the
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/CMakeLists.txt
--- a/source/CMakeLists.txt Wed Mar 25 12:49:01 2015 -0500
+++ b/source/CMakeLists.txt Thu Mar 26 11:24:23 2015 -0500
@@ -24,7 +24,7 @@ include(CheckSymbolExists)
include(CheckCXXCompilerFlag)
# X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 49)
+set(X265_BUILD 50)
configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/param.cpp
--- a/source/common/param.cpp Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/param.cpp Thu Mar 26 11:24:23 2015 -0500
@@ -138,6 +138,7 @@ void x265_param_default(x265_param* para
param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
param->bBPyramid = 1;
param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
+ param->lookaheadSlices = 0;
/* Intra Coding Tools */
param->bEnableConstrainedIntra = 0;
@@ -598,6 +599,7 @@ int x265_param_parse(x265_param* p, cons
OPT2("constrained-intra", "cip") p->bEnableConstrainedIntra = atobool(value);
OPT("fast-intra") p->bEnableFastIntra = atobool(value);
OPT("open-gop") p->bOpenGOP = atobool(value);
+ OPT("lookahead-slices") p->lookaheadSlices = atoi(value);
OPT("scenecut")
{
p->scenecutThreshold = atobool(value);
@@ -1063,6 +1065,8 @@ int x265_check_params(x265_param* param)
"max consecutive bframe count must be 16 or smaller");
CHECK(param->lookaheadDepth > X265_LOOKAHEAD_MAX,
"Lookahead depth must be less than 256");
+ CHECK(param->lookaheadSlices > 16 || param->lookaheadSlices < 0,
+ "Lookahead slices must be between 0 and 16");
CHECK(param->rc.aqMode < X265_AQ_NONE || X265_AQ_AUTO_VARIANCE < param->rc.aqMode,
"Aq-Mode is out of range");
CHECK(param->rc.aqStrength < 0 || param->rc.aqStrength > 3,
@@ -1304,6 +1308,7 @@ void x265_print_params(x265_param* param
TOOLOPT(param->bIntraInBFrames, "b-intra");
TOOLOPT(param->bEnableFastIntra, "fast-intra");
TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
+ TOOLVAL(param->lookaheadSlices, "lslices=%d");
if (param->bEnableLoopFilter)
{
if (param->deblockingFilterBetaOffset || param->deblockingFilterTCOffset)
@@ -1365,6 +1370,7 @@ char *x265_param2string(x265_param* p)
s += sprintf(s, " min-keyint=%d", p->keyframeMin);
s += sprintf(s, " scenecut=%d", p->scenecutThreshold);
s += sprintf(s, " rc-lookahead=%d", p->lookaheadDepth);
+ s += sprintf(s, " lookahead-slices=%d", p->lookaheadSlices);
s += sprintf(s, " bframes=%d", p->bframes);
s += sprintf(s, " bframe-bias=%d", p->bFrameBias);
s += sprintf(s, " b-adapt=%d", p->bFrameAdaptive);
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Thu Mar 26 11:24:23 2015 -0500
@@ -1782,6 +1782,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_hps = x265_interp_4tap_horiz_ps_32x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_hps = x265_interp_4tap_horiz_ps_2x4_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].filter_hps = x265_interp_4tap_horiz_ps_2x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].filter_hps = x265_interp_4tap_horiz_ps_6x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_hpp = x265_interp_4tap_horiz_pp_24x32_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/x86/ipfilter8.asm Thu Mar 26 11:24:23 2015 -0500
@@ -20484,3 +20484,75 @@ cglobal interp_4tap_horiz_pp_24x32, 4,6,
dec r4d
jnz .loop
RET
+
+;-----------------------------------------------------------------------------------------------------------------------------
+; void interp_4tap_horiz_ps_6x8(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx, int isRowExt)
+;-----------------------------------------------------------------------------------------------------------------------------;
+INIT_YMM avx2
+cglobal interp_4tap_horiz_ps_6x8, 4,7,6
+ mov r4d, r4m
+ mov r5d, r5m
+ add r3d, r3d
+
+%ifdef PIC
+ lea r6, [tab_ChromaCoeff]
+ vpbroadcastd m0, [r6 + r4 * 4]
+%else
+ vpbroadcastd m0, [tab_ChromaCoeff + r4 * 4]
+%endif
+
+ vbroadcasti128 m2, [pw_1]
+ vbroadcasti128 m5, [pw_2000]
+ mova m1, [tab_Tm]
+
+ ; register map
+ ; m0 - interpolate coeff
+ ; m1 - shuffle order table
+ ; m2 - constant word 1
+
+ mov r6d, 8/2
+ dec r0
+ test r5d, r5d
+ jz .loop
+ sub r0 , r1
+ inc r6d
+
+.loop
+ ; Row 0
+ vbroadcasti128 m3, [r0]
+ pshufb m3, m1
+ pmaddubsw m3, m0
+ pmaddwd m3, m2
+
+ ; Row 1
+ vbroadcasti128 m4, [r0 + r1]
+ pshufb m4, m1
+ pmaddubsw m4, m0
+ pmaddwd m4, m2
+ packssdw m3, m4
+ psubw m3, m5
+ vpermq m3, m3, 11011000b
+ vextracti128 xm4, m3, 1
+ movq [r2], xm3
+ pextrd [r2 + 8], xm3, 2
+ movq [r2 + r3], xm4
+ pextrd [r2 + r3 + 8], xm4, 2
+ lea r2, [r2 + r3 * 2]
+ lea r0, [r0 + r1 * 2]
+ dec r6d
+ jnz .loop
+ test r5d, r5d
+ jz .end
+
+ ;Row 11
+ vbroadcasti128 m3, [r0]
+ pshufb m3, m1
+ pmaddubsw m3, m0
+ pmaddwd m3, m2
+ packssdw m3, m3
+ psubw m3, m5
+ vextracti128 xm4, m3, 1
+ movq [r2], xm3
+ movd [r2+8], xm4
+.end
+ RET
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/common/x86/ssd-a.asm
--- a/source/common/x86/ssd-a.asm Wed Mar 25 12:49:01 2015 -0500
+++ b/source/common/x86/ssd-a.asm Thu Mar 26 11:24:23 2015 -0500
@@ -822,10 +822,10 @@ SSD_SS_64xN
%if HIGH_BIT_DEPTH == 0
%macro SSD_LOAD_FULL 5
- mova m1, [t0+%1]
- mova m2, [t2+%2]
- mova m3, [t0+%3]
- mova m4, [t2+%4]
+ movu m1, [t0+%1]
+ movu m2, [t2+%2]
+ movu m3, [t0+%3]
+ movu m4, [t2+%4]
%if %5==1
add t0, t1
add t2, t3
diff -r 24fdb661bb57 -r dc7a6c5fbda1 source/encoder/entropy.cpp
--- a/source/encoder/entropy.cpp Wed Mar 25 12:49:01 2015 -0500
+++ b/source/encoder/entropy.cpp Thu Mar 26 11:24:23 2015 -0500
@@ -1555,22 +1555,173 @@ void Entropy::codeCoeffNxN(const CUData&
// encode significant_coeff_flag
if (sigCoeffGroupFlag64 & cgBlkPosMask)
{
+ X265_CHECK((log2TrSize != 2) || (log2TrSize == 2 && subSet == 0), "log2TrSize and subSet mistake!\n");
const int patternSigCtx = Quant::calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codingParameters.log2TrSizeCG);
- uint32_t blkPos, sig, ctxSig;
- for (; scanPosSigOff >= 0; scanPosSigOff--)
+
+ static const uint8_t ctxIndMap4x4[16] =
{
- blkPos = codingParameters.scan[subPosBase + scanPosSigOff];
- sig = scanFlagMask & 1;
- scanFlagMask >>= 1;
- X265_CHECK((uint32_t)(coeff[blkPos] != 0) == sig, "sign bit mistake\n");
- if (scanPosSigOff != 0 || subSet == 0 || numNonZero)
+ 0, 1, 4, 5,
+ 2, 3, 4, 5,
+ 6, 6, 8, 8,
+ 7, 7, 8, 8
+ };
+ // NOTE: [patternSigCtx][posXinSubset][posYinSubset]
+ static const uint8_t table_cnt[4][4][4] =
+ {
+ // patternSigCtx = 0
{
- ctxSig = Quant::getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext);
- encodeBin(sig, baseCtx[ctxSig]);
+ { 2, 1, 1, 0 },
+ { 1, 1, 0, 0 },
+ { 1, 0, 0, 0 },
+ { 0, 0, 0, 0 },
+ },
+ // patternSigCtx = 1
+ {
+ { 2, 1, 0, 0 },
+ { 2, 1, 0, 0 },
+ { 2, 1, 0, 0 },
+ { 2, 1, 0, 0 },
+ },
+ // patternSigCtx = 2
+ {
+ { 2, 2, 2, 2 },
+ { 1, 1, 1, 1 },
+ { 0, 0, 0, 0 },
+ { 0, 0, 0, 0 },
+ },
+ // patternSigCtx = 3
+ {
+ { 2, 2, 2, 2 },
+ { 2, 2, 2, 2 },
+ { 2, 2, 2, 2 },
+ { 2, 2, 2, 2 },
}
- absCoeff[numNonZero] = int(abs(coeff[blkPos]));
- numNonZero += sig;
+ };
+ if (m_bitIf)
+ {
+ if (log2TrSize == 2)
+ {
+ uint32_t blkPos, sig, ctxSig;
+ for (; scanPosSigOff >= 0; scanPosSigOff--)
+ {
+ blkPos = codingParameters.scan[subPosBase + scanPosSigOff];
+ sig = scanFlagMask & 1;
+ scanFlagMask >>= 1;
+ X265_CHECK((uint32_t)(coeff[blkPos] != 0) == sig, "sign bit mistake\n");
+ {
+ ctxSig = ctxIndMap4x4[blkPos];
+ X265_CHECK(ctxSig == Quant::getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext), "sigCtx mistake!\n");;
+ encodeBin(sig, baseCtx[ctxSig]);
+ }
+ absCoeff[numNonZero] = int(abs(coeff[blkPos]));
+ numNonZero += sig;
+ }
+ }
+ else