[x265-commits] [x265] api: add --allow-non-conformance param, default to False
Steve Borho
steve at borho.org
Tue Apr 7 04:08:00 CEST 2015
details: http://hg.videolan.org/x265/rev/775436f7364d
branches:
changeset: 10071:775436f7364d
user: Steve Borho <steve at borho.org>
date: Sun Apr 05 12:56:40 2015 -0500
description:
api: add --allow-non-conformance param, default to False
The encoder will now abort any encode that would result in a non-conformant
stream, unless --allow-non-conformance is specified
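The gating rule in this changeset can be sketched as a small decision function. The names here (ConformanceAction, ConformanceInputs, decideConformance) are hypothetical and are not x265's internal API. The only concrete limit used is the "total reference count greater than 8" rule from the cli.rst text in the diff. Every other level check (CTU size, bitrate, resolution) is reduced to a single assumed boolean.

```cpp
#include <cassert>

// Hypothetical sketch of the --allow-non-conformance gate; not x265's code.
enum class ConformanceAction { Encode, EncodeAsNone, Abort };

struct ConformanceInputs
{
    int  totalRefs;         // total reference count; cli.rst calls more than 8 non-compliant
    bool withinLevelLimits; // assumed result of the other level checks (CTU size, bitrate, resolution)
};

ConformanceAction decideConformance(const ConformanceInputs& in, bool allowNonConformance)
{
    assert(in.totalRefs >= 0);
    const bool conformant = in.totalRefs <= 8 && in.withinLevelLimits;
    if (conformant)
        return ConformanceAction::Encode;  // signal the detected profile and level

    // Non-conformant: previously the encode continued with profile/level NONE.
    // After this change that only happens when the user explicitly opts in.
    return allowNonConformance ? ConformanceAction::EncodeAsNone : ConformanceAction::Abort;
}
```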
Subject: [x265] asm: luma_hps[12x16] avx2 - improved 3779c->2482c
details: http://hg.videolan.org/x265/rev/0e097d6d57cf
branches:
changeset: 10072:0e097d6d57cf
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Apr 06 09:17:08 2015 +0530
description:
asm: luma_hps[12x16] avx2 - improved 3779c->2482c
Subject: [x265] asm: luma_hps[24x32] avx2 - improved 11545c->6843c
details: http://hg.videolan.org/x265/rev/02b4942ce999
branches:
changeset: 10073:02b4942ce999
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Apr 06 09:19:14 2015 +0530
description:
asm: luma_hps[24x32] avx2 - improved 11545c->6843c
Subject: [x265] asm: chroma_hps[24x32] avx2 - improved 4458c->3583c
details: http://hg.videolan.org/x265/rev/60c6a48a292c
branches:
changeset: 10074:60c6a48a292c
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Apr 06 09:21:20 2015 +0530
description:
asm: chroma_hps[24x32] avx2 - improved 4458c->3583c
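The hps primitives above appear to be the horizontal interpolation filters that read pixels and write 16-bit intermediates ("ps": pixel in, short out). Below is a plain C++ sketch of the 8-tap luma case at 8 bits per sample. It uses the HEVC luma filter taps and assumes x265's internal constants are IF_FILTER_PREC = 6, IF_INTERNAL_PREC = 14 and IF_INTERNAL_OFFS = 8192. The function name lumaHorizPS and the exact isRowExt handling are illustrative assumptions. chroma_hps would use a 4-tap filter instead.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumes an 8bpp build

// HEVC luma interpolation taps, indexed by quarter-sample phase
static const int16_t kLumaTaps[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },  // full-pel
    { -1, 4, -10, 58, 17,  -5, 1,  0 },  // quarter-pel
    { -1, 4, -11, 40, 40, -11, 4, -1 },  // half-pel
    {  0, 1,  -5, 17, 58, -10, 4, -1 },  // three-quarter-pel
};

const int kFilterPrec = 6;         // assumed IF_FILTER_PREC
const int kInternalPrec = 14;      // assumed IF_INTERNAL_PREC
const int kInternalOffs = 1 << 13; // assumed IF_INTERNAL_OFFS
const int kBitDepth = 8;

// Hypothetical reference: 8-tap horizontal filter, pixel in, int16_t out
void lumaHorizPS(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride,
                 int width, int height, int coeffIdx, bool isRowExt)
{
    assert(coeffIdx >= 0 && coeffIdx < 4);
    const int16_t* c = kLumaTaps[coeffIdx];
    const int headRoom = kInternalPrec - kBitDepth; // 6 at 8bpp
    const int shift = kFilterPrec - headRoom;       // 0 at 8bpp: no precision is dropped
    const int offset = -(kInternalOffs << shift);   // centres the output range around zero

    src -= 3; // tap 3 is aligned with the output sample
    if (isRowExt)
    {
        // 3 extra rows above and 4 below, as a following vertical 8-tap pass needs
        src -= 3 * srcStride;
        height += 7;
    }
    for (int y = 0; y < height; y++, src += srcStride, dst += dstStride)
        for (int x = 0; x < width; x++)
        {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += src[x + k] * c[k];
            dst[x] = (int16_t)((sum + offset) >> shift);
        }
}
```

Because the taps sum to 64, a full-pel call returns 64 * p - 8192 for each input sample p. On a linear ramp, the half-pel phase lands exactly on the midpoint of its two centre samples.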
Subject: [x265] asm: luma_hvpp[16x16] - 11.39x 5226c
details: http://hg.videolan.org/x265/rev/3849ba2347de
branches:
changeset: 10075:3849ba2347de
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Apr 06 09:38:55 2015 +0530
description:
asm: luma_hvpp[16x16] - 11.39x 5226c
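luma_hvpp presumably combines both directions: a horizontal pass to 16-bit intermediates over height + 7 rows, then a vertical pass back to pixels ("pp": pixel in, pixel out). The sketch below is a separable reference under the same assumed 8bpp constants as the hps primitives (filter precision 6, internal precision 14, internal offset 8192). The name lumaHVPP and its signature are hypothetical, and this is not the assembly's algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint8_t pixel; // 8bpp build assumed

static const int kTaps[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

// Hypothetical separable reference for a 2-D luma filter with pixel output
void lumaHVPP(const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride,
              int width, int height, int idxX, int idxY)
{
    assert(idxX >= 0 && idxX < 4 && idxY >= 0 && idxY < 4);
    const int rows = height + 7; // 3 rows above, 4 below
    std::vector<int16_t> tmp((size_t)width * rows);

    // Pass 1 (horizontal, "ps"): at 8bpp the shift is 0 and the offset is -8192
    const pixel* s = src - 3 * srcStride - 3;
    for (int y = 0; y < rows; y++, s += srcStride)
        for (int x = 0; x < width; x++)
        {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += s[x + k] * kTaps[idxX][k];
            tmp[(size_t)y * width + x] = (int16_t)(sum - 8192);
        }

    // Pass 2 (vertical, "sp"): shift 12 = 6 filter bits + 6 bits of headroom.
    // The offset adds rounding and cancels the -8192 bias scaled by the tap gain of 64.
    const int shift = 12;
    const int offset = (1 << (shift - 1)) + (8192 << 6);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += tmp[(size_t)(y + k) * width + x] * kTaps[idxY][k];
            dst[y * dstStride + x] = (pixel)std::min(255, std::max(0, (sum + offset) >> shift));
        }
}
```

With both phases at full-pel, the two passes reduce to (4096 * p + 2048) >> 12 = p. That makes a quick sanity check that the intermediate bias cancels.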
Subject: [x265] asm: improve the old avx2 code for sad[32x24]
details: http://hg.videolan.org/x265/rev/809339fb90b5
branches:
changeset: 10076:809339fb90b5
user: Sumalatha Polureddy
date: Mon Apr 06 11:47:55 2015 +0530
description:
asm: improve the old avx2 code for sad[32x24]
old:
sad[32x24] 14.26x 490.58 6995.66
new:
sad[32x24] 16.33x 428.35 6993.57
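The benchmark lines appear to follow the testbench layout: primitive, speedup, optimized cost, then C cost. The speedup is the ratio of the last two numbers (6993.57 / 428.35 ≈ 16.33, and 14.26 × 490.58 ≈ 6995.66). For reference, the quantity being optimized is a plain sum of absolute differences over a block. sadBlock and speedup below are hypothetical helper names, not x265's primitives.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

typedef uint8_t pixel; // 8bpp build assumed

// Reference sum of absolute differences over a width x height block
int sadBlock(const pixel* a, intptr_t strideA, const pixel* b, intptr_t strideB, int width, int height)
{
    assert(width > 0 && height > 0);
    int sum = 0;
    for (int y = 0; y < height; y++, a += strideA, b += strideB)
        for (int x = 0; x < width; x++)
            sum += std::abs(a[x] - b[x]);
    return sum;
}

// Speedup as reported in the benchmark lines: C cycles divided by optimized cycles
double speedup(double optCycles, double cCycles)
{
    return cCycles / optCycles;
}
```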
Subject: [x265] asm: intra_pred_ang4_8 improved by ~24% over SSE4
details: http://hg.videolan.org/x265/rev/d317f9252f40
branches:
changeset: 10077:d317f9252f40
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Mon Apr 06 10:58:30 2015 +0530
description:
asm: intra_pred_ang4_8 improved by ~24% over SSE4
AVX2:
intra_ang_4x4[ 8] 9.58x 110.01 1053.65
SSE4:
intra_ang_4x4[ 8] 7.26x 146.78 1065.62
Subject: [x265] asm: intra_pred_ang4_7 improved by ~42% over SSE4
details: http://hg.videolan.org/x265/rev/aaa31e85a137
branches:
changeset: 10078:aaa31e85a137
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Mon Apr 06 11:31:02 2015 +0530
description:
asm: intra_pred_ang4_7 improved by ~42% over SSE4
AVX2:
intra_ang_4x4[ 7] 10.24x 98.65 1009.92
SSE4:
intra_ang_4x4[ 7] 6.25x 169.98 1061.89
Subject: [x265] asm: intra_pred_ang4_6 improved by ~36% over SSE4
details: http://hg.videolan.org/x265/rev/24571357bee9
branches:
changeset: 10079:24571357bee9
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Mon Apr 06 12:05:52 2015 +0530
description:
asm: intra_pred_ang4_6 improved by ~36% over SSE4
AVX2:
intra_ang_4x4[ 6] 10.08x 101.69 1024.92
SSE4:
intra_ang_4x4[ 6] 6.60x 160.00 1055.62
Subject: [x265] asm: intra_pred_ang4_5 improved by ~41% over SSE4
details: http://hg.videolan.org/x265/rev/c570567a2760
branches:
changeset: 10080:c570567a2760
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Mon Apr 06 12:18:54 2015 +0530
description:
asm: intra_pred_ang4_5 improved by ~41% over SSE4
AVX2:
intra_ang_4x4[ 5] 9.56x 103.43 989.01
SSE4:
intra_ang_4x4[ 5] 5.99x 176.06 1055.48
Subject: [x265] asm: intra_pred_ang4_4 improved by ~44% over SSE4
details: http://hg.videolan.org/x265/rev/cd6ea2f38499
branches:
changeset: 10081:cd6ea2f38499
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Mon Apr 06 12:31:17 2015 +0530
description:
asm: intra_pred_ang4_4 improved by ~44% over SSE4
AVX2:
intra_ang_4x4[ 4] 10.62x 94.02 998.80
SSE4:
intra_ang_4x4[ 4] 5.89x 169.02 994.88
Subject: [x265] asm: intra_pred_ang4_3 improved by ~41% over SSE4
details: http://hg.videolan.org/x265/rev/141e2904e2ac
branches:
changeset: 10082:141e2904e2ac
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Mon Apr 06 12:47:11 2015 +0530
description:
asm: intra_pred_ang4_3 improved by ~41% over SSE4
AVX2:
intra_ang_4x4[ 3] 10.17x 97.09 987.20
SSE4:
intra_ang_4x4[ 3] 6.42x 167.16 1072.98
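Modes 3 through 8 are horizontal angular modes with positive prediction angles. The sketch below follows the generic HEVC angular prediction formula for a 4x4 block. It is not the AVX2 code, and the names (kHorAngle, intraAng4Hor) are hypothetical. The reference layout is an assumption: left[0] is the top-left corner sample and left[1..8] is the left column extended downward. The output is row-major.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // 8bpp build assumed

// intraPredAngle for the horizontal angular modes 2..10 (index = mode - 2)
static const int kHorAngle[9] = { 32, 26, 21, 17, 13, 9, 5, 2, 0 };

// Hypothetical 4x4 angular prediction for modes 2..9 (positive angles only).
// left[0] = corner sample, left[1..8] = left column; dst[y * 4 + x]
void intraAng4Hor(const pixel* left, pixel* dst, int mode)
{
    assert(mode >= 2 && mode <= 9);
    const int angle = kHorAngle[mode - 2];
    for (int x = 0; x < 4; x++)
    {
        const int pos = (x + 1) * angle; // displacement in 1/32-sample units
        const int idx = pos >> 5;        // integer part
        const int fact = pos & 31;       // fractional part
        for (int y = 0; y < 4; y++)
        {
            const int a = left[y + idx + 1];
            // two-tap linear interpolation; with no fraction the sample is copied
            dst[y * 4 + x] = fact ? (pixel)(((32 - fact) * a + fact * left[y + idx + 2] + 16) >> 5)
                                  : (pixel)a;
        }
    }
}
```

Each column x shifts along the left reference by (x + 1) * angle / 32 samples. That is why only 2N + 1 = 9 reference samples are needed for these positive-angle modes.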
Subject: [x265] sao: modify C and SSE4 code for saoCuOrgE0 to process 2 rows
details: http://hg.videolan.org/x265/rev/b84fe6497aa5
branches:
changeset: 10083:b84fe6497aa5
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Apr 06 13:56:47 2015 +0530
description:
sao: modify C and SSE4 code for saoCuOrgE0 to process 2 rows
Subject: [x265] asm: saoCuOrgE0 avx2 code: 756c->629c
details: http://hg.videolan.org/x265/rev/64b7d2b4aac7
branches:
changeset: 10084:64b7d2b4aac7
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Apr 06 14:38:43 2015 +0530
description:
asm: saoCuOrgE0 avx2 code: 756c->629c
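The new saoCuOrgE0 contract (see the loopfilter.cpp diff) takes one signLeft entry per row and a stride, so a single call covers two rows. The sketch below generalizes that C loop to an arbitrary row count and checks it on hand-computed data. The names sign3 and saoEdgeClass0, and the rows parameter, are hypothetical. The edge type is sign(cur - left) + sign(cur - right) + 2, which indexes a 5-entry offset table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // 8bpp build assumed

static inline int8_t sign3(int v) { return (int8_t)((v > 0) - (v < 0)); }

// Hypothetical SAO edge offset class 0 (horizontal) over several rows.
// signLeft[y] = sign(rec[y][0] - rec[y][-1]), computed by the caller.
void saoEdgeClass0(pixel* rec, const int8_t* offsetEo, int width, const int8_t* signLeft,
                   intptr_t stride, int rows)
{
    assert(width > 0 && rows > 0);
    for (int y = 0; y < rows; y++, rec += stride)
    {
        int8_t sl = signLeft[y];
        for (int x = 0; x < width; x++)
        {
            const int8_t sr = sign3(rec[x] - rec[x + 1]); // right neighbour is still unmodified
            const int v = rec[x] + offsetEo[sr + sl + 2];
            sl = (int8_t)-sr;                              // becomes sign(next - current) for x + 1
            rec[x] = (pixel)std::min(255, std::max(0, v));
        }
    }
}
```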
Subject: [x265] asm: improve the old avx2 code for sad[64x64]
details: http://hg.videolan.org/x265/rev/7e5b68eba341
branches:
changeset: 10085:7e5b68eba341
user: Sumalatha Polureddy
date: Mon Apr 06 14:53:36 2015 +0530
description:
asm: improve the old avx2 code for sad[64x64]
old:
sad[64x64] 21.47x 1702.40 36545.14
new:
sad[64x64] 22.89x 1595.16 36506.87
Subject: [x265] asm: improve old avx2 code for sad[64x48]
details: http://hg.videolan.org/x265/rev/ca0d3bb3de69
branches:
changeset: 10086:ca0d3bb3de69
user: Sumalatha Polureddy
date: Mon Apr 06 15:51:50 2015 +0530
description:
asm: improve old avx2 code for sad[64x48]
old:
sad[64x48] 16.79x 1504.65 25267.23
new:
sad[64x48] 20.18x 1260.99 25451.33
Subject: [x265] asm: ssse3 8bpp code for convert_p2s[12xN],[24xN],[48x64]
details: http://hg.videolan.org/x265/rev/6d1c2339d9b9
branches:
changeset: 10087:6d1c2339d9b9
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Mon Apr 06 15:02:05 2015 +0530
description:
asm: ssse3 8bpp code for convert_p2s[12xN],[24xN],[48x64]
convert_p2s[12x16](9.82x), convert_p2s[24x32](13.61x),
convert_p2s[48x64](11.12x)
Subject: [x265] asm: sse4 8bpp code for chroma_p2s[6xN] for i420, i422
details: http://hg.videolan.org/x265/rev/57956d20dc48
branches:
changeset: 10088:57956d20dc48
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Mon Apr 06 15:06:06 2015 +0530
description:
asm: sse4 8bpp code for chroma_p2s[6xN] for i420, i422
chroma_p2s[6x8][i420](2.75x), chroma_p2s[6x16][i422](2.96x)
Subject: [x265] asm: ssse3 8bpp code for chroma_p2s[8x6](4.74x) for i420
details: http://hg.videolan.org/x265/rev/64d96f1ac0bd
branches:
changeset: 10089:64d96f1ac0bd
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Mon Apr 06 15:08:57 2015 +0530
description:
asm: ssse3 8bpp code for chroma_p2s[8x6](4.74x) for i420
Subject: [x265] asm: ssse3 8bpp code for chroma_p2s i422, reuse luma code
details: http://hg.videolan.org/x265/rev/7db85dc198a4
branches:
changeset: 10090:7db85dc198a4
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Mon Apr 06 16:22:54 2015 +0530
description:
asm: ssse3 8bpp code for chroma_p2s i422, reuse luma code
chroma_p2s[4x32](3.78x), chroma_p2s[8x12](5.25x), chroma_p2s[8x64](6.65x),
chroma_p2s[12x32](9.57x), chroma_p2s[16x24](12.96x),
chroma_p2s[16x24](12.56x), chroma_p2s[24x64](13.66x),
chroma_p2s[32x48](9.83x)
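The p2s ("pixel to short") primitives appear to convert pixels into the same biased 16-bit intermediate format the ps filters produce. At 8 bits this is assumed to be (p << 6) - 8192, using IF_INTERNAL_PREC = 14 and IF_INTERNAL_OFFS = 8192. A single reference covers every block shape in these commits, including the 6xN and 8x6 chroma sizes. convertP2S is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // 8bpp build assumed

// Hypothetical reference: pixel -> biased int16_t intermediate for any width x height
void convertP2S(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride,
                int width, int height)
{
    assert(width > 0 && height > 0);
    const int shift = 14 - 8; // assumed IF_INTERNAL_PREC minus bit depth
    for (int y = 0; y < height; y++, src += srcStride, dst += dstStride)
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)((src[x] << shift) - 8192); // centre the 14-bit range on zero
}
```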
Subject: [x265] improve rdoQuant by reduce type convert and condition check
details: http://hg.videolan.org/x265/rev/7d9aa340f950
branches:
changeset: 10091:7d9aa340f950
user: Min Chen <chenm003 at 163.com>
date: Mon Apr 06 20:18:01 2015 +0800
description:
improve rdoQuant by reducing type conversions and condition checks
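The quant.cpp diff shows the patterns this commit relies on:

- a branchless fastMax next to the existing fastMin;
- a precomputed usePsyMask replacing "usePsy && blkPos";
- an explicit branch replacing X265_MAX(maxAbsLevel - 1, 1);
- widening to int64_t before squaring signCoef instead of after.

The sketch below copies fastMin and fastMax from the diff and wraps the other three in hypothetical helpers (psyApplies, minAbsLevel, squareWide) so that each equivalence can be checked. The diff stores usePsy ? -1 : 0 in a uint32_t, which is the same all-ones value as ~0u used here.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// As in the diff: branchless, assuming an arithmetic right shift of negative ints
// (universal in practice, guaranteed from C++20) and no overflow in x - y.
inline int fastMin(int x, int y) { return y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); }
inline int fastMax(int x, int y) { return x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); }

// "usePsy && blkPos" rewritten with a mask computed once per call
inline bool psyApplies(bool usePsy, uint32_t blkPos)
{
    const uint32_t usePsyMask = usePsy ? ~0u : 0u;
    return (usePsyMask & blkPos) != 0;
}

// X265_MAX(maxAbsLevel - 1, 1), valid for maxAbsLevel >= 1
inline uint32_t minAbsLevel(uint32_t maxAbsLevel)
{
    assert(maxAbsLevel >= 1); // only reached inside "if (maxAbsLevel)"
    uint32_t m = maxAbsLevel - 1;
    if (maxAbsLevel == 1)
        m = 1;
    return m;
}

// Widen first: (int64_t)(a * a) would overflow int for |a| > 46340
inline int64_t squareWide(int a) { return (int64_t)a * a; }
```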
Subject: [x265] fix count of shift overflow bug in Quant::getSigCoeffGroupCtxInc
details: http://hg.videolan.org/x265/rev/cfd3c423c0bc
branches:
changeset: 10092:cfd3c423c0bc
user: Min Chen <chenm003 at 163.com>
date: Mon Apr 06 20:17:52 2015 +0800
description:
fix shift-count overflow bug in Quant::getSigCoeffGroupCtxInc
Subject: [x265] improve rdoQuant by more parameters on getSigCoeffGroupCtxInc and calcPatternSigCtx
details: http://hg.videolan.org/x265/rev/bac58ebf8d86
branches:
changeset: 10093:bac58ebf8d86
user: Min Chen <chenm003 at 163.com>
date: Mon Apr 06 20:17:57 2015 +0800
description:
improve rdoQuant by passing more parameters to getSigCoeffGroupCtxInc and calcPatternSigCtx
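After these two commits, both helpers take cgBlkPos (= cgPosY * trSizeCG + cgPosX) and shift the group bitmap right by cgBlkPos + 1. For the last group of an 8x8 CG grid that shift count is 64, which C and C++ leave undefined. The NOTE comments in quant.h rely on the masks clearing the result in that case, whereas the old calcPatternSigCtx guarded it with "shift >= 64 ? 0 : ...".

The sketch below restates the two contexts portably with a guarded shift:
- the significant-group context is 1 if the right or lower neighbour group is significant;
- the pattern context is right + 2 * below.

It also checks that the diff's branch-free mask form agrees once the shift is guarded. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shifting a 64-bit value by 64 or more is undefined, so guard it
static inline uint64_t shr64(uint64_t v, uint32_t n) { return n < 64 ? v >> n : 0; }

struct Neighbours { uint32_t right, below; };

// Significance of the right and lower neighbour groups inside a trSizeCG x trSizeCG grid
static Neighbours sigNeighbours(uint64_t mask, uint32_t cgPosX, uint32_t cgPosY, uint32_t trSizeCG)
{
    assert(trSizeCG >= 1 && trSizeCG <= 8 && cgPosX < trSizeCG && cgPosY < trSizeCG);
    const uint32_t cgBlkPos = cgPosY * trSizeCG + cgPosX;
    Neighbours n;
    n.right = (cgPosX + 1 < trSizeCG) ? (uint32_t)(shr64(mask, cgBlkPos + 1) & 1) : 0;
    n.below = (cgPosY + 1 < trSizeCG) ? (uint32_t)(shr64(mask, cgBlkPos + trSizeCG) & 1) : 0;
    return n;
}

uint32_t sigCoeffGroupCtx(uint64_t mask, uint32_t cgPosX, uint32_t cgPosY, uint32_t trSizeCG)
{
    const Neighbours n = sigNeighbours(mask, cgPosX, cgPosY, trSizeCG);
    return n.right | n.below; // 0 or 1
}

uint32_t patternSigCtx(uint64_t mask, uint32_t cgPosX, uint32_t cgPosY, uint32_t trSizeCG)
{
    const Neighbours n = sigNeighbours(mask, cgPosX, cgPosY, trSizeCG);
    return n.right + 2 * n.below; // 0..3
}

// Branch-free form following the diff's getSigCoeffGroupCtxInc, with the guarded shift
uint32_t sigCoeffGroupCtxBranchless(uint64_t mask, uint32_t cgPosX, uint32_t cgPosY, uint32_t trSizeCG)
{
    const uint32_t cgBlkPos = cgPosY * trSizeCG + cgPosX;
    const uint32_t sigPos = (uint32_t)shr64(mask, cgBlkPos + 1);
    // all ones while a neighbour exists, zero on the last column / row
    const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
    const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
    return (sigRight | sigLower) & 1;
}
```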
Subject: [x265] cli: rewrite pts_queue to use new/delete, not to confuse the leak tool
details: http://hg.videolan.org/x265/rev/e35d7fe9e974
branches:
changeset: 10094:e35d7fe9e974
user: Xinyue Lu <i at 7086.in>
date: Mon Apr 06 15:39:24 2015 -0700
description:
cli: rewrite pts_queue to use new/delete, not to confuse the leak tool
Subject: [x265] level: allow unbounded level 8.5 to be used for lossless encodes
details: http://hg.videolan.org/x265/rev/0ce13ce29304
branches:
changeset: 10095:0ce13ce29304
user: Steve Borho <steve at borho.org>
date: Mon Apr 06 21:02:36 2015 -0500
description:
level: allow unbounded level 8.5 to be used for lossless encodes
Lossless has no rate control, obviously, so it does not generally fit in any of
the given levels but I think it is better to signal a valid profile (main,
main10, main10 4:4:4, etc) together with level 8.5 than to signal profile and
level as NONE. If anyone knows a better solution for this, please enlighten me.
This workaround removes the need for --allow-non-conformance with --lossless.
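In HEVC, general_level_idc is coded as 30 times the level number, so level 8.5 maps to 255, the largest value an 8-bit field can hold. The sketch below shows that mapping plus a selection rule matching the description above: lossless picks 8.5 instead of NONE. chooseLevelIdc and the use of 0 to stand for NONE are hypothetical illustrations, not x265's level.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// general_level_idc = 30 * level number (e.g. 4.1 -> 123, 8.5 -> 255)
inline int levelIdc(double level)
{
    assert(level > 0 && level <= 8.5);
    return (int)std::lround(level * 30.0);
}

// Hypothetical selection: lossless is never rate-limited, so signal the unbounded level
inline int chooseLevelIdc(bool lossless, bool fitsSomeLevel, double fittedLevel)
{
    if (lossless)
        return levelIdc(8.5);
    if (fitsSomeLevel)
        return levelIdc(fittedLevel);
    return 0; // stands for level NONE in this sketch
}
```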
diffstat:
doc/reST/cli.rst | 15 +-
source/CMakeLists.txt | 2 +-
source/common/loopfilter.cpp | 21 +-
source/common/param.cpp | 1 +
source/common/primitives.h | 2 +-
source/common/quant.cpp | 60 +-
source/common/quant.h | 33 +-
source/common/slice.h | 1 +
source/common/x86/asm-primitives.cpp | 25 +
source/common/x86/intrapred.h | 6 +
source/common/x86/intrapred8.asm | 75 +++
source/common/x86/ipfilter8.asm | 700 +++++++++++++++++++++++++++++++++++
source/common/x86/ipfilter8.h | 27 +-
source/common/x86/loopfilter.asm | 116 +++++-
source/common/x86/loopfilter.h | 3 +-
source/common/x86/sad-a.asm | 99 ++--
source/encoder/api.cpp | 7 +
source/encoder/entropy.cpp | 4 +-
source/encoder/level.cpp | 15 +-
source/encoder/sao.cpp | 25 +-
source/test/pixelharness.cpp | 10 +-
source/x265.cpp | 33 +-
source/x265.h | 4 +
source/x265cli.h | 3 +
24 files changed, 1145 insertions(+), 142 deletions(-)
diffs (truncated from 1960 to 300 lines):
diff -r ebe5e57c4b45 -r 0ce13ce29304 doc/reST/cli.rst
--- a/doc/reST/cli.rst Sat Apr 04 15:11:39 2015 -0500
+++ b/doc/reST/cli.rst Mon Apr 06 21:02:36 2015 -0500
@@ -464,11 +464,22 @@ Profile, Level, Tier
HEVC specification. If x265 detects that the total reference count
is greater than 8, it will issue a warning that the resulting stream
is non-compliant and it signals the stream as profile NONE and level
- NONE but still allows the encode to continue. Compliant HEVC
+ NONE and will abort the encode unless
+ :option:`--allow-non-conformance` is specified. Compliant HEVC
decoders may refuse to decode such streams.
Default 3
+.. option:: --allow-non-conformance, --no-allow-non-conformance
+
+ Allow libx265 to generate a bitstream with profile and level NONE.
+ By default it will abort any encode which does not meet strict level
+ compliance. The most likely causes for non-conformance are
+ :option:`--ctu` being too small, :option:`--ref` being too high,
+ or the bitrate or resolution being out of specification.
+
+ Default: disabled
+
.. note::
:option:`--profile`, :option:`--level-idc`, and
:option:`--high-tier` are only intended for use when you are
@@ -476,7 +487,7 @@ Profile, Level, Tier
limitations and must constrain the bitstream within those limits.
Specifying a profile or level may lower the encode quality
parameters to meet those requirements but it will never raise
- them.
+ them. It may enable VBV constraints on a CRF encode.
Mode decision / Analysis
========================
diff -r ebe5e57c4b45 -r 0ce13ce29304 source/CMakeLists.txt
--- a/source/CMakeLists.txt Sat Apr 04 15:11:39 2015 -0500
+++ b/source/CMakeLists.txt Mon Apr 06 21:02:36 2015 -0500
@@ -30,7 +30,7 @@ option(STATIC_LINK_CRT "Statically link
mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
# X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 52)
+set(X265_BUILD 53)
configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r ebe5e57c4b45 -r 0ce13ce29304 source/common/loopfilter.cpp
--- a/source/common/loopfilter.cpp Sat Apr 04 15:11:39 2015 -0500
+++ b/source/common/loopfilter.cpp Mon Apr 06 21:02:36 2015 -0500
@@ -42,18 +42,23 @@ void calSign(int8_t *dst, const pixel *s
dst[x] = signOf(src1[x] - src2[x]);
}
-void processSaoCUE0(pixel * rec, int8_t * offsetEo, int width, int8_t signLeft)
+void processSaoCUE0(pixel * rec, int8_t * offsetEo, int width, int8_t* signLeft, intptr_t stride)
{
- int x;
- int8_t signRight;
+ int x, y;
+ int8_t signRight, signLeft0;
int8_t edgeType;
- for (x = 0; x < width; x++)
+ for (y = 0; y < 2; y++)
{
- signRight = ((rec[x] - rec[x + 1]) < 0) ? -1 : ((rec[x] - rec[x + 1]) > 0) ? 1 : 0;
- edgeType = signRight + signLeft + 2;
- signLeft = -signRight;
- rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+ signLeft0 = signLeft[y];
+ for (x = 0; x < width; x++)
+ {
+ signRight = ((rec[x] - rec[x + 1]) < 0) ? -1 : ((rec[x] - rec[x + 1]) > 0) ? 1 : 0;
+ edgeType = signRight + signLeft0 + 2;
+ signLeft0 = -signRight;
+ rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+ }
+ rec += stride;
}
}
diff -r ebe5e57c4b45 -r 0ce13ce29304 source/common/param.cpp
--- a/source/common/param.cpp Sat Apr 04 15:11:39 2015 -0500
+++ b/source/common/param.cpp Mon Apr 06 21:02:36 2015 -0500
@@ -565,6 +565,7 @@ int x265_param_parse(x265_param* p, cons
p->levelIdc = atoi(value);
}
OPT("high-tier") p->bHighTier = atobool(value);
+ OPT("allow-non-conformance") p->bAllowNonConformance = atobool(value);
OPT2("log-level", "log")
{
p->logLevel = atoi(value);
diff -r ebe5e57c4b45 -r 0ce13ce29304 source/common/primitives.h
--- a/source/common/primitives.h Sat Apr 04 15:11:39 2015 -0500
+++ b/source/common/primitives.h Mon Apr 06 21:02:36 2015 -0500
@@ -168,7 +168,7 @@ typedef void (*pixel_add_ps_t)(pixel* a,
typedef void (*pixelavg_pp_t)(pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int weight);
typedef void (*addAvg_t)(const int16_t* src0, const int16_t* src1, pixel* dst, intptr_t src0Stride, intptr_t src1Stride, intptr_t dstStride);
-typedef void (*saoCuOrgE0_t)(pixel* rec, int8_t* offsetEo, int width, int8_t signLeft);
+typedef void (*saoCuOrgE0_t)(pixel* rec, int8_t* offsetEo, int width, int8_t* signLeft, intptr_t stride);
typedef void (*saoCuOrgE1_t)(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width);
typedef void (*saoCuOrgE2_t)(pixel* rec, int8_t* pBufft, int8_t* pBuff1, int8_t* offsetEo, int lcuWidth, intptr_t stride);
typedef void (*saoCuOrgE3_t)(pixel* rec, int8_t* upBuff1, int8_t* m_offsetEo, intptr_t stride, int startX, int endX);
diff -r ebe5e57c4b45 -r 0ce13ce29304 source/common/quant.cpp
--- a/source/common/quant.cpp Sat Apr 04 15:11:39 2015 -0500
+++ b/source/common/quant.cpp Mon Apr 06 21:02:36 2015 -0500
@@ -50,6 +50,11 @@ inline int fastMin(int x, int y)
return y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
}
+inline int fastMax(int x, int y)
+{
+ return x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
+}
+
inline int getICRate(uint32_t absLevel, int32_t diffLevel, const int* greaterOneBits, const int* levelAbsBits, const uint32_t absGoRice, const uint32_t maxVlc, uint32_t c1c2Idx)
{
X265_CHECK(c1c2Idx <= 3, "c1c2Idx check failure\n");
@@ -515,6 +520,7 @@ uint32_t Quant::rdoQuant(const CUData& c
{
int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize; /* Represents scaling through forward transform */
int scalingListType = (cu.isIntra(absPartIdx) ? 0 : 3) + ttype;
+ const uint32_t usePsyMask = usePsy ? -1 : 0;
X265_CHECK(scalingListType < 6, "scaling list type out of range\n");
@@ -595,14 +601,14 @@ uint32_t Quant::rdoQuant(const CUData& c
const uint64_t cgBlkPosMask = ((uint64_t)1 << cgBlkPos);
memset(&cgRdStats, 0, sizeof(coeffGroupRDStats));
- const int patternSigCtx = calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codeParams.log2TrSizeCG);
+ const int patternSigCtx = calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, cgBlkPos, (trSize >> MLS_CG_LOG2_SIZE));
/* iterate over coefficients in each group in reverse scan order */
for (int scanPosinCG = cgSize - 1; scanPosinCG >= 0; scanPosinCG--)
{
scanPos = (cgScanPos << MLS_CG_SIZE) + scanPosinCG;
uint32_t blkPos = codeParams.scan[scanPos];
- uint16_t maxAbsLevel = (int16_t)abs(dstCoeff[blkPos]); /* abs(quantized coeff) */
+ uint32_t maxAbsLevel = abs(dstCoeff[blkPos]); /* abs(quantized coeff) */
int signCoef = m_resiDctCoeff[blkPos]; /* pre-quantization DCT coeff */
int predictedCoef = m_fencDctCoeff[blkPos] - signCoef; /* predicted DCT = source DCT - residual DCT*/
@@ -611,8 +617,8 @@ uint32_t Quant::rdoQuant(const CUData& c
* FIX15 nature of the CABAC cost tables minus the forward transform scale */
/* cost of not coding this coefficient (all distortion, no signal bits) */
- costUncoded[scanPos] = (int64_t)(signCoef * signCoef) << scaleBits;
- if (usePsy && blkPos)
+ costUncoded[scanPos] = ((int64_t)signCoef * signCoef) << scaleBits;
+ if (usePsyMask & blkPos)
/* when no residual coefficient is coded, predicted coef == recon coef */
costUncoded[scanPos] -= PSYVALUE(predictedCoef);
@@ -652,7 +658,7 @@ uint32_t Quant::rdoQuant(const CUData& c
const int* greaterOneBits = estBitsSbac.greaterOneBits[oneCtx];
const int* levelAbsBits = estBitsSbac.levelAbsBits[absCtx];
- uint16_t level = 0;
+ uint32_t level = 0;
uint32_t sigCoefBits = 0;
costCoeff[scanPos] = MAX_INT64;
@@ -672,8 +678,11 @@ uint32_t Quant::rdoQuant(const CUData& c
}
if (maxAbsLevel)
{
- uint16_t minAbsLevel = X265_MAX(maxAbsLevel - 1, 1);
- for (uint16_t lvl = maxAbsLevel; lvl >= minAbsLevel; lvl--)
+ // NOTE: X265_MAX(maxAbsLevel - 1, 1) ==> (X>=2 -> X-1), (X<2 -> 1) | (0 < X < 2 ==> X=1)
+ uint32_t minAbsLevel = (maxAbsLevel - 1);
+ if (maxAbsLevel == 1)
+ minAbsLevel = 1;
+ for (uint32_t lvl = maxAbsLevel; lvl >= minAbsLevel; lvl--)
{
uint32_t levelBits = getICRateCost(lvl, lvl - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx) + IEP_RATE;
@@ -682,7 +691,7 @@ uint32_t Quant::rdoQuant(const CUData& c
int64_t curCost = RDCOST(d, sigCoefBits + levelBits);
/* Psy RDOQ: bias in favor of higher AC coefficients in the reconstructed frame */
- if (usePsy && blkPos)
+ if (usePsyMask & blkPos)
{
int reconCoef = abs(unquantAbsLevel + SIGN(predictedCoef, signCoef));
curCost -= PSYVALUE(reconCoef);
@@ -697,7 +706,7 @@ uint32_t Quant::rdoQuant(const CUData& c
}
}
- dstCoeff[blkPos] = level;
+ dstCoeff[blkPos] = (int16_t)level;
totalRdCost += costCoeff[scanPos];
/* record costs for sign-hiding performed at the end */
@@ -815,7 +824,7 @@ uint32_t Quant::rdoQuant(const CUData& c
* of the significant coefficient group flag and evaluate whether the RD cost of the
* coded group is more than the RD cost of the uncoded group */
- uint32_t sigCtx = getSigCoeffGroupCtxInc(sigCoeffGroupFlag64, cgPosX, cgPosY, codeParams.log2TrSizeCG);
+ uint32_t sigCtx = getSigCoeffGroupCtxInc(sigCoeffGroupFlag64, cgPosX, cgPosY, cgBlkPos, (trSize >> MLS_CG_LOG2_SIZE));
int64_t costZeroCG = totalRdCost + SIGCOST(estBitsSbac.significantCoeffGroupBits[sigCtx][0]);
costZeroCG += cgRdStats.uncodedDist; /* add distortion for resetting non-zero levels to zero levels */
@@ -848,7 +857,7 @@ uint32_t Quant::rdoQuant(const CUData& c
else
{
/* there were no coded coefficients in this coefficient group */
- uint32_t ctxSig = getSigCoeffGroupCtxInc(sigCoeffGroupFlag64, cgPosX, cgPosY, codeParams.log2TrSizeCG);
+ uint32_t ctxSig = getSigCoeffGroupCtxInc(sigCoeffGroupFlag64, cgPosX, cgPosY, cgBlkPos, (trSize >> MLS_CG_LOG2_SIZE));
costCoeffGroupSig[cgScanPos] = SIGCOST(estBitsSbac.significantCoeffGroupBits[ctxSig][0]);
totalRdCost += costCoeffGroupSig[cgScanPos]; /* add cost of 0 bit in significant CG bitmap */
totalRdCost -= cgRdStats.sigCost; /* remove cost of significant coefficient bitmap */
@@ -909,7 +918,7 @@ uint32_t Quant::rdoQuant(const CUData& c
* cost of signaling it as not-significant */
uint32_t blkPos = codeParams.scan[scanPos];
if (dstCoeff[blkPos])
- {
+ {
// Calculates the cost of signaling the last significant coefficient in the block
uint32_t pos[2] = { (blkPos & (trSize - 1)), (blkPos >> log2TrSize) };
if (codeParams.scanType == SCAN_VER)
@@ -1092,22 +1101,6 @@ uint32_t Quant::rdoQuant(const CUData& c
return numSig;
}
-/* Pattern decision for context derivation process of significant_coeff_flag */
-uint32_t Quant::calcPatternSigCtx(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG)
-{
- if (!log2TrSizeCG)
- return 0;
-
- const uint32_t trSizeCG = 1 << log2TrSizeCG;
- X265_CHECK(trSizeCG <= 8, "transform CG is too large\n");
- const uint32_t shift = (cgPosY << log2TrSizeCG) + cgPosX + 1;
- const uint32_t sigPos = (uint32_t)(shift >= 64 ? 0 : sigCoeffGroupFlag64 >> shift);
- const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
- const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
-
- return sigRight + sigLower;
-}
-
/* Context derivation process of coeff_abs_significant_flag */
uint32_t Quant::getSigCtxInc(uint32_t patternSigCtx, uint32_t log2TrSize, uint32_t trSize, uint32_t blkPos, bool bIsLuma,
uint32_t firstSignificanceMapContext)
@@ -1175,14 +1168,3 @@ uint32_t Quant::getSigCtxInc(uint32_t pa
return (bIsLuma && (posX | posY) >= 4) ? 3 + offset : offset;
}
-/* Context derivation process of coeff_abs_significant_flag */
-uint32_t Quant::getSigCoeffGroupCtxInc(uint64_t cgGroupMask, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG)
-{
- const uint32_t trSizeCG = 1 << log2TrSizeCG;
-
- const uint32_t sigPos = (uint32_t)(cgGroupMask >> (1 + (cgPosY << log2TrSizeCG) + cgPosX));
- const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
- const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
-
- return (sigRight | sigLower) & 1;
-}
diff -r ebe5e57c4b45 -r 0ce13ce29304 source/common/quant.h
--- a/source/common/quant.h Sat Apr 04 15:11:39 2015 -0500
+++ b/source/common/quant.h Mon Apr 06 21:02:36 2015 -0500
@@ -111,10 +111,39 @@ public:
void invtransformNxN(int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig);
+ /* Pattern decision for context derivation process of significant_coeff_flag */
+ static uint32_t calcPatternSigCtx(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)
+ {
+ if (trSizeCG == 1)
+ return 0;
+
+ X265_CHECK(trSizeCG <= 8, "transform CG is too large\n");
+ X265_CHECK(cgBlkPos < 64, "cgBlkPos is too large\n");
+ // NOTE: cgBlkPos+1 may more than 63, it is invalid for shift,
+ // but in this case, both cgPosX and cgPosY equal to (trSizeCG - 1),
+ // the sigRight and sigLower will clear value to zero, the final result will be correct
+ const uint32_t sigPos = (uint32_t)(sigCoeffGroupFlag64 >> (cgBlkPos + 1)); // just need lowest 7-bits valid
+
+ // TODO: instruction BT is faster, but _bittest64 still generate instruction 'BT m, r' in VS2012
+ const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
+ const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
+ return sigRight + sigLower;
+ }
+
+ /* Context derivation process of coeff_abs_significant_flag */
+ static uint32_t getSigCoeffGroupCtxInc(uint64_t cgGroupMask, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)
+ {
+ X265_CHECK(cgBlkPos < 64, "cgBlkPos is too large\n");
+ // NOTE: unsafe shift operator, see NOTE in calcPatternSigCtx
+ const uint32_t sigPos = (uint32_t)(cgGroupMask >> (cgBlkPos + 1)); // just need lowest 8-bits valid
+ const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
+ const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
+
+ return (sigRight | sigLower) & 1;
+ }