[x265-commits] [x265] fix issue #144 10-bit x265 hangs from 1.7+170-4948aeae8a1...
Dnyaneshwar G
dnyaneshwar at multicorewareinc.com
Fri Jun 19 00:41:49 CEST 2015
details: http://hg.videolan.org/x265/rev/d8f12802279d
branches:
changeset: 10650:d8f12802279d
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Wed Jun 17 14:25:32 2015 +0530
description:
fix issue #144 10-bit x265 hangs from 1.7+170-4948aeae8a18 on Win7 64-bit
Subject: [x265] analysis-mode: fix blocking artifacts in analysis-mode load/save
details: http://hg.videolan.org/x265/rev/2dd7e396b3f9
branches:
changeset: 10651:2dd7e396b3f9
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Mon Jun 15 14:32:22 2015 +0530
description:
analysis-mode: fix blocking artifacts in analysis-mode load/save
With analysis-data dumps enabled, blocking artifacts were noticed with merge
candidates. The merge candidate should be used only after choosing the best of
skip and merge with residual.
Subject: [x265] fix issue #143 x265 is slow when it is build with GCC 5.1
details: http://hg.videolan.org/x265/rev/98325f22a1ba
branches: stable
changeset: 10652:98325f22a1ba
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Jun 17 22:14:14 2015 +0530
description:
fix issue #143 x265 is slow when it is build with GCC 5.1
Subject: [x265] analysis-mode: fix blocking artifacts in analysis-mode load/save
details: http://hg.videolan.org/x265/rev/1b87881db758
branches: stable
changeset: 10653:1b87881db758
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Jun 17 22:15:41 2015 +0530
description:
analysis-mode: fix blocking artifacts in analysis-mode load/save
With analysis-data dumps enabled, blocking artifacts were noticed with merge
candidates. The merge candidate should be used only after choosing the best of
skip and merge with residual.
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/d6c32960b5df
branches:
changeset: 10654:d6c32960b5df
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Jun 17 22:16:03 2015 +0530
description:
Merge with stable
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 13 & 23
details: http://hg.videolan.org/x265/rev/7b03df434b5d
branches:
changeset: 10655:7b03df434b5d
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Tue Jun 16 15:56:48 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 13 & 23
performance improvement over SSE:
intra_ang_32x32[13] 7996c->4784c, 40%
intra_ang_32x32[23] 5797c->2990c, 48%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 14 & 22
details: http://hg.videolan.org/x265/rev/88474e625dfb
branches:
changeset: 10656:88474e625dfb
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Tue Jun 16 15:58:24 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 14 & 22
performance improvement over SSE:
intra_ang_32x32[14] 7997c->4722c, 40%
intra_ang_32x32[22] 5810c->3230c, 44%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 15 & 21
details: http://hg.videolan.org/x265/rev/10690ad2c3d8
branches:
changeset: 10657:10690ad2c3d8
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Tue Jun 16 15:58:50 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 15 & 21
performance improvement over SSE:
intra_ang_32x32[15] 8337c->4609c, 44%
intra_ang_32x32[21] 6303c->3238c, 48%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 16 & 20
details: http://hg.videolan.org/x265/rev/dc09f8816a15
branches:
changeset: 10658:dc09f8816a15
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Tue Jun 16 15:59:53 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 16 & 20
performance improvement over SSE:
intra_ang_32x32[16] 8032c->4841c, 40%
intra_ang_32x32[20] 6171c->3277c, 47%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 17 & 19
details: http://hg.videolan.org/x265/rev/b495804e003f
branches:
changeset: 10659:b495804e003f
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Tue Jun 16 16:00:25 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 17 & 19
performance improvement over SSE:
intra_ang_32x32[17] 8392c->4757c, 43%
intra_ang_32x32[19] 6122c->3173c, 48%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 18, improved 1331c->884c, 31%
details: http://hg.videolan.org/x265/rev/69c5275261f2
branches:
changeset: 10660:69c5275261f2
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Tue Jun 16 14:10:06 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 18, improved 1331c->884c, 31%
Subject: [x265] clean up debug code in codeCoeffNxN()
details: http://hg.videolan.org/x265/rev/bbb6f4573dab
branches:
changeset: 10661:bbb6f4573dab
user: Min Chen <chenm003 at 163.com>
date: Tue Jun 16 09:45:57 2015 -0700
description:
clean up debug code in codeCoeffNxN()
Subject: [x265] faster algorithm to calculate signHidden cost in codeCoeffNxN()
details: http://hg.videolan.org/x265/rev/80a1a697e993
branches:
changeset: 10662:80a1a697e993
user: Min Chen <chenm003 at 163.com>
date: Tue Jun 16 15:54:03 2015 -0700
description:
faster algorithm to calculate signHidden cost in codeCoeffNxN()
Subject: [x265] improve by convert arithmetic(signed) shift to logic(unsigned) shift
details: http://hg.videolan.org/x265/rev/11f818d6465c
branches:
changeset: 10663:11f818d6465c
user: Min Chen <chenm003 at 163.com>
date: Tue Jun 16 15:54:09 2015 -0700
description:
improve by convert arithmetic(signed) shift to logic(unsigned) shift
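A minimal sketch of why the two shift forms in the quant.h hunk of this digest give the same result. The function names are hypothetical, sigPos is passed in directly rather than derived from the group-flag mask, and trSizeCG is assumed to be 2, 4 or 8. The old form also assumes the compiler implements >> on a negative int32_t as an arithmetic shift.

```cpp
#include <cstdint>

// Old form: the arithmetic shift turns the sign bit into a mask of
// 0x00000000 or 0xFFFFFFFF, so sigPos still has to be masked explicitly.
static uint32_t patternSigCtxOld(uint32_t sigPos, uint32_t cgPosX,
                                 uint32_t cgPosY, uint32_t trSizeCG)
{
    const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
    const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
    return sigRight + sigLower;
}

// New form: the logical shift leaves only 0 or 1, which doubles as the
// "keep bit 0" mask, so the separate "& 1" and "& 2" steps go away.
static uint32_t patternSigCtxNew(uint32_t sigPos, uint32_t cgPosX,
                                 uint32_t cgPosY, uint32_t trSizeCG)
{
    const uint32_t sigRight = ((uint32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
    const uint32_t sigLower = ((uint32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
    return sigRight + sigLower * 2;
}

// Brute-force comparison over the assumed input range.
static bool shiftFormsAgree()
{
    for (uint32_t trSizeCG = 2; trSizeCG <= 8; trSizeCG *= 2)
        for (uint32_t x = 0; x < trSizeCG; x++)
            for (uint32_t y = 0; y < trSizeCG; y++)
                for (uint32_t sigPos = 0; sigPos < 256; sigPos++)
                    if (patternSigCtxOld(sigPos, x, y, trSizeCG) !=
                        patternSigCtxNew(sigPos, x, y, trSizeCG))
                        return false;
    return true;
}
```

Both forms select bit 0 of sigPos for the right neighbour and bit (trSizeCG - 1) for the lower neighbour; only the way the position test is turned into a mask differs.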
Subject: [x265] reduce VC condition branch by modify code style
details: http://hg.videolan.org/x265/rev/103f09e46d32
branches:
changeset: 10664:103f09e46d32
user: Min Chen <chenm003 at 163.com>
date: Tue Jun 16 15:54:22 2015 -0700
description:
reduce VC condition branch by modify code style
Subject: [x265] asm: avx2 code for weight_pp() for 10 bpp
details: http://hg.videolan.org/x265/rev/9482a929901c
branches:
changeset: 10665:9482a929901c
user: Sumalatha Polureddy <sumalatha at multicorewareinc.com>
date: Wed Jun 17 14:58:01 2015 +0530
description:
asm: avx2 code for weight_pp() for 10 bpp
performance (speedup over C, asm cycles, C cycles):
sse4: weight_pp 9.37x 6768.87 63435.43
avx2: weight_pp 16.45x 4187.86 68871.50
Subject: [x265] improve fillReferenceSamples by reduce condition operators in loop
details: http://hg.videolan.org/x265/rev/404788909650
branches:
changeset: 10666:404788909650
user: Min Chen <chenm003 at 163.com>
date: Wed Jun 17 15:00:19 2015 -0700
description:
improve fillReferenceSamples by reduce condition operators in loop
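A simplified sketch of the trade-off in this change, modelled loosely on the predict.cpp hunk in this digest. The function names, the forward-indexed flags array and the 8-bit pixel type are assumptions made for illustration. The flagged version tests one neighbour flag per unit; the over-copy version copies every sample unconditionally and relies on a later step (presumably the padding pass that follows in the real function) to fix up unavailable units.

```cpp
#include <cstddef>
#include <cstdint>

typedef uint8_t pixel;

// Flagged copy: one branch per unit, samples stored in descending order.
static void fillLeftFlagged(pixel* adi, const pixel* src, ptrdiff_t stride,
                            const bool* flags, int leftUnits, int unitHeight)
{
    for (int j = 0; j < leftUnits; j++)
    {
        if (flags[j])
            for (int i = 0; i < unitHeight; i++)
                adi[-i] = src[i * stride];

        src += unitHeight * stride;
        adi -= unitHeight;
    }
}

// Over-copy: no branch in the loop; unavailable units are copied as well.
static void fillLeftOverCopy(pixel* adi, const pixel* src, ptrdiff_t stride,
                             int leftUnits, int unitHeight)
{
    for (int j = 0; j < leftUnits * unitHeight; j++)
        adi[-j] = src[j * stride];
}
```

For every unit whose flag is set the two versions write identical samples. They differ only in units that are flagged unavailable, which is safe only if the source memory is readable and those entries are overwritten afterwards.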
Subject: [x265] asm: dequant_scaling asm code, improved 12668c->11097c, 12% over intrinsic
details: http://hg.videolan.org/x265/rev/65cf14a3eeb1
branches:
changeset: 10667:65cf14a3eeb1
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Wed Jun 17 17:45:35 2015 +0530
description:
asm: dequant_scaling asm code, improved 12668c->11097c, 12% over intrinsic
Subject: [x265] asm: avx2 code for dequant_scaling, improved 11097c->6860c, 38% over SSE4
details: http://hg.videolan.org/x265/rev/6a223bb5b783
branches:
changeset: 10668:6a223bb5b783
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 18 10:11:57 2015 +0530
description:
asm: avx2 code for dequant_scaling, improved 11097c->6860c, 38% over SSE4
Subject: [x265] doc: update strong intra smoothing explanation
details: http://hg.videolan.org/x265/rev/cdbfc7d0b067
branches: stable
changeset: 10669:cdbfc7d0b067
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Thu Jun 18 15:49:48 2015 +0530
description:
doc: update strong intra smoothing explanation
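The cli.rst hunk in this digest describes strong intra smoothing as bi-linear interpolation of the corner reference samples. A minimal sketch of that interpolation along one 64-sample reference edge of a 32x32 block is given here. The function name and buffer layout are hypothetical, the weights follow the HEVC-style strong filter as commonly described rather than anything shown in this patch, and the check that decides whether the strong filter applies is left out.

```cpp
#include <cstdint>

// Interpolate one reference edge between two corner samples.
// cornerA is the sample just before the edge (position -1),
// cornerB is the last sample of the edge (position 63).
static void strongSmoothEdge(uint16_t cornerA, uint16_t cornerB, uint16_t out[64])
{
    for (int i = 0; i < 63; i++)
        out[i] = (uint16_t)(((63 - i) * cornerA + (i + 1) * cornerB + 32) >> 6);

    out[63] = cornerB; // the far corner itself is kept unchanged
}
```

Because every output sample lies on the straight line between the two corners, small steps in the original reference samples are flattened, which is the effect the documentation credits with reducing blocking and banding.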
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/3ccd7658b374
branches:
changeset: 10670:3ccd7658b374
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Thu Jun 18 15:54:42 2015 +0530
description:
Merge with stable
Subject: [x265] build: fix exec bit of shell scripts
details: http://hg.videolan.org/x265/rev/29ce75aa8879
branches:
changeset: 10671:29ce75aa8879
user: Steve Borho <steve at borho.org>
date: Thu Jun 18 15:28:36 2015 -0500
description:
build: fix exec bit of shell scripts
Subject: [x265] asm: fix multilib link
details: http://hg.videolan.org/x265/rev/c63f0a1cdac2
branches:
changeset: 10672:c63f0a1cdac2
user: Steve Borho <steve at borho.org>
date: Thu Jun 18 15:36:15 2015 -0500
description:
asm: fix multilib link
Subject: [x265] cli: fix multilib link
details: http://hg.videolan.org/x265/rev/f17684cb7010
branches:
changeset: 10673:f17684cb7010
user: Steve Borho <steve at borho.org>
date: Thu Jun 18 15:42:03 2015 -0500
description:
cli: fix multilib link
Subject: [x265] asm: remove useless comments
details: http://hg.videolan.org/x265/rev/1c6de5ac3883
branches:
changeset: 10674:1c6de5ac3883
user: Steve Borho <steve at borho.org>
date: Thu Jun 18 15:29:11 2015 -0500
description:
asm: remove useless comments
diffstat:
doc/reST/cli.rst | 6 +-
source/common/predict.cpp | 23 +-
source/common/quant.h | 12 +-
source/common/x86/asm-primitives.cpp | 636 ++++----
source/common/x86/intrapred16.asm | 2321 ++++++++++++++++++++++++++++++++++
source/common/x86/pixel-util.h | 1 +
source/common/x86/pixel-util8.asm | 184 ++-
source/encoder/analysis.cpp | 139 +-
source/encoder/analysis.h | 2 +-
source/encoder/entropy.cpp | 21 +-
source/x265.cpp | 2 +-
11 files changed, 2919 insertions(+), 428 deletions(-)
diffs (truncated from 3618 to 300 lines):
diff -r be0ed447922c -r 1c6de5ac3883 doc/reST/cli.rst
--- a/doc/reST/cli.rst Tue Jun 16 11:15:03 2015 +0530
+++ b/doc/reST/cli.rst Thu Jun 18 15:29:11 2015 -0500
@@ -920,7 +920,11 @@ Spatial/intra options
.. option:: --strong-intra-smoothing, --no-strong-intra-smoothing
- Enable strong intra smoothing for 32x32 intra blocks. Default enabled
+ Enable strong intra smoothing for 32x32 intra blocks. This flag
+ performs bi-linear interpolation of the corner reference samples
+ for a strong smoothing effect. The purpose is to prevent blocking
+ or banding artifacts in regions with few/zero AC coefficients.
+ Default enabled
.. option:: --constrained-intra, --no-constrained-intra
diff -r be0ed447922c -r 1c6de5ac3883 source/common/predict.cpp
--- a/source/common/predict.cpp Tue Jun 16 11:15:03 2015 +0530
+++ b/source/common/predict.cpp Thu Jun 18 15:29:11 2015 -0500
@@ -776,30 +776,17 @@ void Predict::fillReferenceSamples(const
// Fill left & below-left samples
adiTemp += picStride;
adi--;
- pNeighborFlags--;
- for (int j = 0; j < leftUnits; j++)
+ // NOTE: over copy here, but reduce condition operators
+ for (int j = 0; j < leftUnits * unitHeight; j++)
{
- if (*pNeighborFlags)
- for (int i = 0; i < unitHeight; i++)
- adi[-i] = adiTemp[i * picStride];
-
- adiTemp += unitHeight * picStride;
- adi -= unitHeight;
- pNeighborFlags--;
+ adi[-j] = adiTemp[j * picStride];
}
// Fill above & above-right samples
adiTemp = adiOrigin - picStride;
adi = adiLineBuffer + (leftUnits * unitHeight) + unitWidth;
- pNeighborFlags = bNeighborFlags + leftUnits + 1;
- for (int j = 0; j < aboveUnits; j++)
- {
- if (*pNeighborFlags)
- memcpy(adi, adiTemp, unitWidth * sizeof(*adiTemp));
- adiTemp += unitWidth;
- adi += unitWidth;
- pNeighborFlags++;
- }
+ // NOTE: over copy here, but reduce condition operators
+ memcpy(adi, adiTemp, aboveUnits * unitWidth * sizeof(*adiTemp));
// Pad reference samples when necessary
int curr = 0;
diff -r be0ed447922c -r 1c6de5ac3883 source/common/quant.h
--- a/source/common/quant.h Tue Jun 16 11:15:03 2015 +0530
+++ b/source/common/quant.h Thu Jun 18 15:29:11 2015 -0500
@@ -126,9 +126,9 @@ public:
const uint32_t sigPos = (uint32_t)(sigCoeffGroupFlag64 >> (cgBlkPos + 1)); // just need lowest 7-bits valid
// TODO: instruction BT is faster, but _bittest64 still generate instruction 'BT m, r' in VS2012
- const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
- const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
- return sigRight + sigLower;
+ const uint32_t sigRight = ((uint32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
+ const uint32_t sigLower = ((uint32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
+ return sigRight + sigLower * 2;
}
/* Context derivation process of coeff_abs_significant_flag */
@@ -137,10 +137,10 @@ public:
X265_CHECK(cgBlkPos < 64, "cgBlkPos is too large\n");
// NOTE: unsafe shift operator, see NOTE in calcPatternSigCtx
const uint32_t sigPos = (uint32_t)(cgGroupMask >> (cgBlkPos + 1)); // just need lowest 8-bits valid
- const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
- const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
+ const uint32_t sigRight = ((uint32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
+ const uint32_t sigLower = ((uint32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
- return (sigRight | sigLower) & 1;
+ return (sigRight | sigLower);
}
/* static methods shared with entropy.cpp */
diff -r be0ed447922c -r 1c6de5ac3883 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Tue Jun 16 11:15:03 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp Thu Jun 18 15:29:11 2015 -0500
@@ -1041,8 +1041,6 @@ void setupAssemblyPrimitives(EncoderPrim
p.dst4x4 = PFX(dst4_ssse3);
p.cu[BLOCK_8x8].idct = PFX(idct8_ssse3);
- ALL_LUMA_TU(count_nonzero, count_nonzero, avx2);
-
p.frameInitLowres = PFX(frame_init_lowres_core_ssse3);
ALL_LUMA_PU(convert_p2s, filterPixelToShort, ssse3);
@@ -1106,6 +1104,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.quant = PFX(quant_sse4);
p.nquant = PFX(nquant_sse4);
p.dequant_normal = PFX(dequant_normal_sse4);
+ p.dequant_scaling = PFX(dequant_scaling_sse4);
// p.pu[LUMA_4x4].satd = p.cu[BLOCK_4x4].sa8d = PFX(pixel_satd_4x4_sse4); fails tests
ALL_LUMA_PU(satd, pixel_satd, sse4);
@@ -1310,28 +1309,39 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_16x16].intra_pred[33] = PFX(intra_pred_ang16_33_avx2);
p.cu[BLOCK_16x16].intra_pred[34] = PFX(intra_pred_ang16_2_avx2);
- p.cu[BLOCK_32x32].intra_pred[2] = x265_intra_pred_ang32_2_avx2;
- p.cu[BLOCK_32x32].intra_pred[3] = x265_intra_pred_ang32_3_avx2;
- p.cu[BLOCK_32x32].intra_pred[4] = x265_intra_pred_ang32_4_avx2;
- p.cu[BLOCK_32x32].intra_pred[5] = x265_intra_pred_ang32_5_avx2;
- p.cu[BLOCK_32x32].intra_pred[6] = x265_intra_pred_ang32_6_avx2;
- p.cu[BLOCK_32x32].intra_pred[7] = x265_intra_pred_ang32_7_avx2;
- p.cu[BLOCK_32x32].intra_pred[8] = x265_intra_pred_ang32_8_avx2;
- p.cu[BLOCK_32x32].intra_pred[9] = x265_intra_pred_ang32_9_avx2;
- p.cu[BLOCK_32x32].intra_pred[10] = x265_intra_pred_ang32_10_avx2;
- p.cu[BLOCK_32x32].intra_pred[11] = x265_intra_pred_ang32_11_avx2;
- p.cu[BLOCK_32x32].intra_pred[12] = x265_intra_pred_ang32_12_avx2;
- p.cu[BLOCK_32x32].intra_pred[24] = x265_intra_pred_ang32_24_avx2;
- p.cu[BLOCK_32x32].intra_pred[25] = x265_intra_pred_ang32_25_avx2;
- p.cu[BLOCK_32x32].intra_pred[26] = x265_intra_pred_ang32_26_avx2;
- p.cu[BLOCK_32x32].intra_pred[27] = x265_intra_pred_ang32_27_avx2;
- p.cu[BLOCK_32x32].intra_pred[28] = x265_intra_pred_ang32_28_avx2;
- p.cu[BLOCK_32x32].intra_pred[29] = x265_intra_pred_ang32_29_avx2;
- p.cu[BLOCK_32x32].intra_pred[30] = x265_intra_pred_ang32_30_avx2;
- p.cu[BLOCK_32x32].intra_pred[31] = x265_intra_pred_ang32_31_avx2;
- p.cu[BLOCK_32x32].intra_pred[32] = x265_intra_pred_ang32_32_avx2;
- p.cu[BLOCK_32x32].intra_pred[33] = x265_intra_pred_ang32_33_avx2;
- p.cu[BLOCK_32x32].intra_pred[34] = x265_intra_pred_ang32_2_avx2;
+ p.cu[BLOCK_32x32].intra_pred[2] = PFX(intra_pred_ang32_2_avx2);
+ p.cu[BLOCK_32x32].intra_pred[3] = PFX(intra_pred_ang32_3_avx2);
+ p.cu[BLOCK_32x32].intra_pred[4] = PFX(intra_pred_ang32_4_avx2);
+ p.cu[BLOCK_32x32].intra_pred[5] = PFX(intra_pred_ang32_5_avx2);
+ p.cu[BLOCK_32x32].intra_pred[6] = PFX(intra_pred_ang32_6_avx2);
+ p.cu[BLOCK_32x32].intra_pred[7] = PFX(intra_pred_ang32_7_avx2);
+ p.cu[BLOCK_32x32].intra_pred[8] = PFX(intra_pred_ang32_8_avx2);
+ p.cu[BLOCK_32x32].intra_pred[9] = PFX(intra_pred_ang32_9_avx2);
+ p.cu[BLOCK_32x32].intra_pred[10] = PFX(intra_pred_ang32_10_avx2);
+ p.cu[BLOCK_32x32].intra_pred[11] = PFX(intra_pred_ang32_11_avx2);
+ p.cu[BLOCK_32x32].intra_pred[12] = PFX(intra_pred_ang32_12_avx2);
+ p.cu[BLOCK_32x32].intra_pred[13] = PFX(intra_pred_ang32_13_avx2);
+ p.cu[BLOCK_32x32].intra_pred[14] = PFX(intra_pred_ang32_14_avx2);
+ p.cu[BLOCK_32x32].intra_pred[15] = PFX(intra_pred_ang32_15_avx2);
+ p.cu[BLOCK_32x32].intra_pred[16] = PFX(intra_pred_ang32_16_avx2);
+ p.cu[BLOCK_32x32].intra_pred[17] = PFX(intra_pred_ang32_17_avx2);
+ p.cu[BLOCK_32x32].intra_pred[18] = PFX(intra_pred_ang32_18_avx2);
+ p.cu[BLOCK_32x32].intra_pred[19] = PFX(intra_pred_ang32_19_avx2);
+ p.cu[BLOCK_32x32].intra_pred[20] = PFX(intra_pred_ang32_20_avx2);
+ p.cu[BLOCK_32x32].intra_pred[21] = PFX(intra_pred_ang32_21_avx2);
+ p.cu[BLOCK_32x32].intra_pred[22] = PFX(intra_pred_ang32_22_avx2);
+ p.cu[BLOCK_32x32].intra_pred[23] = PFX(intra_pred_ang32_23_avx2);
+ p.cu[BLOCK_32x32].intra_pred[24] = PFX(intra_pred_ang32_24_avx2);
+ p.cu[BLOCK_32x32].intra_pred[25] = PFX(intra_pred_ang32_25_avx2);
+ p.cu[BLOCK_32x32].intra_pred[26] = PFX(intra_pred_ang32_26_avx2);
+ p.cu[BLOCK_32x32].intra_pred[27] = PFX(intra_pred_ang32_27_avx2);
+ p.cu[BLOCK_32x32].intra_pred[28] = PFX(intra_pred_ang32_28_avx2);
+ p.cu[BLOCK_32x32].intra_pred[29] = PFX(intra_pred_ang32_29_avx2);
+ p.cu[BLOCK_32x32].intra_pred[30] = PFX(intra_pred_ang32_30_avx2);
+ p.cu[BLOCK_32x32].intra_pred[31] = PFX(intra_pred_ang32_31_avx2);
+ p.cu[BLOCK_32x32].intra_pred[32] = PFX(intra_pred_ang32_32_avx2);
+ p.cu[BLOCK_32x32].intra_pred[33] = PFX(intra_pred_ang32_33_avx2);
+ p.cu[BLOCK_32x32].intra_pred[34] = PFX(intra_pred_ang32_2_avx2);
p.pu[LUMA_8x4].addAvg = PFX(addAvg_8x4_avx2);
p.pu[LUMA_8x8].addAvg = PFX(addAvg_8x8_avx2);
@@ -1461,13 +1471,14 @@ void setupAssemblyPrimitives(EncoderPrim
p.quant = PFX(quant_avx2);
p.nquant = PFX(nquant_avx2);
p.dequant_normal = PFX(dequant_normal_avx2);
+ p.dequant_scaling = PFX(dequant_scaling_avx2);
p.dst4x4 = PFX(dst4_avx2);
p.idst4x4 = PFX(idst4_avx2);
p.denoiseDct = PFX(denoise_dct_avx2);
p.scale1D_128to64 = PFX(scale1D_128to64_avx2);
p.scale2D_64to32 = PFX(scale2D_64to32_avx2);
- // p.weight_pp = PFX(weight_pp_avx2); fails tests
+ p.weight_pp = PFX(weight_pp_avx2);
p.cu[BLOCK_16x16].calcresidual = PFX(getResidual16_avx2);
p.cu[BLOCK_32x32].calcresidual = PFX(getResidual32_avx2);
@@ -1777,285 +1788,285 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_hpp = PFX(interp_4tap_horiz_pp_64x64_avx2);
p.chroma[X265_CSP_I444].pu[LUMA_48x64].filter_hpp = PFX(interp_4tap_horiz_pp_48x64_avx2);
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vpp = x265_interp_4tap_vert_pp_4x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vps = x265_interp_4tap_vert_ps_4x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vsp = x265_interp_4tap_vert_sp_4x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vss = x265_interp_4tap_vert_ss_4x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vpp = x265_interp_4tap_vert_pp_8x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vps = x265_interp_4tap_vert_ps_8x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vsp = x265_interp_4tap_vert_sp_8x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vss = x265_interp_4tap_vert_ss_8x2_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vsp = x265_interp_4tap_vert_sp_8x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vss = x265_interp_4tap_vert_ss_8x4_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vpp = x265_interp_4tap_vert_pp_8x6_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vps = x265_interp_4tap_vert_ps_8x6_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vsp = x265_interp_4tap_vert_sp_8x6_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vss = x265_interp_4tap_vert_ss_8x6_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vpp = x265_interp_4tap_vert_pp_8x12_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vps = x265_interp_4tap_vert_ps_8x12_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vsp = x265_interp_4tap_vert_sp_8x12_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vss = x265_interp_4tap_vert_ss_8x12_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vsp = x265_interp_4tap_vert_sp_8x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vss = x265_interp_4tap_vert_ss_8x16_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vsp = x265_interp_4tap_vert_sp_8x32_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vss = x265_interp_4tap_vert_ss_8x32_avx2;
-
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vpp = x265_interp_4tap_vert_pp_4x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vps = x265_interp_4tap_vert_ps_4x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vsp = x265_interp_4tap_vert_sp_4x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vss = x265_interp_4tap_vert_ss_4x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vsp = x265_interp_4tap_vert_sp_8x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vss = x265_interp_4tap_vert_ss_8x4_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vsp = x265_interp_4tap_vert_sp_8x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vss = x265_interp_4tap_vert_ss_8x16_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vsp = x265_interp_4tap_vert_sp_8x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vss = x265_interp_4tap_vert_ss_8x32_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vpp = x265_interp_4tap_vert_pp_8x64_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vps = x265_interp_4tap_vert_ps_8x64_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vsp = x265_interp_4tap_vert_sp_8x64_avx2;
- p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vss = x265_interp_4tap_vert_ss_8x64_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vsp = x265_interp_4tap_vert_sp_8x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vss = x265_interp_4tap_vert_ss_8x4_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vsp = x265_interp_4tap_vert_sp_8x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vss = x265_interp_4tap_vert_ss_8x16_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vsp = x265_interp_4tap_vert_sp_8x32_avx2;
- p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vss = x265_interp_4tap_vert_ss_8x32_avx2;
-
-
- p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].filter_vss = x265_interp_4tap_vert_ss_6x8_avx2;
- p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].filter_vsp = x265_interp_4tap_vert_sp_6x8_avx2;