[x265-commits] [x265] fix issue #144 10-bit x265 hangs from 1.7+170-4948aeae8a1...

Dnyaneshwar G dnyaneshwar at multicorewareinc.com
Fri Jun 19 00:41:49 CEST 2015


details:   http://hg.videolan.org/x265/rev/d8f12802279d
branches:  
changeset: 10650:d8f12802279d
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Wed Jun 17 14:25:32 2015 +0530
description:
fix issue #144 10-bit x265 hangs from 1.7+170-4948aeae8a18 on Win7 64-bit
Subject: [x265] analysis-mode: fix blocking artifacts in analysis-mode load/save

details:   http://hg.videolan.org/x265/rev/2dd7e396b3f9
branches:  
changeset: 10651:2dd7e396b3f9
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Mon Jun 15 14:32:22 2015 +0530
description:
analysis-mode: fix blocking artifacts in analysis-mode load/save

With analysis-data dumps enabled, blocking artifacts were noticed with merge
candidates. The merge candidate should be used only after choosing the best of
skip and merge with residual.
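A minimal sketch of the ordering this fix describes. It assumes hypothetical per-candidate RD costs (`skipCost`, `residualCost`) and a hypothetical `MergeChoice` record, and it is not the analysis.cpp code. The point it illustrates is that the merge candidate index is read only from the overall winner of the skip vs. merge-with-residual comparison, which is the value an analysis save would need to store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result record: the winning merge candidate and how it is coded.
struct MergeChoice
{
    double cost = 1e300;  // RD cost of the best option seen so far
    int candIdx = -1;     // merge candidate index of that option
    bool isSkip = false;  // true: skip (no residual), false: merge with residual
};

// skipCost[i] and residualCost[i] are hypothetical RD costs for merge
// candidate i coded as skip and as merge with residual, respectively.
MergeChoice chooseBestMerge(const std::vector<double>& skipCost,
                            const std::vector<double>& residualCost)
{
    assert(skipCost.size() == residualCost.size());
    MergeChoice best;
    for (std::size_t i = 0; i < skipCost.size(); i++)
    {
        if (skipCost[i] < best.cost)
        {
            best.cost = skipCost[i];
            best.candIdx = (int)i;
            best.isSkip = true;
        }
        if (residualCost[i] < best.cost)
        {
            best.cost = residualCost[i];
            best.candIdx = (int)i;
            best.isSkip = false;
        }
    }
    // Only now, after both alternatives of every candidate were compared,
    // is best.candIdx the index worth saving to (or reusing from) analysis data.
    return best;
}
```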
Subject: [x265] fix issue #143 x265 is slow when it is built with GCC 5.1

details:   http://hg.videolan.org/x265/rev/98325f22a1ba
branches:  stable
changeset: 10652:98325f22a1ba
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Jun 17 22:14:14 2015 +0530
description:
fix issue #143 x265 is slow when it is built with GCC 5.1
Subject: [x265] analysis-mode: fix blocking artifacts in analysis-mode load/save

details:   http://hg.videolan.org/x265/rev/1b87881db758
branches:  stable
changeset: 10653:1b87881db758
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Jun 17 22:15:41 2015 +0530
description:
analysis-mode: fix blocking artifacts in analysis-mode load/save

With analysis-data dumps enabled, blocking artifacts were noticed with merge
candidates. The merge candidate should be used only after choosing the best of
skip and merge with residual.
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/d6c32960b5df
branches:  
changeset: 10654:d6c32960b5df
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Jun 17 22:16:03 2015 +0530
description:
Merge with stable
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 13 & 23

details:   http://hg.videolan.org/x265/rev/7b03df434b5d
branches:  
changeset: 10655:7b03df434b5d
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Tue Jun 16 15:56:48 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 13 & 23

performance improvement over SSE:
intra_ang_32x32[13]    7996c->4784c, 40%
intra_ang_32x32[23]    5797c->2990c, 48%
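The percentages in these asm commit messages appear to express the share of cycles saved relative to the SSE version, (old - new) / old, reported as a whole percent. A small helper makes the arithmetic explicit; for mode 13 it gives (7996 - 4784) / 7996 ≈ 40.2%, and for mode 23 about 48.4%, matching the reported 40% and 48%.

```cpp
#include <cassert>

// Percentage of cycles saved when a primitive goes from oldCycles to newCycles.
double cycleReductionPct(double oldCycles, double newCycles)
{
    assert(oldCycles > 0.0);
    return (oldCycles - newCycles) * 100.0 / oldCycles;
}
```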
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 14 & 22

details:   http://hg.videolan.org/x265/rev/88474e625dfb
branches:  
changeset: 10656:88474e625dfb
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Tue Jun 16 15:58:24 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 14 & 22

performance improvement over SSE:
intra_ang_32x32[14]    7997c->4722c, 40%
intra_ang_32x32[22]    5810c->3230c, 44%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 15 & 21

details:   http://hg.videolan.org/x265/rev/10690ad2c3d8
branches:  
changeset: 10657:10690ad2c3d8
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Tue Jun 16 15:58:50 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 15 & 21

performance improvement over SSE:
intra_ang_32x32[15]    8337c->4609c, 44%
intra_ang_32x32[21]    6303c->3238c, 48%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 16 & 20

details:   http://hg.videolan.org/x265/rev/dc09f8816a15
branches:  
changeset: 10658:dc09f8816a15
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Tue Jun 16 15:59:53 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 16 & 20

performance improvement over SSE:
intra_ang_32x32[16]    8032c->4841c, 40%
intra_ang_32x32[20]    6171c->3277c, 47%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 17 & 19

details:   http://hg.videolan.org/x265/rev/b495804e003f
branches:  
changeset: 10659:b495804e003f
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Tue Jun 16 16:00:25 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 17 & 19

performance improvement over SSE:
intra_ang_32x32[17]    8392c->4757c, 43%
intra_ang_32x32[19]    6122c->3173c, 48%
Subject: [x265] asm: 10bpp avx2 code for intra_pred_ang32x32 mode 18, improved 1331c->884c, 31%

details:   http://hg.videolan.org/x265/rev/69c5275261f2
branches:  
changeset: 10660:69c5275261f2
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Tue Jun 16 14:10:06 2015 +0530
description:
asm: 10bpp avx2 code for intra_pred_ang32x32 mode 18, improved 1331c->884c, 31%
Subject: [x265] clean up debug code in codeCoeffNxN()

details:   http://hg.videolan.org/x265/rev/bbb6f4573dab
branches:  
changeset: 10661:bbb6f4573dab
user:      Min Chen <chenm003 at 163.com>
date:      Tue Jun 16 09:45:57 2015 -0700
description:
clean up debug code in codeCoeffNxN()
Subject: [x265] faster algorithm to calculate signHidden cost in codeCoeffNxN()

details:   http://hg.videolan.org/x265/rev/80a1a697e993
branches:  
changeset: 10662:80a1a697e993
user:      Min Chen <chenm003 at 163.com>
date:      Tue Jun 16 15:54:03 2015 -0700
description:
faster algorithm to calculate signHidden cost in codeCoeffNxN()
Subject: [x265] improve by converting arithmetic (signed) shifts to logical (unsigned) shifts

details:   http://hg.videolan.org/x265/rev/11f818d6465c
branches:  
changeset: 10663:11f818d6465c
user:      Min Chen <chenm003 at 163.com>
date:      Tue Jun 16 15:54:09 2015 -0700
description:
improve by converting arithmetic (signed) shifts to logical (unsigned) shifts
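The quant.h hunks for this change replace `((int32_t)x >> 31)`, which yields an all-ones mask when x is negative, with `((uint32_t)x >> 31)`, which yields exactly 0 or 1. Because the result is already a one-bit value, the extra `& 1` / `& 2` masks on the other operand can be dropped. The sketch copies the before and after expressions of calcPatternSigCtx into two free functions (the wrapper signatures are hypothetical, and trSizeCG >= 2 is assumed because the old form shifts by trSizeCG - 2) so their equivalence can be checked exhaustively. The signed form also leans on behaviour that is implementation-defined before C++20 (converting an out-of-range value to int32_t and right-shifting a negative value), which the unsigned form avoids.

```cpp
#include <cassert>
#include <cstdint>

// Before: an arithmetic shift of a negative int32_t gives 0xFFFFFFFF, so the
// other operand must be masked down to the bit of interest.
std::uint32_t patternSigCtxOld(std::uint32_t sigPos, std::uint32_t cgPosX,
                               std::uint32_t cgPosY, std::uint32_t trSizeCG)
{
    assert(trSizeCG >= 2); // sigLower below shifts by (trSizeCG - 2)
    const std::uint32_t sigRight = ((std::int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
    const std::uint32_t sigLower = ((std::int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
    return sigRight + sigLower;
}

// After: a logical shift of uint32_t gives exactly 0 or 1, which already acts
// as a one-bit mask; the lower neighbour's bit is moved to bit 1 by "* 2".
std::uint32_t patternSigCtxNew(std::uint32_t sigPos, std::uint32_t cgPosX,
                               std::uint32_t cgPosY, std::uint32_t trSizeCG)
{
    const std::uint32_t sigRight = ((std::uint32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
    const std::uint32_t sigLower = ((std::uint32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
    return sigRight + sigLower * 2;
}
```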
Subject: [x265] reduce VC conditional branches by modifying code style

details:   http://hg.videolan.org/x265/rev/103f09e46d32
branches:  
changeset: 10664:103f09e46d32
user:      Min Chen <chenm003 at 163.com>
date:      Tue Jun 16 15:54:22 2015 -0700
description:
reduce VC conditional branches by modifying code style
Subject: [x265] asm: avx2 code for weight_pp() for 10 bpp

details:   http://hg.videolan.org/x265/rev/9482a929901c
branches:  
changeset: 10665:9482a929901c
user:      Sumalatha Polureddy <sumalatha at multicorewareinc.com>
date:      Wed Jun 17 14:58:01 2015 +0530
description:
asm: avx2 code for weight_pp() for 10 bpp

sse4
weight_pp  9.37x    6768.87         63435.43

avx2
weight_pp  16.45x   4187.86         68871.50
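These weight_pp figures look like testbench output in which the second number is the optimized cycle count and the third is the C reference cycle count. That reading is an assumption, but it is consistent with the first column being their ratio (63435.43 / 6768.87 ≈ 9.37). A minimal sketch of that calculation:

```cpp
#include <cassert>

// Speedup of an optimized primitive over a baseline, from average cycle counts.
double speedup(double optCycles, double baselineCycles)
{
    assert(optCycles > 0.0);
    return baselineCycles / optCycles;
}
```

Because the C reference timing differs between the two runs (63435.43 vs 68871.50 cycles), comparing the optimized cycle counts directly is a steadier way to compare the two versions: speedup(4187.86, 6768.87) is about 1.62, i.e. the AVX2 code takes roughly 38% fewer cycles than the SSE4 code.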
Subject: [x265] improve fillReferenceSamples by reducing conditional operators in the loop

details:   http://hg.videolan.org/x265/rev/404788909650
branches:  
changeset: 10666:404788909650
user:      Min Chen <chenm003 at 163.com>
date:      Wed Jun 17 15:00:19 2015 -0700
description:
improve fillReferenceSamples by reducing conditional operators in the loop
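The predict.cpp hunk for this change drops the per-unit neighbor-flag test and copies every left and above sample, apparently relying on the padding pass that follows to overwrite samples from unavailable units. The sketch uses hypothetical function names, stores the left column in forward order for simplicity, and uses a simplified padding rule (each unavailable unit repeats the sample just before it). It checks that the per-unit copy and the over-copy produce the same reference line once padding has run. It also assumes, as the NOTE in the diff implies, that the extra reads stay inside allocated picture memory.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t pixel; // assumption: high-bit-depth sample type

// Per-unit copy: read a unit's samples only if its availability flag is set.
void copyColumnPerUnit(const pixel* col, std::intptr_t stride, const bool* avail,
                       int units, int unitHeight, pixel* line)
{
    for (int u = 0; u < units; u++)
        if (avail[u])
            for (int i = 0; i < unitHeight; i++)
                line[u * unitHeight + i] = col[(u * unitHeight + i) * stride];
}

// Over-copy: one flat loop with no availability test in its body.
void copyColumnOverCopy(const pixel* col, std::intptr_t stride,
                        int units, int unitHeight, pixel* line)
{
    for (int j = 0; j < units * unitHeight; j++)
        line[j] = col[j * stride];
}

// Simplified padding: every sample of an unavailable unit is replaced by the
// sample just before that unit, so whatever was copied there is discarded.
void padUnavailable(const bool* avail, int units, int unitHeight, pixel* line)
{
    assert(avail[0]); // simplification: the first unit is available
    for (int u = 1; u < units; u++)
        if (!avail[u])
            for (int i = 0; i < unitHeight; i++)
                line[u * unitHeight + i] = line[u * unitHeight - 1];
}
```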
Subject: [x265] asm: dequant_scaling asm code, improved 12668c->11097c, 12% over intrinsic

details:   http://hg.videolan.org/x265/rev/65cf14a3eeb1
branches:  
changeset: 10667:65cf14a3eeb1
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Wed Jun 17 17:45:35 2015 +0530
description:
asm: dequant_scaling asm code, improved 12668c->11097c, 12% over intrinsic
Subject: [x265] asm: avx2 code for dequant_scaling, improved 11097c->6860c, 38% over SSE4

details:   http://hg.videolan.org/x265/rev/6a223bb5b783
branches:  
changeset: 10668:6a223bb5b783
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 18 10:11:57 2015 +0530
description:
asm: avx2 code for dequant_scaling, improved 11097c->6860c, 38% over SSE4
Subject: [x265] doc: update strong intra smoothing explanation

details:   http://hg.videolan.org/x265/rev/cdbfc7d0b067
branches:  stable
changeset: 10669:cdbfc7d0b067
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Jun 18 15:49:48 2015 +0530
description:
doc: update strong intra smoothing explanation
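The updated --strong-intra-smoothing text in cli.rst describes bilinear interpolation of the corner reference samples of 32x32 blocks. A minimal sketch modeled on the strong-smoothing rule of the HEVC specification: both reference edges must be nearly linear (tested against a threshold of 1 << (bitDepth - 5)), and if so each 64-sample edge is replaced by a ramp from the shared top-left corner to its far end. The array layout and function names are assumptions, not x265's code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Edge layout (assumption): edge[0] is the shared top-left corner sample and
// edge[1..64] are the 64 neighbouring samples along the top (or left) side of
// a 32x32 block, edge[64] being the far end (top-right or bottom-left).

// An edge counts as flat when its midpoint lies close to the corner-to-end line.
static bool isNearlyLinear(const std::uint16_t* edge, int threshold)
{
    return std::abs(edge[0] + edge[64] - 2 * edge[32]) < threshold;
}

// Replace the interior samples with a bilinear ramp from corner to far end.
static void bilinearEdge(std::uint16_t* edge)
{
    const int corner = edge[0], end = edge[64];
    for (int i = 1; i < 64; i++)
        edge[i] = (std::uint16_t)(((64 - i) * corner + i * end + 32) >> 6);
}

// Returns true if strong smoothing was applied; otherwise the caller would
// typically fall back to the regular [1 2 1] reference filter.
bool strongIntraSmooth32(std::uint16_t top[65], std::uint16_t left[65], int bitDepth)
{
    assert(top[0] == left[0]); // both edges share the corner sample
    assert(bitDepth >= 8);
    const int threshold = 1 << (bitDepth - 5);
    if (!isNearlyLinear(top, threshold) || !isNearlyLinear(left, threshold))
        return false;
    bilinearEdge(top);
    bilinearEdge(left);
    return true;
}
```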
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/3ccd7658b374
branches:  
changeset: 10670:3ccd7658b374
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Jun 18 15:54:42 2015 +0530
description:
Merge with stable
Subject: [x265] build: fix exec bit of shell scripts

details:   http://hg.videolan.org/x265/rev/29ce75aa8879
branches:  
changeset: 10671:29ce75aa8879
user:      Steve Borho <steve at borho.org>
date:      Thu Jun 18 15:28:36 2015 -0500
description:
build: fix exec bit of shell scripts
Subject: [x265] asm: fix multilib link

details:   http://hg.videolan.org/x265/rev/c63f0a1cdac2
branches:  
changeset: 10672:c63f0a1cdac2
user:      Steve Borho <steve at borho.org>
date:      Thu Jun 18 15:36:15 2015 -0500
description:
asm: fix multilib link
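The asm-primitives.cpp hunks replace hard-coded `x265_` symbol names with `PFX(...)`. The macro's definition is not part of this digest, so the following is only a sketch of how such a prefix macro can work, assuming a hypothetical per-build namespace tag `X265_NS`. Giving each bit-depth build its own symbol prefix is presumably what allows 8-bit and 10-bit builds to be linked into one multilib binary without clashing symbols; a hard-coded `x265_` name would bypass that.

```cpp
#include <cassert>

// Hypothetical namespace tag; a multilib build would use a different tag per bit depth.
#define X265_NS x265_10bit

// Two expansion levels so X265_NS is replaced by its value before ## pastes tokens.
#define PFX3(prefix, name) prefix ## _ ## name
#define PFX2(prefix, name) PFX3(prefix, name)
#define PFX(name)          PFX2(X265_NS, name)

// Stand-in for an assembly primitive; this defines x265_10bit_intra_pred_ang32_13_avx2.
extern "C" int PFX(intra_pred_ang32_13_avx2)() { return 13; }

// Call sites spell only the short name and pick up the build's prefix.
int callPrimitive()
{
    return PFX(intra_pred_ang32_13_avx2)();
}
```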
Subject: [x265] cli: fix multilib link

details:   http://hg.videolan.org/x265/rev/f17684cb7010
branches:  
changeset: 10673:f17684cb7010
user:      Steve Borho <steve at borho.org>
date:      Thu Jun 18 15:42:03 2015 -0500
description:
cli: fix multilib link
Subject: [x265] asm: remove useless comments

details:   http://hg.videolan.org/x265/rev/1c6de5ac3883
branches:  
changeset: 10674:1c6de5ac3883
user:      Steve Borho <steve at borho.org>
date:      Thu Jun 18 15:29:11 2015 -0500
description:
asm: remove useless comments

diffstat:

 doc/reST/cli.rst                     |     6 +-
 source/common/predict.cpp            |    23 +-
 source/common/quant.h                |    12 +-
 source/common/x86/asm-primitives.cpp |   636 ++++----
 source/common/x86/intrapred16.asm    |  2321 ++++++++++++++++++++++++++++++++++
 source/common/x86/pixel-util.h       |     1 +
 source/common/x86/pixel-util8.asm    |   184 ++-
 source/encoder/analysis.cpp          |   139 +-
 source/encoder/analysis.h            |     2 +-
 source/encoder/entropy.cpp           |    21 +-
 source/x265.cpp                      |     2 +-
 11 files changed, 2919 insertions(+), 428 deletions(-)

diffs (truncated from 3618 to 300 lines):

diff -r be0ed447922c -r 1c6de5ac3883 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Tue Jun 16 11:15:03 2015 +0530
+++ b/doc/reST/cli.rst	Thu Jun 18 15:29:11 2015 -0500
@@ -920,7 +920,11 @@ Spatial/intra options
 
 .. option:: --strong-intra-smoothing, --no-strong-intra-smoothing
 
-	Enable strong intra smoothing for 32x32 intra blocks. Default enabled
+	Enable strong intra smoothing for 32x32 intra blocks. This flag 
+	performs bi-linear interpolation of the corner reference samples 
+	for a strong smoothing effect. The purpose is to prevent blocking 
+	or banding artifacts in regions with few/zero AC coefficients. 
+	Default enabled
 
 .. option:: --constrained-intra, --no-constrained-intra
 
diff -r be0ed447922c -r 1c6de5ac3883 source/common/predict.cpp
--- a/source/common/predict.cpp	Tue Jun 16 11:15:03 2015 +0530
+++ b/source/common/predict.cpp	Thu Jun 18 15:29:11 2015 -0500
@@ -776,30 +776,17 @@ void Predict::fillReferenceSamples(const
         // Fill left & below-left samples
         adiTemp += picStride;
         adi--;
-        pNeighborFlags--;
-        for (int j = 0; j < leftUnits; j++)
+        // NOTE: over copy here, but reduce condition operators
+        for (int j = 0; j < leftUnits * unitHeight; j++)
         {
-            if (*pNeighborFlags)
-                for (int i = 0; i < unitHeight; i++)
-                    adi[-i] = adiTemp[i * picStride];
-
-            adiTemp += unitHeight * picStride;
-            adi -= unitHeight;
-            pNeighborFlags--;
+            adi[-j] = adiTemp[j * picStride];
         }
 
         // Fill above & above-right samples
         adiTemp = adiOrigin - picStride;
         adi = adiLineBuffer + (leftUnits * unitHeight) + unitWidth;
-        pNeighborFlags = bNeighborFlags + leftUnits + 1;
-        for (int j = 0; j < aboveUnits; j++)
-        {
-            if (*pNeighborFlags)
-                memcpy(adi, adiTemp, unitWidth * sizeof(*adiTemp));
-            adiTemp += unitWidth;
-            adi += unitWidth;
-            pNeighborFlags++;
-        }
+        // NOTE: over copy here, but reduce condition operators
+        memcpy(adi, adiTemp, aboveUnits * unitWidth * sizeof(*adiTemp));
 
         // Pad reference samples when necessary
         int curr = 0;
diff -r be0ed447922c -r 1c6de5ac3883 source/common/quant.h
--- a/source/common/quant.h	Tue Jun 16 11:15:03 2015 +0530
+++ b/source/common/quant.h	Thu Jun 18 15:29:11 2015 -0500
@@ -126,9 +126,9 @@ public:
         const uint32_t sigPos = (uint32_t)(sigCoeffGroupFlag64 >> (cgBlkPos + 1)); // just need lowest 7-bits valid
 
         // TODO: instruction BT is faster, but _bittest64 still generate instruction 'BT m, r' in VS2012
-        const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
-        const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
-        return sigRight + sigLower;
+        const uint32_t sigRight = ((uint32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
+        const uint32_t sigLower = ((uint32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
+        return sigRight + sigLower * 2;
     }
 
     /* Context derivation process of coeff_abs_significant_flag */
@@ -137,10 +137,10 @@ public:
         X265_CHECK(cgBlkPos < 64, "cgBlkPos is too large\n");
         // NOTE: unsafe shift operator, see NOTE in calcPatternSigCtx
         const uint32_t sigPos = (uint32_t)(cgGroupMask >> (cgBlkPos + 1)); // just need lowest 8-bits valid
-        const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
-        const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
+        const uint32_t sigRight = ((uint32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
+        const uint32_t sigLower = ((uint32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
 
-        return (sigRight | sigLower) & 1;
+        return (sigRight | sigLower);
     }
 
     /* static methods shared with entropy.cpp */
diff -r be0ed447922c -r 1c6de5ac3883 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Tue Jun 16 11:15:03 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp	Thu Jun 18 15:29:11 2015 -0500
@@ -1041,8 +1041,6 @@ void setupAssemblyPrimitives(EncoderPrim
         p.dst4x4 = PFX(dst4_ssse3);
         p.cu[BLOCK_8x8].idct = PFX(idct8_ssse3);
 
-        ALL_LUMA_TU(count_nonzero, count_nonzero, avx2);
-
         p.frameInitLowres = PFX(frame_init_lowres_core_ssse3);
 
         ALL_LUMA_PU(convert_p2s, filterPixelToShort, ssse3);
@@ -1106,6 +1104,7 @@ void setupAssemblyPrimitives(EncoderPrim
         p.quant = PFX(quant_sse4);
         p.nquant = PFX(nquant_sse4);
         p.dequant_normal = PFX(dequant_normal_sse4);
+        p.dequant_scaling = PFX(dequant_scaling_sse4);
 
         // p.pu[LUMA_4x4].satd = p.cu[BLOCK_4x4].sa8d = PFX(pixel_satd_4x4_sse4); fails tests
         ALL_LUMA_PU(satd, pixel_satd, sse4);
@@ -1310,28 +1309,39 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_16x16].intra_pred[33]    = PFX(intra_pred_ang16_33_avx2);
         p.cu[BLOCK_16x16].intra_pred[34]    = PFX(intra_pred_ang16_2_avx2);
 
-        p.cu[BLOCK_32x32].intra_pred[2]     = x265_intra_pred_ang32_2_avx2;
-        p.cu[BLOCK_32x32].intra_pred[3]     = x265_intra_pred_ang32_3_avx2;
-        p.cu[BLOCK_32x32].intra_pred[4]     = x265_intra_pred_ang32_4_avx2;
-        p.cu[BLOCK_32x32].intra_pred[5]     = x265_intra_pred_ang32_5_avx2;
-        p.cu[BLOCK_32x32].intra_pred[6]     = x265_intra_pred_ang32_6_avx2;
-        p.cu[BLOCK_32x32].intra_pred[7]     = x265_intra_pred_ang32_7_avx2;
-        p.cu[BLOCK_32x32].intra_pred[8]     = x265_intra_pred_ang32_8_avx2;
-        p.cu[BLOCK_32x32].intra_pred[9]     = x265_intra_pred_ang32_9_avx2;
-        p.cu[BLOCK_32x32].intra_pred[10]    = x265_intra_pred_ang32_10_avx2;
-        p.cu[BLOCK_32x32].intra_pred[11]    = x265_intra_pred_ang32_11_avx2;
-        p.cu[BLOCK_32x32].intra_pred[12]    = x265_intra_pred_ang32_12_avx2;
-        p.cu[BLOCK_32x32].intra_pred[24]    = x265_intra_pred_ang32_24_avx2;
-        p.cu[BLOCK_32x32].intra_pred[25]    = x265_intra_pred_ang32_25_avx2;
-        p.cu[BLOCK_32x32].intra_pred[26]    = x265_intra_pred_ang32_26_avx2;
-        p.cu[BLOCK_32x32].intra_pred[27]    = x265_intra_pred_ang32_27_avx2;
-        p.cu[BLOCK_32x32].intra_pred[28]    = x265_intra_pred_ang32_28_avx2;
-        p.cu[BLOCK_32x32].intra_pred[29]    = x265_intra_pred_ang32_29_avx2;
-        p.cu[BLOCK_32x32].intra_pred[30]    = x265_intra_pred_ang32_30_avx2;
-        p.cu[BLOCK_32x32].intra_pred[31]    = x265_intra_pred_ang32_31_avx2;
-        p.cu[BLOCK_32x32].intra_pred[32]    = x265_intra_pred_ang32_32_avx2;
-        p.cu[BLOCK_32x32].intra_pred[33]    = x265_intra_pred_ang32_33_avx2;
-        p.cu[BLOCK_32x32].intra_pred[34]    = x265_intra_pred_ang32_2_avx2;
+        p.cu[BLOCK_32x32].intra_pred[2]     = PFX(intra_pred_ang32_2_avx2);
+        p.cu[BLOCK_32x32].intra_pred[3]     = PFX(intra_pred_ang32_3_avx2);
+        p.cu[BLOCK_32x32].intra_pred[4]     = PFX(intra_pred_ang32_4_avx2);
+        p.cu[BLOCK_32x32].intra_pred[5]     = PFX(intra_pred_ang32_5_avx2);
+        p.cu[BLOCK_32x32].intra_pred[6]     = PFX(intra_pred_ang32_6_avx2);
+        p.cu[BLOCK_32x32].intra_pred[7]     = PFX(intra_pred_ang32_7_avx2);
+        p.cu[BLOCK_32x32].intra_pred[8]     = PFX(intra_pred_ang32_8_avx2);
+        p.cu[BLOCK_32x32].intra_pred[9]     = PFX(intra_pred_ang32_9_avx2);
+        p.cu[BLOCK_32x32].intra_pred[10]    = PFX(intra_pred_ang32_10_avx2);
+        p.cu[BLOCK_32x32].intra_pred[11]    = PFX(intra_pred_ang32_11_avx2);
+        p.cu[BLOCK_32x32].intra_pred[12]    = PFX(intra_pred_ang32_12_avx2);
+        p.cu[BLOCK_32x32].intra_pred[13]    = PFX(intra_pred_ang32_13_avx2);
+        p.cu[BLOCK_32x32].intra_pred[14]    = PFX(intra_pred_ang32_14_avx2);
+        p.cu[BLOCK_32x32].intra_pred[15]    = PFX(intra_pred_ang32_15_avx2);
+        p.cu[BLOCK_32x32].intra_pred[16]    = PFX(intra_pred_ang32_16_avx2);
+        p.cu[BLOCK_32x32].intra_pred[17]    = PFX(intra_pred_ang32_17_avx2);
+        p.cu[BLOCK_32x32].intra_pred[18]    = PFX(intra_pred_ang32_18_avx2);
+        p.cu[BLOCK_32x32].intra_pred[19]    = PFX(intra_pred_ang32_19_avx2);
+        p.cu[BLOCK_32x32].intra_pred[20]    = PFX(intra_pred_ang32_20_avx2);
+        p.cu[BLOCK_32x32].intra_pred[21]    = PFX(intra_pred_ang32_21_avx2);
+        p.cu[BLOCK_32x32].intra_pred[22]    = PFX(intra_pred_ang32_22_avx2);
+        p.cu[BLOCK_32x32].intra_pred[23]    = PFX(intra_pred_ang32_23_avx2);
+        p.cu[BLOCK_32x32].intra_pred[24]    = PFX(intra_pred_ang32_24_avx2);
+        p.cu[BLOCK_32x32].intra_pred[25]    = PFX(intra_pred_ang32_25_avx2);
+        p.cu[BLOCK_32x32].intra_pred[26]    = PFX(intra_pred_ang32_26_avx2);
+        p.cu[BLOCK_32x32].intra_pred[27]    = PFX(intra_pred_ang32_27_avx2);
+        p.cu[BLOCK_32x32].intra_pred[28]    = PFX(intra_pred_ang32_28_avx2);
+        p.cu[BLOCK_32x32].intra_pred[29]    = PFX(intra_pred_ang32_29_avx2);
+        p.cu[BLOCK_32x32].intra_pred[30]    = PFX(intra_pred_ang32_30_avx2);
+        p.cu[BLOCK_32x32].intra_pred[31]    = PFX(intra_pred_ang32_31_avx2);
+        p.cu[BLOCK_32x32].intra_pred[32]    = PFX(intra_pred_ang32_32_avx2);
+        p.cu[BLOCK_32x32].intra_pred[33]    = PFX(intra_pred_ang32_33_avx2);
+        p.cu[BLOCK_32x32].intra_pred[34]    = PFX(intra_pred_ang32_2_avx2);
 
         p.pu[LUMA_8x4].addAvg   = PFX(addAvg_8x4_avx2);
         p.pu[LUMA_8x8].addAvg   = PFX(addAvg_8x8_avx2);
@@ -1461,13 +1471,14 @@ void setupAssemblyPrimitives(EncoderPrim
         p.quant = PFX(quant_avx2);
         p.nquant = PFX(nquant_avx2);
         p.dequant_normal  = PFX(dequant_normal_avx2);
+        p.dequant_scaling = PFX(dequant_scaling_avx2);
         p.dst4x4 = PFX(dst4_avx2);
         p.idst4x4 = PFX(idst4_avx2);
         p.denoiseDct = PFX(denoise_dct_avx2);
 
         p.scale1D_128to64 = PFX(scale1D_128to64_avx2);
         p.scale2D_64to32 = PFX(scale2D_64to32_avx2);
-        // p.weight_pp = PFX(weight_pp_avx2); fails tests
+        p.weight_pp = PFX(weight_pp_avx2);
 
         p.cu[BLOCK_16x16].calcresidual = PFX(getResidual16_avx2);
         p.cu[BLOCK_32x32].calcresidual = PFX(getResidual32_avx2);
@@ -1777,285 +1788,285 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_hpp = PFX(interp_4tap_horiz_pp_64x64_avx2);
         p.chroma[X265_CSP_I444].pu[LUMA_48x64].filter_hpp = PFX(interp_4tap_horiz_pp_48x64_avx2);
 
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vpp = x265_interp_4tap_vert_pp_4x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vps = x265_interp_4tap_vert_ps_4x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vsp = x265_interp_4tap_vert_sp_4x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vss = x265_interp_4tap_vert_ss_4x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vpp = x265_interp_4tap_vert_pp_8x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vps = x265_interp_4tap_vert_ps_8x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vsp = x265_interp_4tap_vert_sp_8x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vss = x265_interp_4tap_vert_ss_8x2_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vsp = x265_interp_4tap_vert_sp_8x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vss = x265_interp_4tap_vert_ss_8x4_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vpp = x265_interp_4tap_vert_pp_8x6_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vps = x265_interp_4tap_vert_ps_8x6_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vsp = x265_interp_4tap_vert_sp_8x6_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vss = x265_interp_4tap_vert_ss_8x6_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vpp = x265_interp_4tap_vert_pp_8x12_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vps = x265_interp_4tap_vert_ps_8x12_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vsp = x265_interp_4tap_vert_sp_8x12_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vss = x265_interp_4tap_vert_ss_8x12_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vsp = x265_interp_4tap_vert_sp_8x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vss = x265_interp_4tap_vert_ss_8x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vsp = x265_interp_4tap_vert_sp_8x32_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vss = x265_interp_4tap_vert_ss_8x32_avx2;
-
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vpp = x265_interp_4tap_vert_pp_4x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vps = x265_interp_4tap_vert_ps_4x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vsp = x265_interp_4tap_vert_sp_4x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vss = x265_interp_4tap_vert_ss_4x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vsp = x265_interp_4tap_vert_sp_8x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_vss = x265_interp_4tap_vert_ss_8x4_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vsp = x265_interp_4tap_vert_sp_8x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_vss = x265_interp_4tap_vert_ss_8x16_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vsp = x265_interp_4tap_vert_sp_8x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_vss = x265_interp_4tap_vert_ss_8x32_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vpp = x265_interp_4tap_vert_pp_8x64_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vps = x265_interp_4tap_vert_ps_8x64_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vsp = x265_interp_4tap_vert_sp_8x64_avx2;
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vss = x265_interp_4tap_vert_ss_8x64_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vsp = x265_interp_4tap_vert_sp_8x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_vss = x265_interp_4tap_vert_ss_8x4_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vsp = x265_interp_4tap_vert_sp_8x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_vss = x265_interp_4tap_vert_ss_8x16_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vsp = x265_interp_4tap_vert_sp_8x32_avx2;
-        p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_vss = x265_interp_4tap_vert_ss_8x32_avx2;
-
-
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].filter_vss = x265_interp_4tap_vert_ss_6x8_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].filter_vsp = x265_interp_4tap_vert_sp_6x8_avx2;

