[x265-commits] [x265] asm: add macro to sub_ps module to reduce code size

Sumalatha at videolan.org
Fri Apr 17 21:09:34 CEST 2015


details:   http://hg.videolan.org/x265/rev/e38791928d0d
branches:  
changeset: 10191:e38791928d0d
user:      Sumalatha Polureddy
date:      Thu Apr 16 10:36:39 2015 +0530
description:
asm: add macro to sub_ps module to reduce code size
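
For context, sub_ps subtracts one pixel block from another and stores the difference as 16-bit residuals (pixel in, short out). The following is a minimal scalar sketch of that operation, assuming 8-bit pixels; the name `subPS` and its signature are illustrative and are not taken from this changeset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (not the x265 routine): writes the signed
// 16-bit residual src0 - src1 for a width x height block of 8-bit pixels.
void subPS(int16_t* dst, ptrdiff_t dstStride,
           const uint8_t* src0, ptrdiff_t stride0,
           const uint8_t* src1, ptrdiff_t stride1,
           int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = static_cast<int16_t>(src0[x] - src1[x]);

        dst += dstStride;
        src0 += stride0;
        src1 += stride1;
    }
}
```

Because every block size runs the same loop body, an assembly version can plausibly share one macro across sizes, which is the kind of code-size saving the commit title describes.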
Subject: [x265] asm: avx2 code for chroma sub_ps module, reused luma code

details:   http://hg.videolan.org/x265/rev/abd09d4a8a8c
branches:  
changeset: 10192:abd09d4a8a8c
user:      Sumalatha Polureddy
date:      Thu Apr 16 10:46:18 2015 +0530
description:
asm: avx2 code for chroma sub_ps module, reused luma code

sse4
[i422]  sub_ps[16x32]  5.50x    1386.46         7627.27
[i422]  sub_ps[32x64]  5.28x    5137.07         27110.01

avx2
[i422]  sub_ps[16x32]  9.22x    831.52          7665.70
[i422]  sub_ps[32x64]  10.59x   2581.10         27343.41
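
The three numeric columns above appear to be speedup, cycles for the optimized routine, and cycles for the C reference. This reading is inferred from the figures themselves (the first column matches the third divided by the second) and is not stated in the message. A small C++ sketch with a hypothetical helper, `speedup`, reproduces the first column:

```cpp
#include <cassert>

// Hypothetical helper (not part of x265): ratio of C-reference cycles to
// optimized cycles, which matches the first numeric column above.
double speedup(double refCycles, double optCycles)
{
    assert(optCycles > 0.0);
    return refCycles / optCycles;
}
```

For the sse4 16x32 row, `speedup(7627.27, 1386.46)` is about 5.50; for the avx2 32x64 row, `speedup(27343.41, 2581.10)` is about 10.59.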
Subject: [x265] asm: ssse3 10bit code for convert_p2s[12xN],[48x64]

details:   http://hg.videolan.org/x265/rev/0ba40f2c58e6
branches:  
changeset: 10193:0ba40f2c58e6
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Wed Apr 15 19:04:04 2015 +0530
description:
asm: ssse3 10bit code for convert_p2s[12xN],[48x64]

     convert_p2s[12x16](11.37x), convert_p2s[48x64](10.50x)
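
convert_p2s (pixel to short) widens pixels into the 16-bit intermediate format used by the interpolation filters. The sketch below is an assumption about the scalar behaviour, modelled on the HEVC convention of 14-bit intermediate precision with an offset of 8192. The name `pixelToShort`, its signature and both constants are illustrative and are not copied from the x265 source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed constants (HEVC-style intermediate precision), not taken from this changeset.
constexpr int kInternalPrec = 14;
constexpr int kInternalOffs = 1 << (kInternalPrec - 1); // 8192

// Hypothetical scalar reference: scale each pixel up to 14-bit precision
// and subtract the offset so the result is centred in the int16_t range.
void pixelToShort(const uint16_t* src, ptrdiff_t srcStride,
                  int16_t* dst, ptrdiff_t dstStride,
                  int width, int height, int bitDepth)
{
    assert(bitDepth > 0 && bitDepth <= kInternalPrec);
    const int shift = kInternalPrec - bitDepth;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = static_cast<int16_t>((src[x] << shift) - kInternalOffs);

        src += srcStride;
        dst += dstStride;
    }
}
```

Under these assumptions a 10-bit pixel of 512 maps to 0, and the extremes 0 and 1023 map to -8192 and 8176.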
Subject: [x265] asm: ssse3 10bit code for chroma_p2s[4x2],[8x2],[8x6]

details:   http://hg.videolan.org/x265/rev/6c9a7e820080
branches:  
changeset: 10194:6c9a7e820080
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Wed Apr 15 19:16:10 2015 +0530
description:
asm: ssse3 10bit code for chroma_p2s[4x2],[8x2],[8x6]

     chroma_p2s[i420][4x2](2.52x), chroma_p2s[i420][8x2](3.59x),
     chroma_p2s[i420][8x6](5.09x)
Subject: [x265] asm: sse4 10bit code for chroma_p2s[6xN] for i420, i422

details:   http://hg.videolan.org/x265/rev/0096e8730ebd
branches:  
changeset: 10195:0096e8730ebd
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Apr 16 10:53:33 2015 +0530
description:
asm: sse4 10bit code for chroma_p2s[6xN] for i420, i422

     chroma_p2s[i420][6x8](2.89x), chroma_p2s[i422][6x16](3.37x)
Subject: [x265] asm: sse4 10bit code for chroma_p2s[2xN] for i420, i422

details:   http://hg.videolan.org/x265/rev/9736b429d394
branches:  
changeset: 10196:9736b429d394
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Apr 16 11:24:34 2015 +0530
description:
asm: sse4 10bit code for chroma_p2s[2xN] for i420, i422

     chroma_p2s[i420][2x4](1.71x), chroma_p2s[i420][2x8](1.99x),
     chroma_p2s[i422][2x16](2.14x)
Subject: [x265] asm: sse version 10bit code for chroma_p2s, reuse luma code

details:   http://hg.videolan.org/x265/rev/c18e52fa210c
branches:  
changeset: 10197:c18e52fa210c
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Apr 16 11:37:03 2015 +0530
description:
asm: sse version 10bit code for chroma_p2s, reuse luma code
Subject: [x265] asm: new optimized algorithm for satd, improved ~30% over previous algorithm

details:   http://hg.videolan.org/x265/rev/7be1172ec816
branches:  
changeset: 10198:7be1172ec816
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Apr 16 11:38:32 2015 +0530
description:
asm: new optimized algorithm for satd, improved ~30% over previous algorithm
Subject: [x265] Backed out changeset: 7be1172ec816

details:   http://hg.videolan.org/x265/rev/3d5a3e331652
branches:  
changeset: 10199:3d5a3e331652
user:      Steve Borho <steve at borho.org>
date:      Fri Apr 17 13:44:26 2015 -0500
description:
Backed out changeset: 7be1172ec816
Subject: [x265] asm: chroma_hps[64x64, 64x48, 64x32, 64x16] for i444 - improved 21540c->14767c, 18551c->14129c, 17096c->12742c, 6216c->3923c

details:   http://hg.videolan.org/x265/rev/b30a7d159a65
branches:  
changeset: 10200:b30a7d159a65
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Thu Apr 16 14:28:41 2015 +0530
description:
asm: chroma_hps[64x64, 64x48, 64x32, 64x16] for i444 - improved 21540c->14767c, 18551c->14129c, 17096c->12742c, 6216c->3923c
Subject: [x265] asm: chroma_hpp i422[4xN, 8xN, 16xN, 32xN]

details:   http://hg.videolan.org/x265/rev/338ed295d81e
branches:  
changeset: 10201:338ed295d81e
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Thu Apr 16 15:28:19 2015 +0530
description:
asm: chroma_hpp i422[4xN, 8xN, 16xN, 32xN]
Subject: [x265] asm: avx2 10bit code for convert_p2s[16xN]

details:   http://hg.videolan.org/x265/rev/7b7437578a4d
branches:  
changeset: 10202:7b7437578a4d
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Apr 16 18:17:36 2015 +0530
description:
asm: avx2 10bit code for convert_p2s[16xN]

     convert_p2s[16x4](10.44x), convert_p2s[16x8](15.30x),
     convert_p2s[16x12](16.55x), convert_p2s[16x16](17.48x),
     convert_p2s[16x32](17.57x), convert_p2s[16x64](20.21x)
Subject: [x265] asm: avx2 10bit code for convert_p2s[32xN],[64xN]

details:   http://hg.videolan.org/x265/rev/d70d71419691
branches:  
changeset: 10203:d70d71419691
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Apr 16 14:18:03 2015 +0530
description:
asm: avx2 10bit code for convert_p2s[32xN],[64xN]

     convert_p2s[32x8](15.77x), convert_p2s[32x16](19.15x),
     convert_p2s[32x24](16.84x), convert_p2s[32x32](20.49x),
     convert_p2s[32x64](21.19x), convert_p2s[64x16](18.99x),
     convert_p2s[64x32](17.16x), convert_p2s[64x48](19.27x),
     convert_p2s[64x64](16.86x)
Subject: [x265] regression: typo in rc tests

details:   http://hg.videolan.org/x265/rev/4280cd25da6e
branches:  
changeset: 10204:4280cd25da6e
user:      Mahesh Pittala <mahesh at multicorewareinc.com>
date:      Thu Apr 16 15:53:53 2015 +0530
description:
regression: typo in rc tests
Subject: [x265] rc: unix eoln for rate-control-tests.txt

details:   http://hg.videolan.org/x265/rev/0d9f2a56bccd
branches:  
changeset: 10205:0d9f2a56bccd
user:      Steve Borho <steve at borho.org>
date:      Thu Apr 16 17:12:00 2015 -0700
description:
rc: unix eoln for rate-control-tests.txt
Subject: [x265] asm: chroma_hpp[64x64, 64x48, 64x32, 64x16] for i444 - improved 22990c->14176c, 17897c->10791c, 12050c->7186c, 5655c->3266c

details:   http://hg.videolan.org/x265/rev/af03705428c3
branches:  
changeset: 10206:af03705428c3
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Fri Apr 17 10:28:18 2015 +0530
description:
asm: chroma_hpp[64x64, 64x48, 64x32, 64x16] for i444 - improved 22990c->14176c, 17897c->10791c, 12050c->7186c, 5655c->3266c
Subject: [x265] asm: avx2 10bit code for convert_p2s[24xN],[48x64]

details:   http://hg.videolan.org/x265/rev/f8376ecbfb09
branches:  
changeset: 10207:f8376ecbfb09
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Fri Apr 17 11:19:20 2015 +0530
description:
asm: avx2 10bit code for convert_p2s[24xN],[48x64]

     convert_p2s[24x32](20.90x), convert_p2s[48x64](18.89x)
Subject: [x265] asm: avx2 code for chroma addAvg for all partitions

details:   http://hg.videolan.org/x265/rev/b91d7ed6fd1e
branches:  
changeset: 10208:b91d7ed6fd1e
user:      Sumalatha Polureddy
date:      Fri Apr 17 14:34:27 2015 +0530
description:
asm: avx2 code for chroma addAvg for all partitions
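
For context, addAvg combines two 16-bit intermediate predictions into one pixel block, as in bi-prediction. The sketch below assumes 8-bit output, 14-bit intermediate precision and an 8192 offset on each input. The name `addAvg`, the signature and the constants are illustrative assumptions, not the code added by this changeset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed constants (HEVC-style intermediate precision), not taken from this changeset.
constexpr int kInternalPrec = 14;
constexpr int kInternalOffs = 1 << (kInternalPrec - 1); // 8192

// Hypothetical scalar reference: average two biased 16-bit predictions,
// round, remove both biases, and clip to the 8-bit pixel range.
void addAvg(const int16_t* src0, ptrdiff_t stride0,
            const int16_t* src1, ptrdiff_t stride1,
            uint8_t* dst, ptrdiff_t dstStride,
            int width, int height)
{
    assert(width > 0 && height > 0);
    const int bitDepth = 8;
    const int shift = kInternalPrec + 1 - bitDepth;            // 7
    const int offset = (1 << (shift - 1)) + 2 * kInternalOffs; // rounding + both biases
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int v = (src0[x] + src1[x] + offset) >> shift;
            dst[x] = static_cast<uint8_t>(std::min(std::max(v, 0), 255));
        }
        src0 += stride0;
        src1 += stride1;
        dst += dstStride;
    }
}
```

With these assumptions, intermediate values corresponding to pixels 100 and 200 average to 150, and 100 with 101 rounds up to 101.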
Subject: [x265] asm: avx2 10bit code for chroma_p2s[16xN],[24xN],[32xN], reuse luma code

details:   http://hg.videolan.org/x265/rev/ba491b4e3b67
branches:  
changeset: 10209:ba491b4e3b67
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Fri Apr 17 15:51:55 2015 +0530
description:
asm: avx2 10bit code for chroma_p2s[16xN],[24xN],[32xN], reuse luma code
Subject: [x265] asm: chroma_hpp[4xN, 8xN, 16xN, 32xN, 12x16, 24x32] for i444

details:   http://hg.videolan.org/x265/rev/cfb33d361b5f
branches:  
changeset: 10210:cfb33d361b5f
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Fri Apr 17 16:10:03 2015 +0530
description:
asm: chroma_hpp[4xN, 8xN, 16xN, 32xN, 12x16, 24x32] for i444
Subject: [x265] asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 2x8]

details:   http://hg.videolan.org/x265/rev/77d0418ab73f
branches:  
changeset: 10211:77d0418ab73f
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Fri Apr 17 16:43:39 2015 +0530
description:
asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 2x8]
Subject: [x265] asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 24x32] for i444

details:   http://hg.videolan.org/x265/rev/c1fd719930d1
branches:  
changeset: 10212:c1fd719930d1
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Fri Apr 17 16:56:59 2015 +0530
description:
asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 24x32] for i444
Subject: [x265] asm: intra_allangs4x4 improved by ~61% over SSE4

details:   http://hg.videolan.org/x265/rev/405ce9f2a527
branches:  
changeset: 10213:405ce9f2a527
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Fri Apr 17 16:23:28 2015 +0530
description:
asm: intra_allangs4x4 improved by ~61% over SSE4

AVX2:
intra_allangs4x4        31.17x   1070.01         33353.50

SSE4:
intra_allangs4x4        12.04x   2746.58         33061.69
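
The "~61%" in the title is consistent with the reduction in cycles between the two rows above, taking the second column as the cycle count of the optimized routine (an inference from the figures). A small sketch with a hypothetical helper, `cycleReductionPercent`:

```cpp
#include <cassert>

// Hypothetical helper (not part of x265): percentage of cycles saved
// when a routine goes from `before` cycles to `after` cycles.
double cycleReductionPercent(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

`cycleReductionPercent(2746.58, 1070.01)` evaluates to roughly 61.0.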
Subject: [x265] asm: interp_4tap_horiz_pp_2x4_sse3

details:   http://hg.videolan.org/x265/rev/d59990f190a0
branches:  
changeset: 10214:d59990f190a0
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:04:59 2015 -0700
description:
asm: interp_4tap_horiz_pp_2x4_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  2x4\]"
chroma_hpp[  2x4]	1.83x 	 594.91   	 1089.98

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  2x4\]"
chroma_hpp[  2x4]	1.75x 	 739.88   	 1297.40
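
The routine being replaced is a 4-tap horizontal chroma interpolation with pixel input and pixel output. A minimal scalar sketch of that kind of filter follows, assuming 8-bit pixels, the HEVC chroma coefficient table and 6-bit filter precision. The name `interp4tapHorizPP` and its signature are illustrative; this is not the x265 C reference itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// HEVC 4-tap chroma coefficients, indexed by 1/8-pel fractional position.
static const int16_t kChromaFilter[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 },
};

// Hypothetical scalar reference. The caller must provide one valid pixel to
// the left of src and two to the right of each row, because the taps cover
// positions x-1 .. x+2.
void interp4tapHorizPP(const uint8_t* src, ptrdiff_t srcStride,
                       uint8_t* dst, ptrdiff_t dstStride,
                       int width, int height, int coeffIdx)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int16_t* c = kChromaFilter[coeffIdx];
    const int shift = 6;                 // coefficients sum to 64
    const int offset = 1 << (shift - 1); // round to nearest
    src -= 1;                            // first tap sits one pixel to the left
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int sum = src[x] * c[0] + src[x + 1] * c[1] +
                      src[x + 2] * c[2] + src[x + 3] * c[3];
            int v = (sum + offset) >> shift;
            dst[x] = static_cast<uint8_t>(std::min(std::max(v, 0), 255));
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

With `coeffIdx` 0 the filter copies its input, which makes a convenient sanity check for a vectorized version.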
Subject: [x265] asm: interp_4tap_horiz_pp_2x8_sse3

details:   http://hg.videolan.org/x265/rev/cdd3b34296d7
branches:  
changeset: 10215:cdd3b34296d7
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:10:24 2015 -0700
description:
asm: interp_4tap_horiz_pp_2x8_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  2x8\]"
chroma_hpp[  2x8]	1.83x 	 1104.89  	 2026.66
chroma_hpp[  2x8]	1.84x 	 1102.43  	 2025.46

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  2x8\]"
chroma_hpp[  2x8]	1.93x 	 1252.42  	 2419.18
chroma_hpp[  2x8]	1.93x 	 1252.46  	 2417.46
Subject: [x265] asm: interp_4tap_horiz_pp_2x16_sse3

details:   http://hg.videolan.org/x265/rev/f27f6f1f2182
branches:  
changeset: 10216:f27f6f1f2182
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:15:04 2015 -0700
description:
asm: interp_4tap_horiz_pp_2x16_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x16\]"
chroma_hpp[ 2x16]	2.35x 	 2122.49  	 4982.88

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x16\]"
chroma_hpp[ 2x16]	2.20x 	 2262.45  	 4985.37
Subject: [x265] asm: interp_4tap_horiz_pp_4x2_sse3

details:   http://hg.videolan.org/x265/rev/f79d5d32627c
branches:  
changeset: 10217:f79d5d32627c
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:21:15 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x2_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  4x2\]"
chroma_hpp[  4x2]	2.09x 	 475.01   	 992.48

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  4x2\]"
chroma_hpp[  4x2]	2.42x 	 549.94   	 1329.94
Subject: [x265] asm: interp_4tap_horiz_pp_4x4_sse3

details:   http://hg.videolan.org/x265/rev/f8fb91b3003c
branches:  
changeset: 10218:f8fb91b3003c
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:26:26 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x4_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  4x4\]"
chroma_hpp[  4x4]	2.18x 	 872.49   	 1902.48
chroma_hpp[  4x4]	2.17x 	 874.99   	 1900.67

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  4x4\]"
chroma_hpp[  4x4]	2.59x 	 952.48   	 2462.42
chroma_hpp[  4x4]	2.59x 	 950.02   	 2462.46
Subject: [x265] asm: interp_4tap_horiz_pp_4x8_sse3

details:   http://hg.videolan.org/x265/rev/ab8fc4c59cc7
branches:  
changeset: 10219:ab8fc4c59cc7
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:33:03 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x8_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  4x8\]"
chroma_hpp[  4x8]	2.83x 	 1682.50  	 4759.97
chroma_hpp[  4x8]	2.83x 	 1682.50  	 4759.99

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[  4x8\]"
chroma_hpp[  4x8]	2.77x 	 1765.03  	 4892.46
chroma_hpp[  4x8]	2.77x 	 1765.00  	 4892.48
Subject: [x265] asm: interp_4tap_horiz_pp_4x16_sse3

details:   http://hg.videolan.org/x265/rev/6b7f853e3a7f
branches:  
changeset: 10220:6b7f853e3a7f
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:36:54 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x16_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x16\]"
chroma_hpp[ 4x16]	2.84x 	 3302.49  	 9392.47
chroma_hpp[ 4x16]	2.84x 	 3302.50  	 9392.49

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x16\]"
chroma_hpp[ 4x16]	2.81x 	 3380.07  	 9499.97
chroma_hpp[ 4x16]	2.81x 	 3382.50  	 9499.97
Subject: [x265] asm: interp_4tap_horiz_pp_4x32_sse3

details:   http://hg.videolan.org/x265/rev/45610a24b399
branches:  
changeset: 10221:45610a24b399
user:      David T Yuen <dtyx265 at gmail.com>
date:      Fri Apr 17 09:40:06 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x32_sse3

This replaces the C code.

64-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x32\]"
chroma_hpp[ 4x32]	2.80x 	 6552.49  	 18351.63

32-bit

./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x32\]"
chroma_hpp[ 4x32]	2.79x 	 6625.03  	 18459.97

diffstat:

 source/common/x86/asm-primitives.cpp     |  244 +++++++++-
 source/common/x86/intrapred.h            |    1 +
 source/common/x86/intrapred8_allangs.asm |  376 +++++++++++++++
 source/common/x86/ipfilter16.asm         |  760 +++++++++++++++++++++++++++++++
 source/common/x86/ipfilter8.asm          |  481 +++++++++++++++++++
 source/common/x86/ipfilter8.h            |   87 +++-
 source/common/x86/mc-a.asm               |    6 +
 source/common/x86/pixel-util8.asm        |   18 +-
 source/common/x86/pixel.h                |    2 +
 source/test/rate-control-tests.txt       |   72 +-
 10 files changed, 2003 insertions(+), 44 deletions(-)
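
Most of the asm-primitives.cpp changes follow one pattern: a table of function pointers is filled in, and each CPU-capability block overwrites entries with a faster routine when the matching flag is set in cpuMask. A minimal sketch of that dispatch pattern follows. `Primitives`, `setupPrimitives`, the `kCpu*` flags and the stub kernels are hypothetical stand-ins, not x265 identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability flags (stand-ins for flags such as X265_CPU_SSE3).
constexpr uint32_t kCpuSSE3 = 1u << 0;
constexpr uint32_t kCpuAVX2 = 1u << 1;

// Stub kernels; each returns an id so the chosen implementation is observable.
int kernelC()    { return 0; }
int kernelSSE3() { return 1; }
int kernelAVX2() { return 2; }

typedef int (*KernelFn)();

// Hypothetical one-entry primitive table.
struct Primitives
{
    KernelFn filter_hpp;
};

// Start from the C fallback, then let each supported instruction set
// overwrite the entry. Later blocks win, so order them slowest to fastest.
void setupPrimitives(Primitives& p, uint32_t cpuMask)
{
    p.filter_hpp = kernelC;
    if (cpuMask & kCpuSSE3)
        p.filter_hpp = kernelSSE3;
    if (cpuMask & kCpuAVX2)
        p.filter_hpp = kernelAVX2;
    assert(p.filter_hpp != nullptr);
}
```

In this sketch a machine with both flags set ends up with the AVX2 stub, because the later block overwrites the SSE3 assignment.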

diffs (truncated from 2387 to 300 lines):

diff -r f9c0e1f233cc -r 45610a24b399 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Wed Apr 15 16:20:27 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp	Fri Apr 17 09:40:06 2015 -0700
@@ -972,6 +972,49 @@ void setupAssemblyPrimitives(EncoderPrim
         p.pu[LUMA_64x48].convert_p2s = x265_filterPixelToShort_64x48_ssse3;
         p.pu[LUMA_64x64].convert_p2s = x265_filterPixelToShort_64x64_ssse3;
         p.pu[LUMA_24x32].convert_p2s = x265_filterPixelToShort_24x32_ssse3;
+        p.pu[LUMA_12x16].convert_p2s = x265_filterPixelToShort_12x16_ssse3;
+        p.pu[LUMA_48x64].convert_p2s = x265_filterPixelToShort_48x64_ssse3;
+
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].p2s = x265_filterPixelToShort_4x4_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].p2s = x265_filterPixelToShort_4x8_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].p2s = x265_filterPixelToShort_4x16_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].p2s = x265_filterPixelToShort_8x4_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].p2s = x265_filterPixelToShort_8x8_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].p2s = x265_filterPixelToShort_8x16_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].p2s = x265_filterPixelToShort_8x32_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].p2s = x265_filterPixelToShort_16x4_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].p2s = x265_filterPixelToShort_16x8_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].p2s = x265_filterPixelToShort_16x12_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].p2s = x265_filterPixelToShort_16x16_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].p2s = x265_filterPixelToShort_16x32_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].p2s = x265_filterPixelToShort_32x8_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].p2s = x265_filterPixelToShort_32x16_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].p2s = x265_filterPixelToShort_32x24_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].p2s = x265_filterPixelToShort_32x32_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].p2s = x265_filterPixelToShort_4x4_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].p2s = x265_filterPixelToShort_4x8_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].p2s = x265_filterPixelToShort_4x16_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].p2s = x265_filterPixelToShort_4x32_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].p2s = x265_filterPixelToShort_8x4_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].p2s = x265_filterPixelToShort_8x8_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].p2s = x265_filterPixelToShort_8x12_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].p2s = x265_filterPixelToShort_8x16_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].p2s = x265_filterPixelToShort_8x32_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].p2s = x265_filterPixelToShort_8x64_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].p2s = x265_filterPixelToShort_12x32_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].p2s = x265_filterPixelToShort_16x8_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].p2s = x265_filterPixelToShort_16x16_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].p2s = x265_filterPixelToShort_16x24_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].p2s = x265_filterPixelToShort_16x32_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].p2s = x265_filterPixelToShort_16x64_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].p2s = x265_filterPixelToShort_24x64_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].p2s = x265_filterPixelToShort_32x16_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].p2s = x265_filterPixelToShort_32x32_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].p2s = x265_filterPixelToShort_32x48_ssse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].p2s = x265_filterPixelToShort_32x64_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].p2s = x265_filterPixelToShort_4x2_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].p2s = x265_filterPixelToShort_8x2_ssse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].p2s = x265_filterPixelToShort_8x6_ssse3;
     }
     if (cpuMask & X265_CPU_SSE4)
     {
@@ -1011,6 +1054,13 @@ void setupAssemblyPrimitives(EncoderPrim
         ALL_LUMA_TU_S(copy_cnt, copy_cnt_, sse4);
         ALL_LUMA_CU(psy_cost_pp, psyCost_pp, sse4);
         ALL_LUMA_CU(psy_cost_ss, psyCost_ss, sse4);
+
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].p2s = x265_filterPixelToShort_2x4_sse4;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].p2s = x265_filterPixelToShort_2x8_sse4;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].p2s = x265_filterPixelToShort_6x8_sse4;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].p2s = x265_filterPixelToShort_2x8_sse4;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].p2s = x265_filterPixelToShort_2x16_sse4;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].p2s = x265_filterPixelToShort_6x16_sse4;
     }
     if (cpuMask & X265_CPU_AVX)
     {
@@ -1173,6 +1223,45 @@ void setupAssemblyPrimitives(EncoderPrim
         ALL_LUMA_PU(luma_vps, interp_8tap_vert_ps, avx2);
         ALL_LUMA_PU(luma_vsp, interp_8tap_vert_sp, avx2);
         ALL_LUMA_PU(luma_vss, interp_8tap_vert_ss, avx2);
+
+        p.pu[LUMA_16x4].convert_p2s = x265_filterPixelToShort_16x4_avx2;
+        p.pu[LUMA_16x8].convert_p2s = x265_filterPixelToShort_16x8_avx2;
+        p.pu[LUMA_16x12].convert_p2s = x265_filterPixelToShort_16x12_avx2;
+        p.pu[LUMA_16x16].convert_p2s = x265_filterPixelToShort_16x16_avx2;
+        p.pu[LUMA_16x32].convert_p2s = x265_filterPixelToShort_16x32_avx2;
+        p.pu[LUMA_16x64].convert_p2s = x265_filterPixelToShort_16x64_avx2;
+        p.pu[LUMA_32x8].convert_p2s = x265_filterPixelToShort_32x8_avx2;
+        p.pu[LUMA_32x16].convert_p2s = x265_filterPixelToShort_32x16_avx2;
+        p.pu[LUMA_32x24].convert_p2s = x265_filterPixelToShort_32x24_avx2;
+        p.pu[LUMA_32x32].convert_p2s = x265_filterPixelToShort_32x32_avx2;
+        p.pu[LUMA_32x64].convert_p2s = x265_filterPixelToShort_32x64_avx2;
+        p.pu[LUMA_64x16].convert_p2s = x265_filterPixelToShort_64x16_avx2;
+        p.pu[LUMA_64x32].convert_p2s = x265_filterPixelToShort_64x32_avx2;
+        p.pu[LUMA_64x48].convert_p2s = x265_filterPixelToShort_64x48_avx2;
+        p.pu[LUMA_64x64].convert_p2s = x265_filterPixelToShort_64x64_avx2;
+        p.pu[LUMA_24x32].convert_p2s = x265_filterPixelToShort_24x32_avx2;
+        p.pu[LUMA_48x64].convert_p2s = x265_filterPixelToShort_48x64_avx2;
+
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].p2s = x265_filterPixelToShort_16x4_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].p2s = x265_filterPixelToShort_16x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].p2s = x265_filterPixelToShort_16x12_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].p2s = x265_filterPixelToShort_16x16_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].p2s = x265_filterPixelToShort_16x32_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].p2s = x265_filterPixelToShort_24x32_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].p2s = x265_filterPixelToShort_32x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].p2s = x265_filterPixelToShort_32x16_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].p2s = x265_filterPixelToShort_32x24_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].p2s = x265_filterPixelToShort_32x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].p2s = x265_filterPixelToShort_16x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].p2s = x265_filterPixelToShort_16x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].p2s = x265_filterPixelToShort_16x24_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].p2s = x265_filterPixelToShort_16x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].p2s = x265_filterPixelToShort_16x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].p2s = x265_filterPixelToShort_24x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].p2s = x265_filterPixelToShort_32x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].p2s = x265_filterPixelToShort_32x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].p2s = x265_filterPixelToShort_32x48_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].p2s = x265_filterPixelToShort_32x64_avx2;
     }
 }
 #else // if HIGH_BIT_DEPTH
@@ -1304,6 +1393,21 @@ void setupAssemblyPrimitives(EncoderPrim
 
         p.planecopy_sp = x265_downShift_16_sse2;
     }
+    if (cpuMask & X265_CPU_SSE3)
+    {
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_hpp = x265_interp_4tap_horiz_pp_2x4_sse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].filter_hpp = x265_interp_4tap_horiz_pp_2x8_sse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_hpp = x265_interp_4tap_horiz_pp_4x2_sse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_sse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_sse3;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_sse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].filter_hpp = x265_interp_4tap_horiz_pp_2x8_sse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_hpp = x265_interp_4tap_horiz_pp_2x16_sse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_sse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_sse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_sse3;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_hpp = x265_interp_4tap_horiz_pp_4x32_sse3;
+    }
     if (cpuMask & X265_CPU_SSSE3)
     {
         p.pu[LUMA_8x16].sad_x3 = x265_pixel_sad_x3_8x16_ssse3;
@@ -1646,20 +1750,35 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].addAvg = x265_addAvg_8x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].addAvg = x265_addAvg_8x16_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].addAvg = x265_addAvg_8x32_avx2;
-
         p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].addAvg = x265_addAvg_12x16_avx2;
-
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].addAvg = x265_addAvg_16x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].addAvg = x265_addAvg_16x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].addAvg = x265_addAvg_16x12_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].addAvg = x265_addAvg_16x16_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].addAvg = x265_addAvg_16x32_avx2;
-
         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].addAvg = x265_addAvg_32x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].addAvg = x265_addAvg_32x16_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].addAvg = x265_addAvg_32x24_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].addAvg = x265_addAvg_32x32_avx2;
 
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].addAvg = x265_addAvg_8x4_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].addAvg = x265_addAvg_8x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].addAvg = x265_addAvg_8x12_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].addAvg = x265_addAvg_8x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].addAvg = x265_addAvg_8x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].addAvg = x265_addAvg_8x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].addAvg = x265_addAvg_12x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].addAvg = x265_addAvg_16x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].addAvg = x265_addAvg_16x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].addAvg = x265_addAvg_16x24_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].addAvg = x265_addAvg_16x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].addAvg = x265_addAvg_16x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].addAvg = x265_addAvg_24x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].addAvg = x265_addAvg_32x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].addAvg = x265_addAvg_32x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].addAvg = x265_addAvg_32x48_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].addAvg = x265_addAvg_32x64_avx2;
+
         p.cu[BLOCK_16x16].add_ps = x265_pixel_add_ps_16x16_avx2;
         p.cu[BLOCK_32x32].add_ps = x265_pixel_add_ps_32x32_avx2;
         p.cu[BLOCK_64x64].add_ps = x265_pixel_add_ps_64x64_avx2;
@@ -1673,6 +1792,8 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_64x64].sub_ps = x265_pixel_sub_ps_64x64_avx2;
         p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].sub_ps = x265_pixel_sub_ps_16x16_avx2;
         p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].sub_ps = x265_pixel_sub_ps_32x32_avx2;
+        p.chroma[X265_CSP_I422].cu[BLOCK_422_16x32].sub_ps = x265_pixel_sub_ps_16x32_avx2;
+        p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].sub_ps = x265_pixel_sub_ps_32x64_avx2;
 
         p.pu[LUMA_16x4].pixelavg_pp = x265_pixel_avg_16x4_avx2;
         p.pu[LUMA_16x8].pixelavg_pp = x265_pixel_avg_16x8_avx2;
@@ -1857,6 +1978,9 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_32x32].intra_pred[21] = x265_intra_pred_ang32_21_avx2;
         p.cu[BLOCK_32x32].intra_pred[18] = x265_intra_pred_ang32_18_avx2;
 
+        // all_angs primitives
+        p.cu[BLOCK_4x4].intra_pred_allangs = x265_all_angs_pred_4x4_avx2;
+
         // copy_sp primitives
         p.cu[BLOCK_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
         p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
@@ -2134,6 +2258,120 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].filter_hpp = x265_interp_4tap_horiz_pp_24x64_avx2;
         p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_hpp = x265_interp_4tap_horiz_pp_2x16_avx2;
 
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_hpp = x265_interp_4tap_horiz_pp_2x16_avx2;
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_avx2;
+        
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_hpp = x265_interp_4tap_horiz_pp_8x4_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_hpp = x265_interp_4tap_horiz_pp_8x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_hpp = x265_interp_4tap_horiz_pp_8x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_hpp = x265_interp_4tap_horiz_pp_8x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_hpp = x265_interp_4tap_horiz_pp_8x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_hpp = x265_interp_4tap_horiz_pp_8x12_avx2;
+        
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].filter_hpp = x265_interp_4tap_horiz_pp_16x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].filter_hpp = x265_interp_4tap_horiz_pp_16x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].filter_hpp = x265_interp_4tap_horiz_pp_16x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].filter_hpp = x265_interp_4tap_horiz_pp_16x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_hpp = x265_interp_4tap_horiz_pp_16x24_avx2;
+        
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].filter_hpp = x265_interp_4tap_horiz_pp_32x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].filter_hpp = x265_interp_4tap_horiz_pp_32x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].filter_hpp = x265_interp_4tap_horiz_pp_32x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_hpp = x265_interp_4tap_horiz_pp_32x48_avx2;
+        
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].filter_hpp = x265_interp_4tap_horiz_pp_2x8_avx2;
+
+        //i444 filters hpp
+
+        p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_hpp = x265_interp_4tap_horiz_pp_8x8_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x16].filter_hpp = x265_interp_4tap_horiz_pp_16x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_32x32].filter_hpp = x265_interp_4tap_horiz_pp_32x32_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_hpp = x265_interp_4tap_horiz_pp_8x4_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_hpp = x265_interp_4tap_horiz_pp_8x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_hpp = x265_interp_4tap_horiz_pp_8x32_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_16x8].filter_hpp = x265_interp_4tap_horiz_pp_16x8_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x32].filter_hpp = x265_interp_4tap_horiz_pp_16x32_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x12].filter_hpp = x265_interp_4tap_horiz_pp_16x12_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x4].filter_hpp = x265_interp_4tap_horiz_pp_16x4_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x64].filter_hpp = x265_interp_4tap_horiz_pp_16x64_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_12x16].filter_hpp = x265_interp_4tap_horiz_pp_12x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_24x32].filter_hpp = x265_interp_4tap_horiz_pp_24x32_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_32x16].filter_hpp = x265_interp_4tap_horiz_pp_32x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_32x64].filter_hpp = x265_interp_4tap_horiz_pp_32x64_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_32x24].filter_hpp = x265_interp_4tap_horiz_pp_32x24_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_32x8].filter_hpp = x265_interp_4tap_horiz_pp_32x8_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_hpp = x265_interp_4tap_horiz_pp_64x64_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_hpp = x265_interp_4tap_horiz_pp_64x32_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_hpp = x265_interp_4tap_horiz_pp_64x48_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_hpp = x265_interp_4tap_horiz_pp_64x16_avx2;
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_hps = x265_interp_4tap_horiz_ps_4x16_avx2;
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_hps = x265_interp_4tap_horiz_ps_8x4_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_hps = x265_interp_4tap_horiz_ps_8x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_hps = x265_interp_4tap_horiz_ps_8x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_hps = x265_interp_4tap_horiz_ps_8x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_hps = x265_interp_4tap_horiz_ps_8x64_avx2; //adding macro call
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_hps = x265_interp_4tap_horiz_ps_8x12_avx2; //adding macro call
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].filter_hps = x265_interp_4tap_horiz_ps_16x8_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].filter_hps = x265_interp_4tap_horiz_ps_16x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].filter_hps = x265_interp_4tap_horiz_ps_16x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].filter_hps = x265_interp_4tap_horiz_ps_16x64_avx2;//adding macro call
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_hps = x265_interp_4tap_horiz_ps_16x24_avx2;//adding macro call
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].filter_hps = x265_interp_4tap_horiz_ps_32x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].filter_hps = x265_interp_4tap_horiz_ps_32x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].filter_hps = x265_interp_4tap_horiz_ps_32x64_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_hps = x265_interp_4tap_horiz_ps_32x48_avx2;
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].filter_hps = x265_interp_4tap_horiz_ps_2x8_avx2;
+
+        //i444 chroma_hps
+        p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_hps = x265_interp_4tap_horiz_ps_64x32_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_hps = x265_interp_4tap_horiz_ps_64x48_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_hps = x265_interp_4tap_horiz_ps_64x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_hps = x265_interp_4tap_horiz_ps_64x64_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_hps = x265_interp_4tap_horiz_ps_8x8_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x16].filter_hps = x265_interp_4tap_horiz_ps_16x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_32x32].filter_hps = x265_interp_4tap_horiz_ps_32x32_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_hps = x265_interp_4tap_horiz_ps_4x16_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_hps = x265_interp_4tap_horiz_ps_8x4_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_hps = x265_interp_4tap_horiz_ps_8x16_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_hps = x265_interp_4tap_horiz_ps_8x32_avx2;
+
+        p.chroma[X265_CSP_I444].pu[LUMA_16x8].filter_hps = x265_interp_4tap_horiz_ps_16x8_avx2;
+        p.chroma[X265_CSP_I444].pu[LUMA_16x32].filter_hps = x265_interp_4tap_horiz_ps_16x32_avx2;

