[x265-commits] [x265] asm: add macro to sub_ps module to reduce code size
Sumalatha at videolan.org
Fri Apr 17 21:09:34 CEST 2015
details: http://hg.videolan.org/x265/rev/e38791928d0d
branches:
changeset: 10191:e38791928d0d
user: Sumalatha Polureddy
date: Thu Apr 16 10:36:39 2015 +0530
description:
asm: add macro to sub_ps module to reduce code size
Subject: [x265] asm: avx2 code for chroma sub_ps module, reused luma code
details: http://hg.videolan.org/x265/rev/abd09d4a8a8c
branches:
changeset: 10192:abd09d4a8a8c
user: Sumalatha Polureddy
date: Thu Apr 16 10:46:18 2015 +0530
description:
asm: avx2 code for chroma sub_ps module, reused luma code
sse4
[i422] sub_ps[16x32] 5.50x 1386.46 7627.27
[i422] sub_ps[32x64] 5.28x 5137.07 27110.01
avx2
[i422] sub_ps[16x32] 9.22x 831.52 7665.70
[i422] sub_ps[32x64] 10.59x 2581.10 27343.41
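For reference, sub_ps subtracts a prediction block from a source block and keeps the signed difference as 16-bit values. Below is a minimal scalar sketch of that operation, assuming 8-bit pixels. The name `sub_ps_ref`, the argument order, and the explicit width/height parameters are illustrative and are not taken from the x265 sources. The three numeric columns above appear to be speedup, optimized cycles, and C reference cycles, since 7627.27 / 1386.46 is about 5.50.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for a sub_ps-style primitive.
// Assumes 8-bit pixels; residual = source - prediction, stored as int16_t.
void sub_ps_ref(int16_t* residual, std::ptrdiff_t resStride,
                const uint8_t* src, const uint8_t* pred,
                std::ptrdiff_t srcStride, std::ptrdiff_t predStride,
                int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            residual[x] = static_cast<int16_t>(src[x] - pred[x]);

        // Advance each pointer by its own stride to reach the next row.
        residual += resStride;
        src += srcStride;
        pred += predStride;
    }
}
```

A 16x32 or 32x64 i422 block would call this with the matching width and height; the asm versions fix those dimensions at assembly time.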
Subject: [x265] asm: ssse3 10bit code for convert_p2s[12xN],[48x64]
details: http://hg.videolan.org/x265/rev/0ba40f2c58e6
branches:
changeset: 10193:0ba40f2c58e6
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Wed Apr 15 19:04:04 2015 +0530
description:
asm: ssse3 10bit code for convert_p2s[12xN],[48x64]
convert_p2s[12x16](11.37x), convert_p2s[48x64](10.50x)
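As a rough illustration of what convert_p2s computes, the sketch below shifts each pixel up to a 14-bit intermediate precision and subtracts a fixed offset so the result fits a signed 16-bit value. The constants (14-bit precision, 10-bit depth, offset 8192) and the name `pixel_to_short_ref` are assumptions made for this example, not quotations from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed constants for this sketch: 14-bit intermediate precision, 10-bit input.
constexpr int kP2sInternalPrec = 14;
constexpr int kP2sBitDepth = 10;
constexpr int kP2sInternalOffs = 1 << (kP2sInternalPrec - 1); // 8192

// Hypothetical scalar pixel-to-short conversion for a width x height block.
void pixel_to_short_ref(const uint16_t* src, std::ptrdiff_t srcStride,
                        int16_t* dst, std::ptrdiff_t dstStride,
                        int width, int height)
{
    const int shift = kP2sInternalPrec - kP2sBitDepth; // 4 for 10-bit input

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = static_cast<int16_t>((src[x] << shift) - kP2sInternalOffs);

        src += srcStride;
        dst += dstStride;
    }
}
```

Because every output depends on a single input pixel, the operation vectorizes cleanly, which is consistent with the large speedups reported for the wider block sizes.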
Subject: [x265] asm: ssse3 10bit code for chroma_p2s[4x2],[8x2],[8x6]
details: http://hg.videolan.org/x265/rev/6c9a7e820080
branches:
changeset: 10194:6c9a7e820080
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Wed Apr 15 19:16:10 2015 +0530
description:
asm: ssse3 10bit code for chroma_p2s[4x2],[8x2],[8x6]
chroma_p2s[i420][4x2](2.52x), chroma_p2s[i420][8x2](3.59x),
chroma_p2s[i420][8x6](5.09x)
Subject: [x265] asm: sse4 10bit code for chroma_p2s[6xN] for i420, i422
details: http://hg.videolan.org/x265/rev/0096e8730ebd
branches:
changeset: 10195:0096e8730ebd
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Thu Apr 16 10:53:33 2015 +0530
description:
asm: sse4 10bit code for chroma_p2s[6xN] for i420, i422
chroma_p2s[i420][6x8](2.89x), chroma_p2s[i422][6x16](3.37x)
Subject: [x265] asm: sse4 10bit code for chroma_p2s[2xN] for i420, i422
details: http://hg.videolan.org/x265/rev/9736b429d394
branches:
changeset: 10196:9736b429d394
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Thu Apr 16 11:24:34 2015 +0530
description:
asm: sse4 10bit code for chroma_p2s[2xN] for i420, i422
chroma_p2s[i420][2x4](1.71x), chroma_p2s[i420][2x8](1.99x),
chroma_p2s[i422][2x16](2.14x)
Subject: [x265] asm: sse version 10bit code for chroma_p2s, reuse luma code
details: http://hg.videolan.org/x265/rev/c18e52fa210c
branches:
changeset: 10197:c18e52fa210c
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Thu Apr 16 11:37:03 2015 +0530
description:
asm: sse version 10bit code for chroma_p2s, reuse luma code
Subject: [x265] asm: new optimized algorithm for satd, improved ~30% over previous algorithm
details: http://hg.videolan.org/x265/rev/7be1172ec816
branches:
changeset: 10198:7be1172ec816
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Apr 16 11:38:32 2015 +0530
description:
asm: new optimized algorithm for satd, improved ~30% over previous algorithm
Subject: [x265] Backed out changeset: 7be1172ec816
details: http://hg.videolan.org/x265/rev/3d5a3e331652
branches:
changeset: 10199:3d5a3e331652
user: Steve Borho <steve at borho.org>
date: Fri Apr 17 13:44:26 2015 -0500
description:
Backed out changeset: 7be1172ec816
Subject: [x265] asm: chroma_hps[64x64, 64x48, 64x32, 64x16] for i444 - improved 21540c->14767c, 18551c->14129c, 17096c->12742c, 6216c->3923c
details: http://hg.videolan.org/x265/rev/b30a7d159a65
branches:
changeset: 10200:b30a7d159a65
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Thu Apr 16 14:28:41 2015 +0530
description:
asm: chroma_hps[64x64, 64x48, 64x32, 64x16] for i444 - improved 21540c->14767c, 18551c->14129c, 17096c->12742c, 6216c->3923c
Subject: [x265] asm: chroma_hpp i422[4xN, 8xN, 16xN, 32xN]
details: http://hg.videolan.org/x265/rev/338ed295d81e
branches:
changeset: 10201:338ed295d81e
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Thu Apr 16 15:28:19 2015 +0530
description:
asm: chroma_hpp i422[4xN, 8xN, 16xN, 32xN]
Subject: [x265] asm: avx2 10bit code for convert_p2s[16xN]
details: http://hg.videolan.org/x265/rev/7b7437578a4d
branches:
changeset: 10202:7b7437578a4d
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Thu Apr 16 18:17:36 2015 +0530
description:
asm: avx2 10bit code for convert_p2s[16xN]
convert_p2s[16x4](10.44x), convert_p2s[16x8](15.30x),
convert_p2s[16x12](16.55x), convert_p2s[16x16](17.48x),
convert_p2s[16x32](17.57x), convert_p2s[16x64](20.21x)
Subject: [x265] asm: avx2 10bit code for convert_p2s[32xN],[64xN]
details: http://hg.videolan.org/x265/rev/d70d71419691
branches:
changeset: 10203:d70d71419691
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Thu Apr 16 14:18:03 2015 +0530
description:
asm: avx2 10bit code for convert_p2s[32xN],[64xN]
convert_p2s[32x8](15.77x), convert_p2s[32x16](19.15x),
convert_p2s[32x24](16.84x), convert_p2s[32x32](20.49x),
convert_p2s[32x64](21.19x), convert_p2s[64x16](18.99x),
convert_p2s[64x32](17.16x), convert_p2s[64x48](19.27x),
convert_p2s[64x64](16.86x)
Subject: [x265] regression: typo in rc tests
details: http://hg.videolan.org/x265/rev/4280cd25da6e
branches:
changeset: 10204:4280cd25da6e
user: Mahesh Pittala <mahesh at multicorewareinc.com>
date: Thu Apr 16 15:53:53 2015 +0530
description:
regression: typo in rc tests
Subject: [x265] rc: unix eoln for rate-control-tests.txt
details: http://hg.videolan.org/x265/rev/0d9f2a56bccd
branches:
changeset: 10205:0d9f2a56bccd
user: Steve Borho <steve at borho.org>
date: Thu Apr 16 17:12:00 2015 -0700
description:
rc: unix eoln for rate-control-tests.txt
Subject: [x265] asm: chroma_hpp[64x64, 64x48, 64x32, 64x16] for i444 - improved 22990c->14176c, 17897c->10791c, 12050c->7186c, 5655c->3266c
details: http://hg.videolan.org/x265/rev/af03705428c3
branches:
changeset: 10206:af03705428c3
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Fri Apr 17 10:28:18 2015 +0530
description:
asm: chroma_hpp[64x64, 64x48, 64x32, 64x16] for i444 - improved 22990c->14176c, 17897c->10791c, 12050c->7186c, 5655c->3266c
Subject: [x265] asm: avx2 10bit code for convert_p2s[24xN],[48x64]
details: http://hg.videolan.org/x265/rev/f8376ecbfb09
branches:
changeset: 10207:f8376ecbfb09
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 17 11:19:20 2015 +0530
description:
asm: avx2 10bit code for convert_p2s[24xN],[48x64]
convert_p2s[24x32](20.90x), convert_p2s[48x64](18.89x)
Subject: [x265] asm: avx2 code for chroma addAvg for all partitions
details: http://hg.videolan.org/x265/rev/b91d7ed6fd1e
branches:
changeset: 10208:b91d7ed6fd1e
user: Sumalatha Polureddy
date: Fri Apr 17 14:34:27 2015 +0530
description:
asm: avx2 code for chroma addAvg for all partitions
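For context, addAvg combines two 16-bit intermediate predictions into one pixel block by summing, rounding, and shifting back to pixel depth. The scalar sketch below assumes 8-bit output and a 14-bit intermediate format with an offset of 8192 per input. The constants and the name `add_avg_ref` are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumed constants for this sketch: 14-bit intermediates, 8-bit output pixels.
constexpr int kAvgInternalPrec = 14;
constexpr int kAvgBitDepth = 8;
constexpr int kAvgInternalOffs = 1 << (kAvgInternalPrec - 1); // 8192

// Hypothetical scalar bi-prediction average of two intermediate blocks.
void add_avg_ref(const int16_t* src0, const int16_t* src1, uint8_t* dst,
                 std::ptrdiff_t src0Stride, std::ptrdiff_t src1Stride,
                 std::ptrdiff_t dstStride, int width, int height)
{
    const int shift = kAvgInternalPrec + 1 - kAvgBitDepth;        // 7
    const int offset = (1 << (shift - 1)) + 2 * kAvgInternalOffs; // rounding + both input offsets

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int v = (src0[x] + src1[x] + offset) >> shift;
            dst[x] = static_cast<uint8_t>(std::min(std::max(v, 0), 255));
        }
        src0 += src0Stride;
        src1 += src1Stride;
        dst += dstStride;
    }
}
```

Under these assumptions, feeding it two pixels a and b that were each converted as (p << 6) - 8192 yields (a + b + 1) >> 1, the rounded average.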
Subject: [x265] asm: avx2 10bit code for chroma_p2s[16xN],[24xN],[32xN], reuse luma code
details: http://hg.videolan.org/x265/rev/ba491b4e3b67
branches:
changeset: 10209:ba491b4e3b67
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 17 15:51:55 2015 +0530
description:
asm: avx2 10bit code for chroma_p2s[16xN],[24xN],[32xN], reuse luma code
Subject: [x265] asm: chroma_hpp[4xN, 8xN, 16xN, 32xN, 12x16, 24x32] for i444
details: http://hg.videolan.org/x265/rev/cfb33d361b5f
branches:
changeset: 10210:cfb33d361b5f
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Fri Apr 17 16:10:03 2015 +0530
description:
asm: chroma_hpp[4xN, 8xN, 16xN, 32xN, 12x16, 24x32] for i444
Subject: [x265] asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 2x8]
details: http://hg.videolan.org/x265/rev/77d0418ab73f
branches:
changeset: 10211:77d0418ab73f
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Fri Apr 17 16:43:39 2015 +0530
description:
asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 2x8]
Subject: [x265] asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 24x32] for i444
details: http://hg.videolan.org/x265/rev/c1fd719930d1
branches:
changeset: 10212:c1fd719930d1
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Fri Apr 17 16:56:59 2015 +0530
description:
asm: chroma_hps[4xN, 8xN, 16xN, 32xN, 24x32] for i444
Subject: [x265] asm: intra_allangs4x4 improved by ~61% over SSE4
details: http://hg.videolan.org/x265/rev/405ce9f2a527
branches:
changeset: 10213:405ce9f2a527
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 17 16:23:28 2015 +0530
description:
asm: intra_allangs4x4 improved by ~61% over SSE4
AVX2:
intra_allangs4x4 31.17x 1070.01 33353.50
SSE4:
intra_allangs4x4 12.04x 2746.58 33061.69
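The "~61%" figure reads as a reduction in cycle count rather than a throughput gain. Assuming the second numeric column in each row is the cycle count of the optimized routine, the small helper below (names are illustrative) reproduces it from the two rows: (2746.58 - 1070.01) / 2746.58 is about 0.61, which is the same as an AVX2-over-SSE4 speedup of roughly 2.57x.

```cpp
// Relative cycle reduction, in percent, going from `before` to `after` cycles.
double cycle_reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}

// Speedup factor of the routine taking `after` cycles over the one taking `before`.
double speedup_factor(double before, double after)
{
    return before / after;
}
```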
Subject: [x265] asm: interp_4tap_horiz_pp_2x4_sse3
details: http://hg.videolan.org/x265/rev/d59990f190a0
branches:
changeset: 10214:d59990f190a0
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:04:59 2015 -0700
description:
asm: interp_4tap_horiz_pp_2x4_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x4\]"
chroma_hpp[ 2x4] 1.83x 594.91 1089.98
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x4\]"
chroma_hpp[ 2x4] 1.75x 739.88 1297.40
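For readers who want the scalar picture of what these interp_4tap_horiz_pp routines replace, here is a minimal sketch of a 4-tap horizontal pixel-to-pixel chroma filter, assuming 8-bit pixels. The tap table is the HEVC chroma interpolation filter; the name `chroma_hpp_ref` and the width/height parameters are illustrative and not copied from the x265 C code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// HEVC chroma interpolation taps for the eight 1/8-sample positions.
static const int16_t kChromaTaps[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 },
};

// Hypothetical scalar 4-tap horizontal filter, pixel in and pixel out (8-bit assumed).
// The caller must provide one valid pixel to the left and two to the right of each row.
void chroma_hpp_ref(const uint8_t* src, std::ptrdiff_t srcStride,
                    uint8_t* dst, std::ptrdiff_t dstStride,
                    int coeffIdx, int width, int height)
{
    const int16_t* c = kChromaTaps[coeffIdx];

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int sum = src[x - 1] * c[0] + src[x] * c[1] +
                      src[x + 1] * c[2] + src[x + 2] * c[3];
            int v = (sum + 32) >> 6; // taps sum to 64, so round and divide by 64
            dst[x] = static_cast<uint8_t>(std::min(std::max(v, 0), 255));
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

For blocks only 2 or 4 pixels wide there is little work per row, which may explain why the measured gains over C stay in the 2x to 3x range.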
Subject: [x265] asm: interp_4tap_horiz_pp_2x8_sse3
details: http://hg.videolan.org/x265/rev/cdd3b34296d7
branches:
changeset: 10215:cdd3b34296d7
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:10:24 2015 -0700
description:
asm: interp_4tap_horiz_pp_2x8_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x8\]"
chroma_hpp[ 2x8] 1.83x 1104.89 2026.66
chroma_hpp[ 2x8] 1.84x 1102.43 2025.46
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x8\]"
chroma_hpp[ 2x8] 1.93x 1252.42 2419.18
chroma_hpp[ 2x8] 1.93x 1252.46 2417.46
Subject: [x265] asm: interp_4tap_horiz_pp_2x16_sse3
details: http://hg.videolan.org/x265/rev/f27f6f1f2182
branches:
changeset: 10216:f27f6f1f2182
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:15:04 2015 -0700
description:
asm: interp_4tap_horiz_pp_2x16_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x16\]"
chroma_hpp[ 2x16] 2.35x 2122.49 4982.88
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 2x16\]"
chroma_hpp[ 2x16] 2.20x 2262.45 4985.37
Subject: [x265] asm: interp_4tap_horiz_pp_4x2_sse3
details: http://hg.videolan.org/x265/rev/f79d5d32627c
branches:
changeset: 10217:f79d5d32627c
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:21:15 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x2_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x2\]"
chroma_hpp[ 4x2] 2.09x 475.01 992.48
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x2\]"
chroma_hpp[ 4x2] 2.42x 549.94 1329.94
Subject: [x265] asm: interp_4tap_horiz_pp_4x4_sse3
details: http://hg.videolan.org/x265/rev/f8fb91b3003c
branches:
changeset: 10218:f8fb91b3003c
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:26:26 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x4_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x4\]"
chroma_hpp[ 4x4] 2.18x 872.49 1902.48
chroma_hpp[ 4x4] 2.17x 874.99 1900.67
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x4\]"
chroma_hpp[ 4x4] 2.59x 952.48 2462.42
chroma_hpp[ 4x4] 2.59x 950.02 2462.46
Subject: [x265] asm: interp_4tap_horiz_pp_4x8_sse3
details: http://hg.videolan.org/x265/rev/ab8fc4c59cc7
branches:
changeset: 10219:ab8fc4c59cc7
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:33:03 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x8_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x8\]"
chroma_hpp[ 4x8] 2.83x 1682.50 4759.97
chroma_hpp[ 4x8] 2.83x 1682.50 4759.99
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x8\]"
chroma_hpp[ 4x8] 2.77x 1765.03 4892.46
chroma_hpp[ 4x8] 2.77x 1765.00 4892.48
Subject: [x265] asm: interp_4tap_horiz_pp_4x16_sse3
details: http://hg.videolan.org/x265/rev/6b7f853e3a7f
branches:
changeset: 10220:6b7f853e3a7f
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:36:54 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x16_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x16\]"
chroma_hpp[ 4x16] 2.84x 3302.49 9392.47
chroma_hpp[ 4x16] 2.84x 3302.50 9392.49
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x16\]"
chroma_hpp[ 4x16] 2.81x 3380.07 9499.97
chroma_hpp[ 4x16] 2.81x 3382.50 9499.97
Subject: [x265] asm: interp_4tap_horiz_pp_4x32_sse3
details: http://hg.videolan.org/x265/rev/45610a24b399
branches:
changeset: 10221:45610a24b399
user: David T Yuen <dtyx265 at gmail.com>
date: Fri Apr 17 09:40:06 2015 -0700
description:
asm: interp_4tap_horiz_pp_4x32_sse3
This replaces c code.
64-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x32\]"
chroma_hpp[ 4x32] 2.80x 6552.49 18351.63
32-bit
./test/TestBench --testbench interp | grep "chroma_hpp\[ 4x32\]"
chroma_hpp[ 4x32] 2.79x 6625.03 18459.97
diffstat:
source/common/x86/asm-primitives.cpp | 244 +++++++++-
source/common/x86/intrapred.h | 1 +
source/common/x86/intrapred8_allangs.asm | 376 +++++++++++++++
source/common/x86/ipfilter16.asm | 760 +++++++++++++++++++++++++++++++
source/common/x86/ipfilter8.asm | 481 +++++++++++++++++++
source/common/x86/ipfilter8.h | 87 +++-
source/common/x86/mc-a.asm | 6 +
source/common/x86/pixel-util8.asm | 18 +-
source/common/x86/pixel.h | 2 +
source/test/rate-control-tests.txt | 72 +-
10 files changed, 2003 insertions(+), 44 deletions(-)
diffs (truncated from 2387 to 300 lines):
diff -r f9c0e1f233cc -r 45610a24b399 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Wed Apr 15 16:20:27 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp Fri Apr 17 09:40:06 2015 -0700
@@ -972,6 +972,49 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_64x48].convert_p2s = x265_filterPixelToShort_64x48_ssse3;
p.pu[LUMA_64x64].convert_p2s = x265_filterPixelToShort_64x64_ssse3;
p.pu[LUMA_24x32].convert_p2s = x265_filterPixelToShort_24x32_ssse3;
+ p.pu[LUMA_12x16].convert_p2s = x265_filterPixelToShort_12x16_ssse3;
+ p.pu[LUMA_48x64].convert_p2s = x265_filterPixelToShort_48x64_ssse3;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].p2s = x265_filterPixelToShort_4x4_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].p2s = x265_filterPixelToShort_4x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].p2s = x265_filterPixelToShort_4x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].p2s = x265_filterPixelToShort_8x4_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].p2s = x265_filterPixelToShort_8x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].p2s = x265_filterPixelToShort_8x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].p2s = x265_filterPixelToShort_8x32_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].p2s = x265_filterPixelToShort_16x4_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].p2s = x265_filterPixelToShort_16x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].p2s = x265_filterPixelToShort_16x12_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].p2s = x265_filterPixelToShort_16x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].p2s = x265_filterPixelToShort_16x32_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].p2s = x265_filterPixelToShort_32x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].p2s = x265_filterPixelToShort_32x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].p2s = x265_filterPixelToShort_32x24_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].p2s = x265_filterPixelToShort_32x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].p2s = x265_filterPixelToShort_4x4_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].p2s = x265_filterPixelToShort_4x8_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].p2s = x265_filterPixelToShort_4x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].p2s = x265_filterPixelToShort_4x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].p2s = x265_filterPixelToShort_8x4_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].p2s = x265_filterPixelToShort_8x8_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].p2s = x265_filterPixelToShort_8x12_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].p2s = x265_filterPixelToShort_8x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].p2s = x265_filterPixelToShort_8x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].p2s = x265_filterPixelToShort_8x64_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].p2s = x265_filterPixelToShort_12x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].p2s = x265_filterPixelToShort_16x8_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].p2s = x265_filterPixelToShort_16x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].p2s = x265_filterPixelToShort_16x24_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].p2s = x265_filterPixelToShort_16x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].p2s = x265_filterPixelToShort_16x64_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].p2s = x265_filterPixelToShort_24x64_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].p2s = x265_filterPixelToShort_32x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].p2s = x265_filterPixelToShort_32x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].p2s = x265_filterPixelToShort_32x48_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].p2s = x265_filterPixelToShort_32x64_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].p2s = x265_filterPixelToShort_4x2_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].p2s = x265_filterPixelToShort_8x2_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].p2s = x265_filterPixelToShort_8x6_ssse3;
}
if (cpuMask & X265_CPU_SSE4)
{
@@ -1011,6 +1054,13 @@ void setupAssemblyPrimitives(EncoderPrim
ALL_LUMA_TU_S(copy_cnt, copy_cnt_, sse4);
ALL_LUMA_CU(psy_cost_pp, psyCost_pp, sse4);
ALL_LUMA_CU(psy_cost_ss, psyCost_ss, sse4);
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].p2s = x265_filterPixelToShort_2x4_sse4;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].p2s = x265_filterPixelToShort_2x8_sse4;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].p2s = x265_filterPixelToShort_6x8_sse4;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].p2s = x265_filterPixelToShort_2x8_sse4;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].p2s = x265_filterPixelToShort_2x16_sse4;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].p2s = x265_filterPixelToShort_6x16_sse4;
}
if (cpuMask & X265_CPU_AVX)
{
@@ -1173,6 +1223,45 @@ void setupAssemblyPrimitives(EncoderPrim
ALL_LUMA_PU(luma_vps, interp_8tap_vert_ps, avx2);
ALL_LUMA_PU(luma_vsp, interp_8tap_vert_sp, avx2);
ALL_LUMA_PU(luma_vss, interp_8tap_vert_ss, avx2);
+
+ p.pu[LUMA_16x4].convert_p2s = x265_filterPixelToShort_16x4_avx2;
+ p.pu[LUMA_16x8].convert_p2s = x265_filterPixelToShort_16x8_avx2;
+ p.pu[LUMA_16x12].convert_p2s = x265_filterPixelToShort_16x12_avx2;
+ p.pu[LUMA_16x16].convert_p2s = x265_filterPixelToShort_16x16_avx2;
+ p.pu[LUMA_16x32].convert_p2s = x265_filterPixelToShort_16x32_avx2;
+ p.pu[LUMA_16x64].convert_p2s = x265_filterPixelToShort_16x64_avx2;
+ p.pu[LUMA_32x8].convert_p2s = x265_filterPixelToShort_32x8_avx2;
+ p.pu[LUMA_32x16].convert_p2s = x265_filterPixelToShort_32x16_avx2;
+ p.pu[LUMA_32x24].convert_p2s = x265_filterPixelToShort_32x24_avx2;
+ p.pu[LUMA_32x32].convert_p2s = x265_filterPixelToShort_32x32_avx2;
+ p.pu[LUMA_32x64].convert_p2s = x265_filterPixelToShort_32x64_avx2;
+ p.pu[LUMA_64x16].convert_p2s = x265_filterPixelToShort_64x16_avx2;
+ p.pu[LUMA_64x32].convert_p2s = x265_filterPixelToShort_64x32_avx2;
+ p.pu[LUMA_64x48].convert_p2s = x265_filterPixelToShort_64x48_avx2;
+ p.pu[LUMA_64x64].convert_p2s = x265_filterPixelToShort_64x64_avx2;
+ p.pu[LUMA_24x32].convert_p2s = x265_filterPixelToShort_24x32_avx2;
+ p.pu[LUMA_48x64].convert_p2s = x265_filterPixelToShort_48x64_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].p2s = x265_filterPixelToShort_16x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].p2s = x265_filterPixelToShort_16x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].p2s = x265_filterPixelToShort_16x12_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].p2s = x265_filterPixelToShort_16x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].p2s = x265_filterPixelToShort_16x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].p2s = x265_filterPixelToShort_24x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].p2s = x265_filterPixelToShort_32x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].p2s = x265_filterPixelToShort_32x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].p2s = x265_filterPixelToShort_32x24_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].p2s = x265_filterPixelToShort_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].p2s = x265_filterPixelToShort_16x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].p2s = x265_filterPixelToShort_16x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].p2s = x265_filterPixelToShort_16x24_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].p2s = x265_filterPixelToShort_16x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].p2s = x265_filterPixelToShort_16x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].p2s = x265_filterPixelToShort_24x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].p2s = x265_filterPixelToShort_32x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].p2s = x265_filterPixelToShort_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].p2s = x265_filterPixelToShort_32x48_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].p2s = x265_filterPixelToShort_32x64_avx2;
}
}
#else // if HIGH_BIT_DEPTH
@@ -1304,6 +1393,21 @@ void setupAssemblyPrimitives(EncoderPrim
p.planecopy_sp = x265_downShift_16_sse2;
}
+ if (cpuMask & X265_CPU_SSE3)
+ {
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_hpp = x265_interp_4tap_horiz_pp_2x4_sse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].filter_hpp = x265_interp_4tap_horiz_pp_2x8_sse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_hpp = x265_interp_4tap_horiz_pp_4x2_sse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_sse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_sse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_sse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].filter_hpp = x265_interp_4tap_horiz_pp_2x8_sse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_hpp = x265_interp_4tap_horiz_pp_2x16_sse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_sse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_sse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_sse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_hpp = x265_interp_4tap_horiz_pp_4x32_sse3;
+ }
if (cpuMask & X265_CPU_SSSE3)
{
p.pu[LUMA_8x16].sad_x3 = x265_pixel_sad_x3_8x16_ssse3;
@@ -1646,20 +1750,35 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].addAvg = x265_addAvg_8x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].addAvg = x265_addAvg_8x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].addAvg = x265_addAvg_8x32_avx2;
-
p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].addAvg = x265_addAvg_12x16_avx2;
-
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].addAvg = x265_addAvg_16x4_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].addAvg = x265_addAvg_16x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].addAvg = x265_addAvg_16x12_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].addAvg = x265_addAvg_16x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].addAvg = x265_addAvg_16x32_avx2;
-
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].addAvg = x265_addAvg_32x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].addAvg = x265_addAvg_32x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].addAvg = x265_addAvg_32x24_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].addAvg = x265_addAvg_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].addAvg = x265_addAvg_8x4_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].addAvg = x265_addAvg_8x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].addAvg = x265_addAvg_8x12_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].addAvg = x265_addAvg_8x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].addAvg = x265_addAvg_8x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].addAvg = x265_addAvg_8x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].addAvg = x265_addAvg_12x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].addAvg = x265_addAvg_16x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].addAvg = x265_addAvg_16x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].addAvg = x265_addAvg_16x24_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].addAvg = x265_addAvg_16x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].addAvg = x265_addAvg_16x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].addAvg = x265_addAvg_24x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].addAvg = x265_addAvg_32x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].addAvg = x265_addAvg_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].addAvg = x265_addAvg_32x48_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].addAvg = x265_addAvg_32x64_avx2;
+
p.cu[BLOCK_16x16].add_ps = x265_pixel_add_ps_16x16_avx2;
p.cu[BLOCK_32x32].add_ps = x265_pixel_add_ps_32x32_avx2;
p.cu[BLOCK_64x64].add_ps = x265_pixel_add_ps_64x64_avx2;
@@ -1673,6 +1792,8 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_64x64].sub_ps = x265_pixel_sub_ps_64x64_avx2;
p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].sub_ps = x265_pixel_sub_ps_16x16_avx2;
p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].sub_ps = x265_pixel_sub_ps_32x32_avx2;
+ p.chroma[X265_CSP_I422].cu[BLOCK_422_16x32].sub_ps = x265_pixel_sub_ps_16x32_avx2;
+ p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].sub_ps = x265_pixel_sub_ps_32x64_avx2;
p.pu[LUMA_16x4].pixelavg_pp = x265_pixel_avg_16x4_avx2;
p.pu[LUMA_16x8].pixelavg_pp = x265_pixel_avg_16x8_avx2;
@@ -1857,6 +1978,9 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_32x32].intra_pred[21] = x265_intra_pred_ang32_21_avx2;
p.cu[BLOCK_32x32].intra_pred[18] = x265_intra_pred_ang32_18_avx2;
+ // all_angs primitives
+ p.cu[BLOCK_4x4].intra_pred_allangs = x265_all_angs_pred_4x4_avx2;
+
// copy_sp primitives
p.cu[BLOCK_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
@@ -2134,6 +2258,120 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].filter_hpp = x265_interp_4tap_horiz_pp_24x64_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_hpp = x265_interp_4tap_horiz_pp_2x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_hpp = x265_interp_4tap_horiz_pp_2x16_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_hpp = x265_interp_4tap_horiz_pp_8x4_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_hpp = x265_interp_4tap_horiz_pp_8x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_hpp = x265_interp_4tap_horiz_pp_8x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_hpp = x265_interp_4tap_horiz_pp_8x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_hpp = x265_interp_4tap_horiz_pp_8x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_hpp = x265_interp_4tap_horiz_pp_8x12_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].filter_hpp = x265_interp_4tap_horiz_pp_16x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].filter_hpp = x265_interp_4tap_horiz_pp_16x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].filter_hpp = x265_interp_4tap_horiz_pp_16x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].filter_hpp = x265_interp_4tap_horiz_pp_16x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_hpp = x265_interp_4tap_horiz_pp_16x24_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].filter_hpp = x265_interp_4tap_horiz_pp_32x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].filter_hpp = x265_interp_4tap_horiz_pp_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].filter_hpp = x265_interp_4tap_horiz_pp_32x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_hpp = x265_interp_4tap_horiz_pp_32x48_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].filter_hpp = x265_interp_4tap_horiz_pp_2x8_avx2;
+
+ //i444 filters hpp
+
+ p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_hpp = x265_interp_4tap_horiz_pp_8x8_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x16].filter_hpp = x265_interp_4tap_horiz_pp_16x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_32x32].filter_hpp = x265_interp_4tap_horiz_pp_32x32_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_hpp = x265_interp_4tap_horiz_pp_8x4_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_hpp = x265_interp_4tap_horiz_pp_8x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_hpp = x265_interp_4tap_horiz_pp_8x32_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_16x8].filter_hpp = x265_interp_4tap_horiz_pp_16x8_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x32].filter_hpp = x265_interp_4tap_horiz_pp_16x32_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x12].filter_hpp = x265_interp_4tap_horiz_pp_16x12_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x4].filter_hpp = x265_interp_4tap_horiz_pp_16x4_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x64].filter_hpp = x265_interp_4tap_horiz_pp_16x64_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_12x16].filter_hpp = x265_interp_4tap_horiz_pp_12x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_24x32].filter_hpp = x265_interp_4tap_horiz_pp_24x32_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_32x16].filter_hpp = x265_interp_4tap_horiz_pp_32x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_32x64].filter_hpp = x265_interp_4tap_horiz_pp_32x64_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_32x24].filter_hpp = x265_interp_4tap_horiz_pp_32x24_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_32x8].filter_hpp = x265_interp_4tap_horiz_pp_32x8_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_hpp = x265_interp_4tap_horiz_pp_64x64_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_hpp = x265_interp_4tap_horiz_pp_64x32_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_hpp = x265_interp_4tap_horiz_pp_64x48_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_hpp = x265_interp_4tap_horiz_pp_64x16_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_hps = x265_interp_4tap_horiz_ps_4x16_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].filter_hps = x265_interp_4tap_horiz_ps_8x4_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].filter_hps = x265_interp_4tap_horiz_ps_8x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].filter_hps = x265_interp_4tap_horiz_ps_8x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].filter_hps = x265_interp_4tap_horiz_ps_8x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_hps = x265_interp_4tap_horiz_ps_8x64_avx2; //adding macro call
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_hps = x265_interp_4tap_horiz_ps_8x12_avx2; //adding macro call
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].filter_hps = x265_interp_4tap_horiz_ps_16x8_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].filter_hps = x265_interp_4tap_horiz_ps_16x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].filter_hps = x265_interp_4tap_horiz_ps_16x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].filter_hps = x265_interp_4tap_horiz_ps_16x64_avx2;//adding macro call
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_hps = x265_interp_4tap_horiz_ps_16x24_avx2;//adding macro call
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].filter_hps = x265_interp_4tap_horiz_ps_32x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].filter_hps = x265_interp_4tap_horiz_ps_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].filter_hps = x265_interp_4tap_horiz_ps_32x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_hps = x265_interp_4tap_horiz_ps_32x48_avx2;
+
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].filter_hps = x265_interp_4tap_horiz_ps_2x8_avx2;
+
+ //i444 chroma_hps
+ p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_hps = x265_interp_4tap_horiz_ps_64x32_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_hps = x265_interp_4tap_horiz_ps_64x48_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_hps = x265_interp_4tap_horiz_ps_64x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_hps = x265_interp_4tap_horiz_ps_64x64_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_8x8].filter_hps = x265_interp_4tap_horiz_ps_8x8_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x16].filter_hps = x265_interp_4tap_horiz_ps_16x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_32x32].filter_hps = x265_interp_4tap_horiz_ps_32x32_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_hps = x265_interp_4tap_horiz_ps_4x16_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_8x4].filter_hps = x265_interp_4tap_horiz_ps_8x4_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_8x16].filter_hps = x265_interp_4tap_horiz_ps_8x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_8x32].filter_hps = x265_interp_4tap_horiz_ps_8x32_avx2;
+
+ p.chroma[X265_CSP_I444].pu[LUMA_16x8].filter_hps = x265_interp_4tap_horiz_ps_16x8_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_16x32].filter_hps = x265_interp_4tap_horiz_ps_16x32_avx2;