[x265-commits] [x265] frameencoder: use a bonded worker thread to perform weigh...

Steve Borho steve at borho.org
Fri Mar 6 03:54:48 CET 2015


details:   http://hg.videolan.org/x265/rev/820dcc3216a5
branches:  
changeset: 9628:820dcc3216a5
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 05 13:44:43 2015 -0600
description:
frameencoder: use a bonded worker thread to perform weight analysis, add stat

Weight analysis can take a substantial amount of time. It is best to use a
worker thread so the frame encoder thread can stay blocked during all of this
processing (we want worker threads to use the cores, not the frame encoders).

Weight analysis can be 1% of the total elapsed encoder time.
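
The hand-off described above can be sketched roughly as follows. This is only
an illustration using the C++ standard library, not the x265 thread pool or its
bonded task groups; analyzeWeights, WeightResult and runWeightAnalysisOnWorker
are hypothetical names invented for the example. The calling (frame encoder)
thread starts the job on another thread, blocks until it finishes, and records
the elapsed time as the statistic.

```cpp
#include <chrono>
#include <future>

// Hypothetical result type: the analysis output plus the elapsed-time stat.
struct WeightResult
{
    int    weight;
    double elapsedMs;
};

// Hypothetical stand-in for the real weight analysis; not x265 code.
static int analyzeWeights(int frameNumber)
{
    return frameNumber * 2 + 1;
}

// Run the analysis on a separate thread while the caller stays blocked,
// then report how long the caller had to wait.
WeightResult runWeightAnalysisOnWorker(int frameNumber)
{
    auto start = std::chrono::steady_clock::now();

    // std::launch::async forces execution on a different thread.
    std::future<int> job = std::async(std::launch::async, analyzeWeights, frameNumber);

    // The calling thread blocks here and uses no CPU while the worker runs.
    int weight = job.get();

    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    return { weight, ms };
}
```

The design point is that the blocked caller does not compete for a core; only
the worker does the computation.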
Subject: [x265] asm: filter_vpp[32x24, 32x16, 32x8], filter_vps[32x24, 32x16, 32x8]

details:   http://hg.videolan.org/x265/rev/1adb97180645
branches:  
changeset: 9629:1adb97180645
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Mar 05 12:11:17 2015 +0530
description:
asm: filter_vpp[32x24, 32x16, 32x8], filter_vps[32x24, 32x16, 32x8]

filter_vpp[32x24, 32x16, 32x8]: 2813c->1442c, 1893c->988c, 1021c->557c
filter_vps[32x24, 32x16, 32x8]: 2837c->1798c, 2059c->1281c, 1001c->643c
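
For readers who prefer a ratio to raw cycle counts, the figures above convert
to a speedup factor by dividing the old count by the new one. The helper below
is a hypothetical illustration, not part of the patch; for 2813c->1442c it
gives a little over 1.95x.

```cpp
// Speedup factor from "before -> after" cycle counts as quoted in the
// commit message (e.g. 2813c -> 1442c). Hypothetical helper for illustration.
double speedup(double cyclesBefore, double cyclesAfter)
{
    return cyclesBefore / cyclesAfter;
}

// Fraction of cycles saved, e.g. 0.25 means 25% fewer cycles.
double cyclesSaved(double cyclesBefore, double cyclesAfter)
{
    return (cyclesBefore - cyclesAfter) / cyclesBefore;
}
```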
Subject: [x265] asm: filter_vpp[4x16], filter_vps[4x16]: 786c->590c, 651c->489c

details:   http://hg.videolan.org/x265/rev/b61834b7e70f
branches:  
changeset: 9630:b61834b7e70f
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Mar 05 15:08:04 2015 +0530
description:
asm: filter_vpp[4x16], filter_vps[4x16]: 786c->590c, 651c->489c
Subject: [x265] asm: filter_vpp[8x32], filter_vps[8x32]: 1028c->937c, 902c->860c

details:   http://hg.videolan.org/x265/rev/45e6f21de824
branches:  
changeset: 9631:45e6f21de824
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Mar 05 15:54:31 2015 +0530
description:
asm: filter_vpp[8x32], filter_vps[8x32]: 1028c->937c, 902c->860c
Subject: [x265] asm: filter_vpp[8x2], filter_vps[8x2]: 141c->118c, 131c->113c

details:   http://hg.videolan.org/x265/rev/fad9166f0c58
branches:  
changeset: 9632:fad9166f0c58
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Thu Mar 05 17:09:08 2015 +0530
description:
asm: filter_vpp[8x2], filter_vps[8x2]: 141c->118c, 131c->113c
Subject: [x265] asm: intra pred dc4 sse2 high bit

details:   http://hg.videolan.org/x265/rev/b03eee80a8db
branches:  
changeset: 9633:b03eee80a8db
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 12:21:23 2015 -0800
description:
asm: intra pred dc4 sse2 high bit

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred dc4 sse4 high bit

./test/TestBench --testbench intrapred | grep intra_dc_4x4
intra_dc_4x4[f=0]	1.61x 	 160.19   	 257.37
intra_dc_4x4[f=1]	1.23x 	 382.56   	 470.81
Subject: [x265] asm: intra pred dc8 sse2 high bit

details:   http://hg.videolan.org/x265/rev/b823d7674307
branches:  
changeset: 9634:b823d7674307
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 12:45:55 2015 -0800
description:
asm: intra pred dc8 sse2 high bit

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred dc8 sse4 high bit

./test/TestBench --testbench intrapred | grep intra_dc_8x8
intra_dc_8x8[f=0]	1.64x 	 437.56   	 719.58
intra_dc_8x8[f=1]	1.55x 	 747.51   	 1157.56
Subject: [x265] asm: intra pred dc16 sse2 high bit

details:   http://hg.videolan.org/x265/rev/0ac80df0c297
branches:  
changeset: 9635:0ac80df0c297
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 13:54:48 2015 -0800
description:
asm: intra pred dc16 sse2 high bit

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred dc16 sse4 high bit

./test/TestBench --testbench intrapred | grep intra_dc_16x16
intra_dc_16x16[f=0]	2.55x 	 1022.78  	 2607.93
intra_dc_16x16[f=1]	2.57x 	 1510.13  	 3887.95
Subject: [x265] asm: intra pred dc32 sse2

details:   http://hg.videolan.org/x265/rev/3a6ab7bda69f
branches:  
changeset: 9636:3a6ab7bda69f
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 14:31:59 2015 -0800
description:
asm: intra pred dc32 sse2

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred dc32 sse4

64-bit

./test/TestBench --testbench intrapred | grep intra_dc_32x32
intra_dc_32x32[f=0]	4.53x 	 1650.00  	 7474.94

32-bit

./test/TestBench --testbench intrapred | grep intra_dc_32x32
intra_dc_32x32[f=0]	7.79x 	 1749.94  	 13627.45
Subject: [x265] asm: intra pred dc32 sse2 high bit

details:   http://hg.videolan.org/x265/rev/07ffafe5fbd1
branches:  
changeset: 9637:07ffafe5fbd1
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 14:55:09 2015 -0800
description:
asm: intra pred dc32 sse2 high bit

This patch moves x265_intra_pred_dc32_sse2 within the file to group it with the other sse2 primitives.
It also adds it to asm-primitives.cpp.
Subject: [x265] asm: intra pred planar8 sse2

details:   http://hg.videolan.org/x265/rev/56024df7a549
branches:  
changeset: 9638:56024df7a549
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 15:33:28 2015 -0800
description:
asm: intra pred planar8 sse2

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred planar8 sse4

64-bit

./test/TestBench --testbench intrapred | grep intra_planar_8x8
intra_planar_8x8	3.34x 	 997.49   	 3327.61

32-bit

./test/TestBench --testbench intrapred | grep intra_planar_8x8
intra_planar_8x8	3.90x 	 1042.49  	 4062.56

This patch also groups intra pred planar 8 sse2 with the other sse2 primitives
Subject: [x265] asm: intra pred planar8 sse2 high bit

details:   http://hg.videolan.org/x265/rev/56d04adf9de3
branches:  
changeset: 9639:56d04adf9de3
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 15:43:35 2015 -0800
description:
asm: intra pred planar8 sse2 high bit

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred planar8 sse4 high bit

./test/TestBench --testbench intrapred | grep intra_planar_8x8
intra_planar_8x8	3.50x 	 902.49   	 3158.04
Subject: [x265] asm: intra pred planar16 sse2

details:   http://hg.videolan.org/x265/rev/d1de5d92be56
branches:  
changeset: 9640:d1de5d92be56
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 15:53:55 2015 -0800
description:
asm: intra pred planar16 sse2

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred planar16 sse4

./test/TestBench --testbench intrapred | grep intra_planar_16x16
intra_planar_16x16	4.47x 	 2727.49  	 12185.39
Subject: [x265] asm: intra pred planar16 sse2 high bit

details:   http://hg.videolan.org/x265/rev/6c6dc3667746
branches:  
changeset: 9641:6c6dc3667746
user:      David T Yuen <dtyx265 at gmail.com>
date:      Thu Mar 05 16:01:49 2015 -0800
description:
asm: intra pred planar16 sse2 high bit

This replaces the C code on processors that support SSE2 but not SSE4 (SSE2 through SSSE3)
The code is backported from intrapred planar16 sse4 high bit

./test/TestBench --testbench intrapred | grep intra_planar_16x16
intra_planar_16x16	4.90x 	 2507.48  	 12282.71
Subject: [x265] asm: simplify AVX2 code by ignoring 32bit builds and moving functions together

details:   http://hg.videolan.org/x265/rev/876936094c7a
branches:  
changeset: 9642:876936094c7a
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 05 20:18:51 2015 -0600
description:
asm: simplify AVX2 code by ignoring 32bit builds and moving functions together
Subject: [x265] api: add support for transfer characteristics added in HEVC V2

details:   http://hg.videolan.org/x265/rev/f27f2c0b2d8c
branches:  
changeset: 9643:f27f2c0b2d8c
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 05 19:50:40 2015 -0600
description:
api: add support for transfer characteristics added in HEVC V2
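
The effect of this change on parameter validation can be sketched as below.
The range (0 to 17, excluding 3) and the two new names are taken from the
param.cpp and cli.rst hunks in this mail; the function and constant names are
hypothetical and exist only for this illustration.

```cpp
// Values added by this change, as listed in the cli.rst hunk.
const int TRANSFER_SMPTE_ST_2084 = 16;
const int TRANSFER_SMPTE_ST_428  = 17;

// Mirrors the updated range check from the param.cpp hunk: the value must
// lie in 0..17 and must not be 3. Hypothetical helper, not the x265 API.
bool isValidTransferCharacteristics(int value)
{
    return value >= 0 && value <= TRANSFER_SMPTE_ST_428 && value != 3;
}
```

Before this change the upper bound was 15, so 16 and 17 were rejected.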
Subject: [x265] asm: prevent assembly use in 32bit HBD builds

details:   http://hg.videolan.org/x265/rev/45deb0125890
branches:  
changeset: 9644:45deb0125890
user:      Steve Borho <steve at borho.org>
date:      Thu Mar 05 20:39:08 2015 -0600
description:
asm: prevent assembly use in 32bit HBD builds

We have prevented HIGH_BIT_DEPTH builds for 32bit target platforms for a very
long time. In the interest of keeping the code clean, we'll programmatically
prevent assembly use in the event this unsupported build combination is used.
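
A minimal sketch of the guard, under stated assumptions: the actual patch uses
a preprocessor check on X86_64 inside setupAssemblyPrimitives (see the
asm-primitives.cpp hunk below), whereas this illustration takes the target
width as a runtime flag so that both paths can be exercised. Primitives and
setupHighBitDepthAssembly are hypothetical names, not x265 identifiers.

```cpp
#include <cstdio>

// Hypothetical stand-in for the encoder's primitive table.
struct Primitives
{
    bool usesAssembly = false;
};

// Installs assembly primitives only on 64-bit targets. On a 32-bit high
// bit-depth build it warns and returns early, leaving the C fallbacks in place.
// Returns true when assembly primitives were installed.
bool setupHighBitDepthAssembly(Primitives& p, bool is64BitTarget)
{
    if (!is64BitTarget)
    {
        std::fprintf(stderr, "warning: Assembly not allowed in 32bit high bit-depth builds\n");
        return false;
    }

    p.usesAssembly = true; // the real code assigns function pointers here
    return true;
}
```

Returning before any assignment means the rest of the function no longer needs
per-primitive 32-bit conditionals, which is what lets the diff delete them.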

diffstat:

 doc/reST/cli.rst                     |    2 +
 source/CMakeLists.txt                |    1 +
 source/common/param.cpp              |    4 +-
 source/common/x86/asm-primitives.cpp |  159 ++++-------
 source/common/x86/intrapred.h        |    3 +
 source/common/x86/intrapred16.asm    |  472 ++++++++++++++++++++++++++++++++--
 source/common/x86/intrapred8.asm     |  297 +++++++++++++++++----
 source/common/x86/ipfilter8.asm      |  242 +++++++++++++++++-
 source/encoder/encoder.cpp           |    9 +-
 source/encoder/frameencoder.cpp      |   19 +-
 source/encoder/frameencoder.h        |   15 +
 source/encoder/search.h              |    4 +
 source/x265.h                        |    3 +-
 13 files changed, 1038 insertions(+), 192 deletions(-)

diffs (truncated from 1605 to 300 lines):

diff -r e6b519dfbf81 -r 45deb0125890 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Thu Mar 05 16:06:04 2015 +0530
+++ b/doc/reST/cli.rst	Thu Mar 05 20:39:08 2015 -0600
@@ -1387,6 +1387,8 @@ VUI fields must be manually specified.
 	13. iec61966-2-1
 	14. bt2020-10
 	15. bt2020-12
+	16. smpte-st-2084
+	17. smpte-st-428
 
 .. option:: --colormatrix <integer|string>
 
diff -r e6b519dfbf81 -r 45deb0125890 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Thu Mar 05 16:06:04 2015 +0530
+++ b/source/CMakeLists.txt	Thu Mar 05 20:39:08 2015 -0600
@@ -213,6 +213,7 @@ if(X64)
     # can disable this if(X64) check if you desparately need a 32bit
     # build with 10bit/12bit support, but this violates the "shrink wrap
     # license" so to speak.  If it breaks you get to keep both halves.
+    # You will likely need to compile without assembly
     option(HIGH_BIT_DEPTH "Store pixels as 16bit values" OFF)
 endif(X64)
 if(HIGH_BIT_DEPTH)
diff -r e6b519dfbf81 -r 45deb0125890 source/common/param.cpp
--- a/source/common/param.cpp	Thu Mar 05 16:06:04 2015 +0530
+++ b/source/common/param.cpp	Thu Mar 05 20:39:08 2015 -0600
@@ -1088,11 +1088,11 @@ int x265_check_params(x265_param *param)
           "Color Primaries must be undef, bt709, bt470m,"
           " bt470bg, smpte170m, smpte240m, film or bt2020");
     CHECK(param->vui.transferCharacteristics < 0
-          || param->vui.transferCharacteristics > 15
+          || param->vui.transferCharacteristics > 17
           || param->vui.transferCharacteristics == 3,
           "Transfer Characteristics must be undef, bt709, bt470m, bt470bg,"
           " smpte170m, smpte240m, linear, log100, log316, iec61966-2-4, bt1361e,"
-          " iec61966-2-1, bt2020-10 or bt2020-12");
+          " iec61966-2-1, bt2020-10, bt2020-12, smpte-st-2084 or smpte-st-428");
     CHECK(param->vui.matrixCoeffs < 0
           || param->vui.matrixCoeffs > 10
           || param->vui.matrixCoeffs == 3,
diff -r e6b519dfbf81 -r 45deb0125890 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Thu Mar 05 16:06:04 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp	Thu Mar 05 20:39:08 2015 -0600
@@ -796,6 +796,11 @@ void interp_8tap_hv_pp_cpu(const pixel* 
 
 void setupAssemblyPrimitives(EncoderPrimitives &p, int cpuMask) // 16bpp
 {
+#if !defined(X86_64)
+    x265_log(NULL, X265_LOG_WARNING, "Assembly not allowed in 32bit high bit-depth builds\n");
+    return;
+#endif
+
     if (cpuMask & X265_CPU_SSE2)
     {
         /* We do not differentiate CPUs which support MMX and not SSE2. We only check
@@ -868,7 +873,14 @@ void setupAssemblyPrimitives(EncoderPrim
         ALL_LUMA_TU_S(calcresidual, getResidual, sse2);
         ALL_LUMA_TU_S(transpose, transpose, sse2);
 
+        p.cu[BLOCK_4x4].intra_pred[DC_IDX] = x265_intra_pred_dc4_sse2;
+        p.cu[BLOCK_8x8].intra_pred[DC_IDX] = x265_intra_pred_dc8_sse2;
+        p.cu[BLOCK_16x16].intra_pred[DC_IDX] = x265_intra_pred_dc16_sse2;
+        p.cu[BLOCK_32x32].intra_pred[DC_IDX] = x265_intra_pred_dc32_sse2;
+
         p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = x265_intra_pred_planar4_sse2;
+        p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = x265_intra_pred_planar8_sse2;
+        p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_sse2;
 
         p.cu[BLOCK_4x4].sse_ss = x265_pixel_ssd_ss_4x4_mmx2;
         ALL_LUMA_CU(sse_ss, pixel_ssd_ss, sse2);
@@ -881,9 +893,8 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_4x4].dct = x265_dct4_sse2;
         p.cu[BLOCK_8x8].dct = x265_dct8_sse2;
         p.cu[BLOCK_4x4].idct = x265_idct4_sse2;
-#if X86_64
         p.cu[BLOCK_8x8].idct = x265_idct8_sse2;
-#endif
+
         p.idst4x4 = x265_idst4_sse2;
 
         LUMA_VSS_FILTERS(sse2);
@@ -944,11 +955,8 @@ void setupAssemblyPrimitives(EncoderPrim
 
         // TODO: check POPCNT flag!
         ALL_LUMA_TU_S(copy_cnt, copy_cnt_, sse4);
-
-#if X86_64
         ALL_LUMA_CU(psy_cost_pp, psyCost_pp, sse4);
         ALL_LUMA_CU(psy_cost_ss, psyCost_ss, sse4);
-#endif
     }
     if (cpuMask & X265_CPU_AVX)
     {
@@ -1103,7 +1111,6 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_16x16].cpy2Dto1D_shr = x265_cpy2Dto1D_shr_16_avx2;
         p.cu[BLOCK_32x32].cpy2Dto1D_shr = x265_cpy2Dto1D_shr_32_avx2;
 
-#if X86_64
         ALL_LUMA_TU_S(dct, dct, avx2);
         ALL_LUMA_TU_S(idct, idct, avx2);
         ALL_LUMA_CU_S(transpose, transpose, avx2);
@@ -1112,33 +1119,6 @@ void setupAssemblyPrimitives(EncoderPrim
         ALL_LUMA_PU(luma_vps, interp_8tap_vert_ps, avx2);
         ALL_LUMA_PU(luma_vsp, interp_8tap_vert_sp, avx2);
         ALL_LUMA_PU(luma_vss, interp_8tap_vert_ss, avx2);
-#else
-        /* functions with both 64-bit and 32-bit implementations */
-        p.cu[BLOCK_4x4].dct = x265_dct4_avx2;
-        p.pu[LUMA_4x4].luma_vpp = x265_interp_8tap_vert_pp_4x4_avx2;
-        p.pu[LUMA_4x8].luma_vpp = x265_interp_8tap_vert_pp_4x8_avx2;
-        p.pu[LUMA_4x16].luma_vpp = x265_interp_8tap_vert_pp_4x16_avx2;
-        p.pu[LUMA_8x4].luma_vpp = x265_interp_8tap_vert_pp_8x4_avx2;
-        p.pu[LUMA_16x4].luma_vpp = x265_interp_8tap_vert_pp_16x4_avx2;
-
-        p.pu[LUMA_4x4].luma_vps = x265_interp_8tap_vert_ps_4x4_avx2;
-        p.pu[LUMA_4x8].luma_vps = x265_interp_8tap_vert_ps_4x8_avx2;
-        p.pu[LUMA_4x16].luma_vps = x265_interp_8tap_vert_ps_4x16_avx2;
-        p.pu[LUMA_8x4].luma_vps = x265_interp_8tap_vert_ps_8x4_avx2;
-        p.pu[LUMA_16x4].luma_vps = x265_interp_8tap_vert_ps_16x4_avx2;
-
-        p.pu[LUMA_4x4].luma_vsp = x265_interp_8tap_vert_sp_4x4_avx2;
-        p.pu[LUMA_4x8].luma_vsp = x265_interp_8tap_vert_sp_4x8_avx2;
-        p.pu[LUMA_4x16].luma_vsp = x265_interp_8tap_vert_sp_4x16_avx2;
-        p.pu[LUMA_8x4].luma_vsp = x265_interp_8tap_vert_sp_8x4_avx2;
-        p.pu[LUMA_16x4].luma_vsp = x265_interp_8tap_vert_sp_16x4_avx2;
-
-        p.pu[LUMA_4x4].luma_vss = x265_interp_8tap_vert_ss_4x4_avx2;
-        p.pu[LUMA_4x8].luma_vss = x265_interp_8tap_vert_ss_4x8_avx2;
-        p.pu[LUMA_4x16].luma_vss = x265_interp_8tap_vert_ss_4x16_avx2;
-        p.pu[LUMA_8x4].luma_vss = x265_interp_8tap_vert_ss_8x4_avx2;
-        p.pu[LUMA_16x4].luma_vss = x265_interp_8tap_vert_ss_16x4_avx2;
-#endif
     }
 }
 #else // if HIGH_BIT_DEPTH
@@ -1209,8 +1189,11 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_4x4].intra_pred[DC_IDX] = x265_intra_pred_dc4_sse2;
         p.cu[BLOCK_8x8].intra_pred[DC_IDX] = x265_intra_pred_dc8_sse2;
         p.cu[BLOCK_16x16].intra_pred[DC_IDX] = x265_intra_pred_dc16_sse2;
+        p.cu[BLOCK_32x32].intra_pred[DC_IDX] = x265_intra_pred_dc32_sse2;
 
         p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = x265_intra_pred_planar4_sse2;
+        p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = x265_intra_pred_planar8_sse2;
+        p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_sse2;
 
         p.cu[BLOCK_4x4].calcresidual = x265_getResidual4_sse2;
         p.cu[BLOCK_8x8].calcresidual = x265_getResidual8_sse2;
@@ -1430,6 +1413,7 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_16x16].sse_pp = x265_pixel_ssd_16x16_xop;
         p.frameInitLowres = x265_frame_init_lowres_core_xop;
     }
+#if X86_64
     if (cpuMask & X265_CPU_AVX2)
     {
         p.cu[BLOCK_16x16].add_ps = x265_pixel_add_ps_16x16_avx2;
@@ -1533,35 +1517,6 @@ void setupAssemblyPrimitives(EncoderPrim
 
         p.cu[BLOCK_64x64].copy_ps = x265_blockcopy_ps_64x64_avx2;
 
-        // missing 4x8, 4x16, 24x32, 12x16 for the fill set of luma PU
-        p.pu[LUMA_4x4].luma_hpp = x265_interp_8tap_horiz_pp_4x4_avx2;
-
-        p.pu[LUMA_8x4].luma_hpp = x265_interp_8tap_horiz_pp_8x4_avx2;
-        p.pu[LUMA_8x8].luma_hpp = x265_interp_8tap_horiz_pp_8x8_avx2;
-        p.pu[LUMA_8x16].luma_hpp = x265_interp_8tap_horiz_pp_8x16_avx2;
-        p.pu[LUMA_8x32].luma_hpp = x265_interp_8tap_horiz_pp_8x32_avx2;
-
-        p.pu[LUMA_16x4].luma_hpp = x265_interp_8tap_horiz_pp_16x4_avx2;
-        p.pu[LUMA_16x8].luma_hpp = x265_interp_8tap_horiz_pp_16x8_avx2;
-        p.pu[LUMA_16x12].luma_hpp = x265_interp_8tap_horiz_pp_16x12_avx2;
-        p.pu[LUMA_16x16].luma_hpp = x265_interp_8tap_horiz_pp_16x16_avx2;
-        p.pu[LUMA_16x32].luma_hpp = x265_interp_8tap_horiz_pp_16x32_avx2;
-        p.pu[LUMA_16x64].luma_hpp = x265_interp_8tap_horiz_pp_16x64_avx2;
-
-        p.pu[LUMA_32x8].luma_hpp  = x265_interp_8tap_horiz_pp_32x8_avx2;
-        p.pu[LUMA_32x16].luma_hpp = x265_interp_8tap_horiz_pp_32x16_avx2;
-        p.pu[LUMA_32x24].luma_hpp = x265_interp_8tap_horiz_pp_32x24_avx2;
-        p.pu[LUMA_32x32].luma_hpp = x265_interp_8tap_horiz_pp_32x32_avx2;
-        p.pu[LUMA_32x64].luma_hpp = x265_interp_8tap_horiz_pp_32x64_avx2;
-
-        p.pu[LUMA_64x64].luma_hpp = x265_interp_8tap_horiz_pp_64x64_avx2;
-        p.pu[LUMA_64x48].luma_hpp = x265_interp_8tap_horiz_pp_64x48_avx2;
-        p.pu[LUMA_64x32].luma_hpp = x265_interp_8tap_horiz_pp_64x32_avx2;
-        p.pu[LUMA_64x16].luma_hpp = x265_interp_8tap_horiz_pp_64x16_avx2;
-
-        p.pu[LUMA_48x64].luma_hpp = x265_interp_8tap_horiz_pp_48x64_avx2;
-
-#if X86_64
         ALL_LUMA_TU_S(dct, dct, avx2);
         ALL_LUMA_TU_S(idct, idct, avx2);
         ALL_LUMA_CU_S(transpose, transpose, avx2);
@@ -1571,15 +1526,36 @@ void setupAssemblyPrimitives(EncoderPrim
         ALL_LUMA_PU(luma_vsp, interp_8tap_vert_sp, avx2);
         ALL_LUMA_PU(luma_vss, interp_8tap_vert_ss, avx2);
 
+        // missing 4x8, 4x16, 24x32, 12x16 for the fill set of luma PU
+        p.pu[LUMA_4x4].luma_hpp = x265_interp_8tap_horiz_pp_4x4_avx2;
+        p.pu[LUMA_8x4].luma_hpp = x265_interp_8tap_horiz_pp_8x4_avx2;
+        p.pu[LUMA_8x8].luma_hpp = x265_interp_8tap_horiz_pp_8x8_avx2;
+        p.pu[LUMA_8x16].luma_hpp = x265_interp_8tap_horiz_pp_8x16_avx2;
+        p.pu[LUMA_8x32].luma_hpp = x265_interp_8tap_horiz_pp_8x32_avx2;
+        p.pu[LUMA_16x4].luma_hpp = x265_interp_8tap_horiz_pp_16x4_avx2;
+        p.pu[LUMA_16x8].luma_hpp = x265_interp_8tap_horiz_pp_16x8_avx2;
+        p.pu[LUMA_16x12].luma_hpp = x265_interp_8tap_horiz_pp_16x12_avx2;
+        p.pu[LUMA_16x16].luma_hpp = x265_interp_8tap_horiz_pp_16x16_avx2;
+        p.pu[LUMA_16x32].luma_hpp = x265_interp_8tap_horiz_pp_16x32_avx2;
+        p.pu[LUMA_16x64].luma_hpp = x265_interp_8tap_horiz_pp_16x64_avx2;
+        p.pu[LUMA_32x8].luma_hpp  = x265_interp_8tap_horiz_pp_32x8_avx2;
+        p.pu[LUMA_32x16].luma_hpp = x265_interp_8tap_horiz_pp_32x16_avx2;
+        p.pu[LUMA_32x24].luma_hpp = x265_interp_8tap_horiz_pp_32x24_avx2;
+        p.pu[LUMA_32x32].luma_hpp = x265_interp_8tap_horiz_pp_32x32_avx2;
+        p.pu[LUMA_32x64].luma_hpp = x265_interp_8tap_horiz_pp_32x64_avx2;
+        p.pu[LUMA_64x64].luma_hpp = x265_interp_8tap_horiz_pp_64x64_avx2;
+        p.pu[LUMA_64x48].luma_hpp = x265_interp_8tap_horiz_pp_64x48_avx2;
+        p.pu[LUMA_64x32].luma_hpp = x265_interp_8tap_horiz_pp_64x32_avx2;
+        p.pu[LUMA_64x16].luma_hpp = x265_interp_8tap_horiz_pp_64x16_avx2;
+        p.pu[LUMA_48x64].luma_hpp = x265_interp_8tap_horiz_pp_48x64_avx2;
+
         p.pu[LUMA_4x4].luma_hps = x265_interp_8tap_horiz_ps_4x4_avx2;
         p.pu[LUMA_4x8].luma_hps = x265_interp_8tap_horiz_ps_4x8_avx2;
         p.pu[LUMA_4x16].luma_hps = x265_interp_8tap_horiz_ps_4x16_avx2;
-
         p.pu[LUMA_8x4].luma_hps = x265_interp_8tap_horiz_ps_8x4_avx2;
         p.pu[LUMA_8x8].luma_hps = x265_interp_8tap_horiz_ps_8x8_avx2;
         p.pu[LUMA_8x16].luma_hps = x265_interp_8tap_horiz_ps_8x16_avx2;
         p.pu[LUMA_8x32].luma_hps = x265_interp_8tap_horiz_ps_8x32_avx2;
-
         p.pu[LUMA_16x8].luma_hps = x265_interp_8tap_horiz_ps_16x8_avx2;
         p.pu[LUMA_16x16].luma_hps = x265_interp_8tap_horiz_ps_16x16_avx2;
         p.pu[LUMA_16x12].luma_hps = x265_interp_8tap_horiz_ps_16x12_avx2;
@@ -1587,67 +1563,50 @@ void setupAssemblyPrimitives(EncoderPrim
         p.pu[LUMA_16x32].luma_hps = x265_interp_8tap_horiz_ps_16x32_avx2;
         p.pu[LUMA_16x64].luma_hps = x265_interp_8tap_horiz_ps_16x64_avx2;
 
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vpp = x265_interp_4tap_vert_pp_16x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vpp = x265_interp_4tap_vert_pp_32x32_avx2;
-
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vps = x265_interp_4tap_vert_ps_16x16_avx2;
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vps = x265_interp_4tap_vert_ps_32x32_avx2;
-#else
-        /* functions with both 64-bit and 32-bit implementations */
-        p.cu[BLOCK_4x4].dct = x265_dct4_avx2;
-
-        p.pu[LUMA_4x4].luma_vps = x265_interp_8tap_vert_ps_4x4_avx2;
-        p.pu[LUMA_4x4].luma_vpp = x265_interp_8tap_vert_pp_4x4_avx2;
-        p.pu[LUMA_8x4].luma_vpp = x265_interp_8tap_vert_pp_8x4_avx2;
-        p.pu[LUMA_8x8].luma_vpp = x265_interp_8tap_vert_pp_8x8_avx2;
-        p.pu[LUMA_8x16].luma_vpp = x265_interp_8tap_vert_pp_8x16_avx2;
-        p.pu[LUMA_8x32].luma_vpp = x265_interp_8tap_vert_pp_8x32_avx2;
-
-        p.pu[LUMA_8x4].luma_vps = x265_interp_8tap_vert_ps_8x4_avx2;
-        p.pu[LUMA_8x8].luma_vps = x265_interp_8tap_vert_ps_8x8_avx2;
-        p.pu[LUMA_8x16].luma_vps = x265_interp_8tap_vert_ps_8x16_avx2;
-        p.pu[LUMA_8x32].luma_vps = x265_interp_8tap_vert_ps_8x32_avx2;
-
-        p.pu[LUMA_4x4].luma_vsp = x265_interp_8tap_vert_sp_4x4_avx2;
-        p.pu[LUMA_4x8].luma_vsp = x265_interp_8tap_vert_sp_4x8_avx2;
-        p.pu[LUMA_4x16].luma_vsp = x265_interp_8tap_vert_sp_4x16_avx2;
-        p.pu[LUMA_8x4].luma_vsp = x265_interp_8tap_vert_sp_8x4_avx2;
-        p.pu[LUMA_16x4].luma_vsp = x265_interp_8tap_vert_sp_16x4_avx2;
-
-        p.pu[LUMA_4x4].luma_vss = x265_interp_8tap_vert_ss_4x4_avx2;
-        p.pu[LUMA_4x8].luma_vss = x265_interp_8tap_vert_ss_4x8_avx2;
-        p.pu[LUMA_4x16].luma_vss = x265_interp_8tap_vert_ss_4x16_avx2;
-        p.pu[LUMA_8x4].luma_vss = x265_interp_8tap_vert_ss_8x4_avx2;
-        p.pu[LUMA_16x4].luma_vss = x265_interp_8tap_vert_ss_16x4_avx2;
-#endif
-
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_hpp = x265_interp_4tap_horiz_pp_8x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_hpp = x265_interp_4tap_horiz_pp_4x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_hpp = x265_interp_4tap_horiz_pp_32x32_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_hpp = x265_interp_4tap_horiz_pp_16x16_avx2;
 
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
+
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vpp = x265_interp_4tap_vert_pp_4x16_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vpp = x265_interp_4tap_vert_pp_8x8_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_vpp = x265_interp_4tap_vert_pp_2x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vpp = x265_interp_4tap_vert_pp_4x2_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vpp = x265_interp_4tap_vert_pp_8x2_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vpp = x265_interp_4tap_vert_pp_8x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vpp = x265_interp_4tap_vert_pp_8x6_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].filter_vpp = x265_interp_4tap_vert_pp_16x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vpp = x265_interp_4tap_vert_pp_16x16_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_vpp = x265_interp_4tap_vert_pp_32x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_vpp = x265_interp_4tap_vert_pp_32x16_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].filter_vpp = x265_interp_4tap_vert_pp_32x24_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vpp = x265_interp_4tap_vert_pp_32x32_avx2;
 
         p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_vps = x265_interp_4tap_vert_ps_2x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vps = x265_interp_4tap_vert_ps_4x2_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_avx2;
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_vps = x265_interp_4tap_vert_ps_8x2_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_vps = x265_interp_4tap_vert_ps_8x4_avx2;
         p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vps = x265_interp_4tap_vert_ps_8x6_avx2;

