[x265-commits] [x265] api: clarify docs and use of x265_api_get()
Deepthi Nandakumar
deepthi at multicorewareinc.com
Thu Apr 30 22:37:02 CEST 2015
details: http://hg.videolan.org/x265/rev/a3ba8c92dcea
branches:
changeset: 10329:a3ba8c92dcea
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Thu Apr 30 09:44:07 2015 +0530
description:
api: clarify docs and use of x265_api_get()
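
As a rough illustration of the lookup rule this change documents in doc/reST/api.rst: a requested bit depth of zero, or one that matches the linked build, uses the linked libx265, and any other depth leads to an attempt to bind a shared library named for that depth. The sketch below is not libx265 code. The names chooseLibrary, Choice and Source are hypothetical, the "Unsupported" result for depths other than 8 or 10 is an assumption, and only the Windows file names are taken from the documentation.

```cpp
#include <cassert>
#include <string>

// Hypothetical types and names for illustration; none of this is libx265 API.
enum class Source { Linked, Shared, Unsupported };

struct Choice
{
    Source      source;
    std::string sharedName; // empty unless source == Source::Shared
};

// linkedBitDepth:    bit depth of the libx265 build the application linked to.
// requestedBitDepth: value the application would pass to x265_api_get().
Choice chooseLibrary(int requestedBitDepth, int linkedBitDepth)
{
    assert(linkedBitDepth == 8 || linkedBitDepth == 10);

    // Zero means "system default"; a matching depth needs no second library.
    if (requestedBitDepth == 0 || requestedBitDepth == linkedBitDepth)
        return { Source::Linked, "" };

    // Windows shared library names as listed in doc/reST/api.rst.
    if (requestedBitDepth == 8)
        return { Source::Shared, "libx265_main.dll" };
    if (requestedBitDepth == 10)
        return { Source::Shared, "libx265_main10.dll" };

    // Assumption: any other depth cannot be served at all.
    return { Source::Unsupported, "" };
}
```

With an 8-bit linked build, chooseLibrary(10, 8) selects libx265_main10.dll, while chooseLibrary(0, 8) and chooseLibrary(8, 8) both stay with the linked library.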
Subject: [x265] doc: replace sublayer with enhancement layer
details: http://hg.videolan.org/x265/rev/2a1dd8a1b324
branches:
changeset: 10330:2a1dd8a1b324
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Thu Apr 30 13:27:12 2015 +0530
description:
doc: replace sublayer with enhancement layer
Subject: [x265] asm: chroma_hpp[48x64] for i444 - improved 17498c->13381c
details: http://hg.videolan.org/x265/rev/5c9b9856de29
branches:
changeset: 10331:5c9b9856de29
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Apr 29 17:11:03 2015 +0530
description:
asm: chroma_hpp[48x64] for i444 - improved 17498c->13381c
Subject: [x265] convert sigCtx table from [4][4] to [16]
details: http://hg.videolan.org/x265/rev/60a66c581d67
branches:
changeset: 10332:60a66c581d67
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:48:46 2015 +0800
description:
convert sigCtx table from [4][4] to [16]
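
For readers unfamiliar with this kind of change, here is a minimal sketch of flattening a [4][4] lookup table into a [16] table. It is not the actual sigCtx table: the values are placeholders, the function names are hypothetical, and row-major order (index = y * 4 + x) is an assumption about the layout.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder values (10 * y + x), NOT the real sigCtx contents.
static const uint8_t table2D[4][4] =
{
    {  0,  1,  2,  3 },
    { 10, 11, 12, 13 },
    { 20, 21, 22, 23 },
    { 30, 31, 32, 33 },
};

// Same placeholder data flattened in row-major order.
static const uint8_t table1D[16] =
{
     0,  1,  2,  3,
    10, 11, 12, 13,
    20, 21, 22, 23,
    30, 31, 32, 33,
};

// Two-index lookup: the caller carries x and y separately.
inline uint8_t lookup2D(int x, int y)
{
    assert(x >= 0 && x < 4 && y >= 0 && y < 4);
    return table2D[y][x];
}

// One-index lookup: the caller passes a position that is already combined,
// for example one taken directly from a scan table.
inline uint8_t lookup1D(int pos)
{
    assert(pos >= 0 && pos < 16);
    return table1D[pos];
}
```

The two forms return the same entry whenever pos == y * 4 + x; the flat form only helps when the caller already has the combined position available.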
Subject: [x265] pre-compute abs coeff and simplify scan table
details: http://hg.videolan.org/x265/rev/e6f14a4b35ed
branches:
changeset: 10333:e6f14a4b35ed
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:48:49 2015 +0800
description:
pre-compute abs coeff and simplify scan table
Subject: [x265] remove reduce check on firstC2FlagIdx
details: http://hg.videolan.org/x265/rev/f406b2e6262e
branches:
changeset: 10334:f406b2e6262e
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:48:53 2015 +0800
description:
remove reduce check on firstC2FlagIdx
Subject: [x265] fast RD path on encode coeff remain code in codeCoeffNxN()
details: http://hg.videolan.org/x265/rev/2158765e992f
branches:
changeset: 10335:2158765e992f
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:48:57 2015 +0800
description:
fast RD path on encode coeff remain code in codeCoeffNxN()
Subject: [x265] improve compute on baseLevel by 2-bits encode code
details: http://hg.videolan.org/x265/rev/84b6da2f3da0
branches:
changeset: 10336:84b6da2f3da0
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:49:01 2015 +0800
description:
improve compute on baseLevel by 2-bits encode code
Subject: [x265] simplify compute on get codeNumber length
details: http://hg.videolan.org/x265/rev/73a3bfc8c2a2
branches:
changeset: 10337:73a3bfc8c2a2
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:49:05 2015 +0800
description:
simplify compute on get codeNumber length
Subject: [x265] faster clip operator on goRiceParam
details: http://hg.videolan.org/x265/rev/432f2e3df326
branches:
changeset: 10338:432f2e3df326
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:49:09 2015 +0800
description:
faster clip operator on goRiceParam
Subject: [x265] simplify logic on get coeff remain cost in codeCoeffNxN()
details: http://hg.videolan.org/x265/rev/d774ef13d9a5
branches:
changeset: 10339:d774ef13d9a5
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 18:49:14 2015 +0800
description:
simplify logic on get coeff remain cost in codeCoeffNxN()
Subject: [x265] fix check failure in Entropy::writeCoefRemainExGolomb()
details: http://hg.videolan.org/x265/rev/554a5c9b1646
branches:
changeset: 10340:554a5c9b1646
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 19:52:30 2015 +0800
description:
fix check failure in Entropy::writeCoefRemainExGolomb()
Subject: [x265] asm: downgrade x265_interp_8tap_hv_pp_8x8 from SSE4 to SSSE3
details: http://hg.videolan.org/x265/rev/e7aba11a3bbc
branches:
changeset: 10341:e7aba11a3bbc
user: Min Chen <chenm003 at 163.com>
date: Thu Apr 30 19:52:36 2015 +0800
description:
asm: downgrade x265_interp_8tap_hv_pp_8x8 from SSE4 to SSSE3
Subject: [x265] asm: filter_vpp, filter_vps for 12x32 in avx2
details: http://hg.videolan.org/x265/rev/ba4f1516cea2
branches:
changeset: 10342:ba4f1516cea2
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Apr 30 15:16:19 2015 +0530
description:
asm: filter_vpp, filter_vps for 12x32 in avx2
filter_vpp[12x32]: 2307c->1885c
filter_vps[12x32]: 1884c->1612c
Subject: [x265] asm: filter_vpp, filter_vps for 8x12 in avx2
details: http://hg.videolan.org/x265/rev/0562f6ae98f1
branches:
changeset: 10343:0562f6ae98f1
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Apr 30 15:31:33 2015 +0530
description:
asm: filter_vpp, filter_vps for 8x12 in avx2
filter_vpp[8x12]: 425c->388c
filter_vps[8x12]: 458c->388c
Subject: [x265] asm: filter_vpp, filter_vps for 2x4 in avx2
details: http://hg.videolan.org/x265/rev/21b710bafb92
branches:
changeset: 10344:21b710bafb92
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Apr 30 18:08:03 2015 +0530
description:
asm: filter_vpp, filter_vps for 2x4 in avx2
Subject: [x265] search: cleanup checkBestMVP(), no behavior change
details: http://hg.videolan.org/x265/rev/2b3275b1eb85
branches:
changeset: 10345:2b3275b1eb85
user: Steve Borho <steve at borho.org>
date: Wed Apr 29 12:36:21 2015 -0500
description:
search: cleanup checkBestMVP(), no behavior change
Subject: [x265] search: introduce selectMVP helper method
details: http://hg.videolan.org/x265/rev/acf4ede2ca53
branches:
changeset: 10346:acf4ede2ca53
user: Steve Borho <steve at borho.org>
date: Wed Apr 29 14:40:48 2015 -0500
description:
search: introduce selectMVP helper method
Subject: [x265] search: do not clip MVP in setSearchRange()
details: http://hg.videolan.org/x265/rev/5f89a0776b96
branches:
changeset: 10347:5f89a0776b96
user: Steve Borho <steve at borho.org>
date: Wed Apr 29 14:50:26 2015 -0500
description:
search: do not clip MVP in setSearchRange()
The MVP itself should not be clipped, since this will make MVD calculations
incorrect. Motion estimation is always careful to clip all motion vectors to
within the available pixel range (mvmin/mvmax) during the search, so it is safe
for the MVP to be out of range.
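
A small sketch of why clipping the MVP breaks the MVD, assuming the usual relationship MVD = MV - MVP and a decoder that rebuilds MV = MVP + MVD from its own, unclipped predictor. The MV struct and helper names here are hypothetical and are not the x265 types.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal motion vector type for illustration.
struct MV
{
    int x, y;
};

// Clamp each component of v into [lo, hi].
MV clipMV(MV v, MV lo, MV hi)
{
    return { std::min(std::max(v.x, lo.x), hi.x),
             std::min(std::max(v.y, lo.y), hi.y) };
}

// Encoder side: the difference that gets signalled.
MV computeMVD(MV mv, MV mvp)
{
    return { mv.x - mvp.x, mv.y - mvp.y };
}

// Decoder side: predictor plus signalled difference.
MV reconstructMV(MV mvp, MV mvd)
{
    return { mvp.x + mvd.x, mvp.y + mvd.y };
}
```

Take an MVP of (-70, 4), a search range of [-64, 64] in both components, and a best MV of (-60, 2). With the unclipped MVP the MVD is (10, -2) and the decoder reconstructs (-60, 2). If the encoder had clipped the MVP to (-64, 4) first, it would signal (4, -2), and a decoder using the unclipped predictor would reconstruct (-66, 2), which is not the vector the encoder searched.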
Subject: [x265] search: allow AMP to use motion estimation for 64x64 CUs
details: http://hg.videolan.org/x265/rev/bca33880585a
branches:
changeset: 10348:bca33880585a
user: Steve Borho <steve at borho.org>
date: Sat Apr 25 00:41:25 2015 -0500
description:
search: allow AMP to use motion estimation for 64x64 CUs
This was a hold-over from the HM which never wanted to perform motion searches
for AMP PUs for 64x64 CUs. Presumably because they were never optimized.
Because of the way the rd-levels were developed, RD levels 0..4 always
hard-coded bMergeOnly to false, but to compensate they never attempted AMP
modes at 64x64 CUs.
This patch makes AMP partitions always perform motion estimation, regardless of
CU size and RD level, and it removes the bMergeOnly argument to predInterSearch.
It should give a small improvement to compression efficiency at slower presets
for a minimal performance cost (since 64x64 inter analysis is relatively rare).
diffstat:
doc/reST/api.rst | 26 ++-
doc/reST/cli.rst | 4 +-
source/common/x86/asm-primitives.cpp | 12 +-
source/common/x86/ipfilter8.asm | 252 +++++++++++++++++++++++++++++++++-
source/common/x86/ipfilter8.h | 2 +-
source/encoder/analysis.cpp | 45 ++---
source/encoder/analysis.h | 2 +-
source/encoder/entropy.cpp | 175 +++++++++++++++---------
source/encoder/search.cpp | 171 ++++++++---------------
source/encoder/search.h | 7 +-
source/x265.cpp | 4 +-
11 files changed, 468 insertions(+), 232 deletions(-)
diffs (truncated from 1231 to 300 lines):
diff -r 74d7fe7a81ad -r bca33880585a doc/reST/api.rst
--- a/doc/reST/api.rst Wed Apr 29 11:08:44 2015 -0500
+++ b/doc/reST/api.rst Sat Apr 25 00:41:25 2015 -0500
@@ -352,7 +352,7 @@ CTU size::
Multi-library Interface
=======================
-If your application might want to make a runtime selection between among
+If your application might want to make a runtime selection between
a number of libx265 libraries (perhaps 8bpp and 16bpp), then you will
want to use the multi-library interface.
@@ -370,16 +370,20 @@ without the **x265_** prefix. So **x265_
* libx265 */
const x265_api* x265_api_get(int bitDepth);
-The general idea is to request the API for the bitDepth you would prefer
-the encoder to use (8 or 10), and if that returns NULL you request the
-API for bitDepth=0, which returns the system default libx265.
+Note that using this multi-library API in your application is only the
+first step.
-Note that using this multi-library API in your application is only the
-first step. Your application must link to one build of libx265
-(statically or dynamically) and this linked version of libx265 will
-support one bit-depth (8 or 10 bits). If you request a different
-bit-depth, the linked libx265 will attempt to dynamically bind a shared
-library libx265 with a name appropriate for the requested bit-depth:
+Your application must link to one build of libx265 (statically or
+dynamically) and this linked version of libx265 will support one
+bit-depth (8 or 10 bits).
+
+Your application must now request the API for the bitDepth you would
+prefer the encoder to use (8 or 10). If the requested bitdepth is zero,
+or if it matches the bitdepth of the system default libx265 (the
+currently linked library), then this library will be used for encode.
+If you request a different bit-depth, the linked libx265 will attempt
+to dynamically bind a shared library with a name appropriate for the
+requested bit-depth:
8-bit: libx265_main.dll
10-bit: libx265_main10.dll
@@ -390,7 +394,7 @@ library libx265 with a name appropriate
For example on Windows, one could package together an x265.exe
statically linked against the 8bpp libx265 together with a
libx265_main10.dll in the same folder, and this executable would be able
-to encode 10bit bitstreams by specifying -P main10 on the command line.
+to encode main and main10 bitstreams.
On Linux, x265 packagers could install 8bpp static and shared libraries
under the name libx265 (so all applications link against 8bpp libx265)
diff -r 74d7fe7a81ad -r bca33880585a doc/reST/cli.rst
--- a/doc/reST/cli.rst Wed Apr 29 11:08:44 2015 -0500
+++ b/doc/reST/cli.rst Sat Apr 25 00:41:25 2015 -0500
@@ -1559,8 +1559,8 @@ Bitstream options
Enable a temporal sub layer. All referenced I/P/B frames are in the
base layer and all unreferenced B frames are placed in a temporal
- sublayer. A decoder may chose to drop the sublayer and only decode
- and display the base layer slices.
+ enhancement layer. A decoder may chose to drop the enhancement layer
+ and only decode and display the base layer slices.
If used with a fixed GOP (:option:`b-adapt` 0) and :option:`bframes`
3 then the two layers evenly split the frame rate, with a cadence of
diff -r 74d7fe7a81ad -r bca33880585a source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Wed Apr 29 11:08:44 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Sat Apr 25 00:41:25 2015 -0500
@@ -1447,6 +1447,9 @@ void setupAssemblyPrimitives(EncoderPrim
ALL_LUMA_TU(count_nonzero, count_nonzero, ssse3);
+ // MUST be done after LUMA_FILTERS() to overwrite default version
+ p.pu[LUMA_8x8].luma_hvpp = x265_interp_8tap_hv_pp_8x8_ssse3;
+
p.frameInitLowres = x265_frame_init_lowres_core_ssse3;
p.scale1D_128to64 = x265_scale1D_128to64_ssse3;
p.scale2D_64to32 = x265_scale2D_64to32_ssse3;
@@ -1548,7 +1551,7 @@ void setupAssemblyPrimitives(EncoderPrim
CHROMA_444_VSP_FILTERS_SSE4(_sse4);
// MUST be done after LUMA_FILTERS() to overwrite default version
- p.pu[LUMA_8x8].luma_hvpp = x265_interp_8tap_hv_pp_8x8_sse4;
+ p.pu[LUMA_8x8].luma_hvpp = x265_interp_8tap_hv_pp_8x8_ssse3;
LUMA_CU_BLOCKCOPY(ps, sse4);
CHROMA_420_CU_BLOCKCOPY(ps, sse4);
@@ -2408,6 +2411,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_hpp = x265_interp_4tap_horiz_pp_64x32_avx2;
p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_hpp = x265_interp_4tap_horiz_pp_64x48_avx2;
p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_hpp = x265_interp_4tap_horiz_pp_64x16_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_48x64].filter_hpp = x265_interp_4tap_horiz_pp_48x64_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
@@ -2536,6 +2540,9 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vps = x265_interp_4tap_vert_ps_8x64_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].filter_vps = x265_interp_4tap_vert_ps_32x64_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_vps = x265_interp_4tap_vert_ps_32x48_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].filter_vps = x265_interp_4tap_vert_ps_12x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vps = x265_interp_4tap_vert_ps_8x12_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x4].filter_vps = x265_interp_4tap_vert_ps_2x4_avx2;
//i444 for chroma_vps
p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_avx2;
@@ -2577,6 +2584,9 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].filter_vpp = x265_interp_4tap_vert_pp_8x64_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].filter_vpp = x265_interp_4tap_vert_pp_32x64_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_vpp = x265_interp_4tap_vert_pp_32x48_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].filter_vpp = x265_interp_4tap_vert_pp_12x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vpp = x265_interp_4tap_vert_pp_8x12_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x4].filter_vpp = x265_interp_4tap_vert_pp_2x4_avx2;
//i444 for chroma_vpp
p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
diff -r 74d7fe7a81ad -r bca33880585a source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm Wed Apr 29 11:08:44 2015 -0500
+++ b/source/common/x86/ipfilter8.asm Sat Apr 25 00:41:25 2015 -0500
@@ -3157,7 +3157,7 @@ cglobal interp_8tap_horiz_pp_%1x%2, 4,6,
;-----------------------------------------------------------------------------
; void interp_8tap_hv_pp_%1x%2(pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int idxX, int idxY)
;-----------------------------------------------------------------------------
-INIT_XMM sse4
+INIT_XMM ssse3
cglobal interp_8tap_hv_pp_8x8, 4, 7, 8, 0-15*16
%define coef m7
%define stk_buf rsp
@@ -5556,6 +5556,148 @@ cglobal interp_4tap_vert_%1_8x16, 4, 7,
FILTER_VER_CHROMA_AVX2_8x16 pp
FILTER_VER_CHROMA_AVX2_8x16 ps
+%macro FILTER_VER_CHROMA_AVX2_8x12 1
+INIT_YMM avx2
+cglobal interp_4tap_vert_%1_8x12, 4, 7, 8
+ mov r4d, r4m
+ shl r4d, 6
+
+%ifdef PIC
+ lea r5, [tab_ChromaCoeffVer_32]
+ add r5, r4
+%else
+ lea r5, [tab_ChromaCoeffVer_32 + r4]
+%endif
+
+ lea r4, [r1 * 3]
+ sub r0, r1
+%ifidn %1, pp
+ mova m7, [pw_512]
+%else
+ add r3d, r3d
+ mova m7, [pw_2000]
+%endif
+ lea r6, [r3 * 3]
+ movq xm1, [r0] ; m1 = row 0
+ movq xm2, [r0 + r1] ; m2 = row 1
+ punpcklbw xm1, xm2
+ movq xm3, [r0 + r1 * 2] ; m3 = row 2
+ punpcklbw xm2, xm3
+ vinserti128 m5, m1, xm2, 1
+ pmaddubsw m5, [r5]
+ movq xm4, [r0 + r4] ; m4 = row 3
+ punpcklbw xm3, xm4
+ lea r0, [r0 + r1 * 4]
+ movq xm1, [r0] ; m1 = row 4
+ punpcklbw xm4, xm1
+ vinserti128 m2, m3, xm4, 1
+ pmaddubsw m0, m2, [r5 + 1 * mmsize]
+ paddw m5, m0
+ pmaddubsw m2, [r5]
+ movq xm3, [r0 + r1] ; m3 = row 5
+ punpcklbw xm1, xm3
+ movq xm4, [r0 + r1 * 2] ; m4 = row 6
+ punpcklbw xm3, xm4
+ vinserti128 m1, m1, xm3, 1
+ pmaddubsw m0, m1, [r5 + 1 * mmsize]
+ paddw m2, m0
+ pmaddubsw m1, [r5]
+ movq xm3, [r0 + r4] ; m3 = row 7
+ punpcklbw xm4, xm3
+ lea r0, [r0 + r1 * 4]
+ movq xm0, [r0] ; m0 = row 8
+ punpcklbw xm3, xm0
+ vinserti128 m4, m4, xm3, 1
+ pmaddubsw m3, m4, [r5 + 1 * mmsize]
+ paddw m1, m3
+ pmaddubsw m4, [r5]
+ movq xm3, [r0 + r1] ; m3 = row 9
+ punpcklbw xm0, xm3
+ movq xm6, [r0 + r1 * 2] ; m6 = row 10
+ punpcklbw xm3, xm6
+ vinserti128 m0, m0, xm3, 1
+ pmaddubsw m3, m0, [r5 + 1 * mmsize]
+ paddw m4, m3
+ pmaddubsw m0, [r5]
+%ifidn %1, pp
+ pmulhrsw m5, m7 ; m5 = word: row 0, row 1
+ pmulhrsw m2, m7 ; m2 = word: row 2, row 3
+ pmulhrsw m1, m7 ; m1 = word: row 4, row 5
+ pmulhrsw m4, m7 ; m4 = word: row 6, row 7
+ packuswb m5, m2
+ packuswb m1, m4
+ vextracti128 xm2, m5, 1
+ vextracti128 xm4, m1, 1
+ movq [r2], xm5
+ movq [r2 + r3], xm2
+ movhps [r2 + r3 * 2], xm5
+ movhps [r2 + r6], xm2
+ lea r2, [r2 + r3 * 4]
+ movq [r2], xm1
+ movq [r2 + r3], xm4
+ movhps [r2 + r3 * 2], xm1
+ movhps [r2 + r6], xm4
+%else
+ psubw m5, m7 ; m5 = word: row 0, row 1
+ psubw m2, m7 ; m2 = word: row 2, row 3
+ psubw m1, m7 ; m1 = word: row 4, row 5
+ psubw m4, m7 ; m4 = word: row 6, row 7
+ vextracti128 xm3, m5, 1
+ movu [r2], xm5
+ movu [r2 + r3], xm3
+ vextracti128 xm3, m2, 1
+ movu [r2 + r3 * 2], xm2
+ movu [r2 + r6], xm3
+ lea r2, [r2 + r3 * 4]
+ vextracti128 xm5, m1, 1
+ vextracti128 xm3, m4, 1
+ movu [r2], xm1
+ movu [r2 + r3], xm5
+ movu [r2 + r3 * 2], xm4
+ movu [r2 + r6], xm3
+%endif
+ movq xm3, [r0 + r4] ; m3 = row 11
+ punpcklbw xm6, xm3
+ lea r0, [r0 + r1 * 4]
+ movq xm5, [r0] ; m5 = row 12
+ punpcklbw xm3, xm5
+ vinserti128 m6, m6, xm3, 1
+ pmaddubsw m3, m6, [r5 + 1 * mmsize]
+ paddw m0, m3
+ pmaddubsw m6, [r5]
+ movq xm3, [r0 + r1] ; m3 = row 13
+ punpcklbw xm5, xm3
+ movq xm2, [r0 + r1 * 2] ; m2 = row 14
+ punpcklbw xm3, xm2
+ vinserti128 m5, m5, xm3, 1
+ pmaddubsw m3, m5, [r5 + 1 * mmsize]
+ paddw m6, m3
+ lea r2, [r2 + r3 * 4]
+%ifidn %1, pp
+ pmulhrsw m0, m7 ; m0 = word: row 8, row 9
+ pmulhrsw m6, m7 ; m6 = word: row 10, row 11
+ packuswb m0, m6
+ vextracti128 xm6, m0, 1
+ movq [r2], xm0
+ movq [r2 + r3], xm6
+ movhps [r2 + r3 * 2], xm0
+ movhps [r2 + r6], xm6
+%else
+ psubw m0, m7 ; m0 = word: row 8, row 9
+ psubw m6, m7 ; m6 = word: row 10, row 11
+ vextracti128 xm1, m0, 1
+ vextracti128 xm3, m6, 1
+ movu [r2], xm0
+ movu [r2 + r3], xm1
+ movu [r2 + r3 * 2], xm6
+ movu [r2 + r6], xm3
+%endif
+ RET
+%endmacro
+
+ FILTER_VER_CHROMA_AVX2_8x12 pp
+ FILTER_VER_CHROMA_AVX2_8x12 ps
+
%macro FILTER_VER_CHROMA_AVX2_8xN 2
INIT_YMM avx2
cglobal interp_4tap_vert_%1_8x%2, 4, 7, 8
@@ -7560,9 +7702,9 @@ cglobal interp_4tap_vert_%1_16x4, 4, 6,
FILTER_VER_CHROMA_AVX2_16x4 pp
FILTER_VER_CHROMA_AVX2_16x4 ps
-%macro FILTER_VER_CHROMA_AVX2_12x16 1
-INIT_YMM avx2
-cglobal interp_4tap_vert_%1_12x16, 4, 7, 8
+%macro FILTER_VER_CHROMA_AVX2_12xN 2
+INIT_YMM avx2
+cglobal interp_4tap_vert_%1_12x%2, 4, 7, 8
mov r4d, r4m
shl r4d, 6
@@ -7582,7 +7724,7 @@ cglobal interp_4tap_vert_%1_12x16, 4, 7,
vbroadcasti128 m7, [pw_2000]
%endif
lea r6, [r3 * 3]
-
+%rep %2 / 16
movu xm0, [r0] ; m0 = row 0
movu xm1, [r0 + r1] ; m1 = row 1
punpckhbw xm2, xm0, xm1
@@ -7868,11 +8010,15 @@ cglobal interp_4tap_vert_%1_12x16, 4, 7,
vextracti128 xm5, m5, 1