[x265-commits] [x265] asm: intra_pred_ang4_2_sse2
David T Yuen
dtyx265 at gmail.com
Tue Mar 24 22:23:47 CET 2015
details: http://hg.videolan.org/x265/rev/6b8da2264523
branches:
changeset: 9864:6b8da2264523
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 12:26:38 2015 -0700
description:
asm: intra_pred_ang4_2_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 2\]"
intra_ang_4x4[ 2] 8.86x 134.98 1195.68
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 2\]"
intra_ang_4x4[ 2] 9.23x 222.48 2053.30
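For readers who do not read x86 assembly, the routine in this changeset can be paraphrased in scalar form. The sketch below is a reading of the intra_pred_ang4_2 assembly in the diff, not the C primitive it replaces. The function name predAng4Mode2Or34 is hypothetical, and pixel is assumed to be one byte, as in an 8bpp build.

```cpp
#include <cstdint>

typedef std::uint8_t pixel; // assumption: 8bpp build, one byte per sample

// Hypothetical scalar paraphrase of intra_pred_ang4_2_sse2.
// The assembly reads from srcPix + 2 when dirMode is 34 and from srcPix + 10
// otherwise, then stores four rows, each starting one sample further along
// the neighbour array than the previous row.
void predAng4Mode2Or34(pixel* dst, std::intptr_t dstStride,
                       const pixel* srcPix, int dirMode)
{
    const pixel* ref = srcPix + (dirMode == 34 ? 2 : 10);

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * dstStride + x] = ref[x + y];
}
```

No arithmetic is involved for these two modes, which may be why this routine shows a much larger speedup than the filtered modes that follow.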
Subject: [x265] asm: intra_pred_ang4_3_sse2
details: http://hg.videolan.org/x265/rev/1ad3d2d854a4
branches:
changeset: 9865:1ad3d2d854a4
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 12:35:33 2015 -0700
description:
asm: intra_pred_ang4_3_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 3\]"
intra_ang_4x4[ 3] 2.58x 704.98 1818.77
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 3\]"
intra_ang_4x4[ 3] 3.68x 757.49 2784.21
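Modes 3 through 9 all jump into the shared .do_filter4x4 path shown in the diff, which weights two neighbouring samples with a pw_ang_table entry (32 - f, f), adds 16 and shifts right by 5. A minimal scalar sketch of that path follows, assuming 8-bit pixels. The function name filterAng4 and its idx/frac/transpose parameters are hypothetical and chosen only for illustration. For mode 3 the assembly comments give frac = {26, 20, 14, 8}, and the register shifts correspond to idx = {0, 1, 2, 3}.

```cpp
#include <cstdint>

typedef std::uint8_t pixel; // assumption: 8bpp build, one byte per sample

// Hypothetical scalar paraphrase of the .do_filter4x4 path.
//   ref       first neighbour sample used by row 0
//   idx[y]    whole-sample offset of row y into ref
//   frac[y]   weight of the second sample for row y (0..31)
//   transpose true when the result is written column-wise, which the
//             assembly does unless the zero flag from the mode compare is set
void filterAng4(pixel* dst, std::intptr_t dstStride, const pixel* ref,
                const int idx[4], const int frac[4], bool transpose)
{
    for (int y = 0; y < 4; y++)
    {
        for (int x = 0; x < 4; x++)
        {
            int a = ref[idx[y] + x];
            int b = ref[idx[y] + x + 1];
            // Two-tap interpolation with rounding: weights sum to 32.
            int v = ((32 - frac[y]) * a + frac[y] * b + 16) >> 5;

            if (transpose)
                dst[x * dstStride + y] = (pixel)v;
            else
                dst[y * dstStride + x] = (pixel)v;
        }
    }
}
```

Because the weights sum to 32 and the inputs are 8-bit, the result always fits in a pixel, so the sketch needs no clamp.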
Subject: [x265] asm: intra_pred_ang4_4_sse2
details: http://hg.videolan.org/x265/rev/e91b92457670
branches:
changeset: 9866:e91b92457670
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 12:53:22 2015 -0700
description:
asm: intra_pred_ang4_4_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 4\]"
intra_ang_4x4[ 4] 2.74x 709.98 1947.60
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 4\]"
intra_ang_4x4[ 4] 3.97x 747.49 2970.13
Subject: [x265] asm: intra_pred_ang4_5_sse2
details: http://hg.videolan.org/x265/rev/02bc460262b4
branches:
changeset: 9867:02bc460262b4
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 12:57:42 2015 -0700
description:
asm: intra_pred_ang4_5_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 5\]"
intra_ang_4x4[ 5] 2.94x 684.47 2014.99
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 5\]"
intra_ang_4x4[ 5] 3.82x 747.48 2854.97
Subject: [x265] asm: intra_pred_ang4_6_sse2
details: http://hg.videolan.org/x265/rev/e043561425f9
branches:
changeset: 9868:e043561425f9
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 13:01:36 2015 -0700
description:
asm: intra_pred_ang4_6_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 6\]"
intra_ang_4x4[ 6] 2.92x 655.00 1914.97
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 6\]"
intra_ang_4x4[ 6] 3.96x 717.58 2844.93
Subject: [x265] asm: intra_pred_ang4_7_sse2
details: http://hg.videolan.org/x265/rev/3daa8229d676
branches:
changeset: 9869:3daa8229d676
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 13:18:33 2015 -0700
description:
asm: intra_pred_ang4_7_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 7\]"
intra_ang_4x4[ 7] 2.77x 655.00 1817.47
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 7\]"
intra_ang_4x4[ 7] 3.56x 762.50 2714.98
Subject: [x265] asm: intra_pred_ang4_8_sse2
details: http://hg.videolan.org/x265/rev/71636c334b57
branches:
changeset: 9870:71636c334b57
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 13:22:23 2015 -0700
description:
asm: intra_pred_ang4_8_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 8\]"
intra_ang_4x4[ 8] 3.04x 640.00 1942.47
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 8\]"
intra_ang_4x4[ 8] 3.97x 722.50 2864.98
Subject: [x265] asm: intra_pred_ang4_9_sse2
details: http://hg.videolan.org/x265/rev/1b69c3a7bbd5
branches:
changeset: 9871:1b69c3a7bbd5
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Mar 23 13:28:35 2015 -0700
description:
asm: intra_pred_ang4_9_sse2
This is backported from sse4 code and replaces c code.
64-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 9\]"
intra_ang_4x4[ 9] 2.97x 645.00 1917.47
32-bit
./test/TestBench --testbench intrapred | grep "intra_ang_4x4\[ 9\]"
intra_ang_4x4[ 9] 4.03x 722.50 2910.00
Subject: [x265] asm: avx2 code for ssd_s[16x16] for 8bpp
details: http://hg.videolan.org/x265/rev/cd68e557eb7b
branches:
changeset: 9872:cd68e557eb7b
user: Sumalatha Polureddy
date: Tue Mar 24 10:25:06 2015 +0530
description:
asm: avx2 code for ssd_s[16x16] for 8bpp
see3
ssd_s[16x16] 6.33x 345.70 2188.47
avx2
ssd_s[16x16] 9.86x 221.34 2183.05
Subject: [x265] asm: psyCost_pp avx2 code for BLOCK(8x8,16x16,32x32,64x64)
details: http://hg.videolan.org/x265/rev/9eefa3feecdb
branches:
changeset: 9873:9eefa3feecdb
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Mon Mar 23 14:10:52 2015 +0530
description:
asm: psyCost_pp avx2 code for BLOCK(8x8,16x16,32x32,64x64)
AVX2:
psy_cost_pp[8x8] 12.28x 611.76 7511.84
psy_cost_pp[16x16] 13.43x 2253.78 30262.36
psy_cost_pp[32x32] 14.16x 8578.93 121519.92
psy_cost_pp[64x64] 12.37x 39645.38 490279.69
SSE4:
psy_cost_pp[8x8] 8.40x 930.68 7818.93
psy_cost_pp[16x16] 8.57x 3648.62 31282.65
psy_cost_pp[32x32] 8.73x 13969.57 121993.38
psy_cost_pp[64x64] 8.74x 54604.69 477252.69
Subject: [x265] asm: psyCost_pp avx2 code for BLOCK_4x4
details: http://hg.videolan.org/x265/rev/48fee0fa4814
branches:
changeset: 9874:48fee0fa4814
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Mon Mar 23 20:20:02 2015 +0530
description:
asm: psyCost_pp avx2 code for BLOCK_4x4
AVX2:
psy_cost_pp[4x4] 10.30x 216.56 2230.77
SSE4:
psy_cost_pp[4x4] 6.53x 352.01 2297.35
Subject: [x265] analysis: only perform checks if merge mode was selected
details: http://hg.videolan.org/x265/rev/27717be056d3
branches:
changeset: 9875:27717be056d3
user: Steve Borho <steve at borho.org>
date: Tue Mar 24 14:26:24 2015 -0500
description:
analysis: only perform checks if merge mode was selected
Subject: [x265] param: bframes can match lookaheadDepth if both are zero (fixes #118)
details: http://hg.videolan.org/x265/rev/a962bb577a47
branches:
changeset: 9876:a962bb577a47
user: Steve Borho <steve at borho.org>
date: Tue Mar 24 15:10:46 2015 -0500
description:
param: bframes can match lookaheadDepth if both are zero (fixes #118)
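The effect of this change is easiest to see by evaluating the old and new conditions from the param.cpp hunk side by side. In the sketch below the two function names are hypothetical; the expressions themselves are copied from the diff, and a true result means the CHECK would report "Lookahead depth must be greater than the max consecutive bframe count".

```cpp
// Condition before the change: fails whenever bframes >= lookaheadDepth,
// including the case where both are zero.
bool lookaheadCheckFailsBefore(int bframes, int lookaheadDepth, bool bStatRead)
{
    return bframes >= lookaheadDepth && !bStatRead;
}

// Condition after the change: the extra leading term skips the check
// entirely when no B-frames are requested.
bool lookaheadCheckFailsAfter(int bframes, int lookaheadDepth, bool bStatRead)
{
    return bframes && bframes >= lookaheadDepth && !bStatRead;
}
```

With bframes = 0 and lookaheadDepth = 0 the old condition rejects the parameters and the new one accepts them; for any non-zero bframes the two conditions agree.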
Subject: [x265] slicetype: fix crash when lookaheadDepth is 0
details: http://hg.videolan.org/x265/rev/c7740b6cec26
branches:
changeset: 9877:c7740b6cec26
user: Steve Borho <steve at borho.org>
date: Tue Mar 24 15:30:53 2015 -0500
description:
slicetype: fix crash when lookaheadDepth is 0
Subject: [x265] slicetype: spleling
details: http://hg.videolan.org/x265/rev/e637273e2ae6
branches:
changeset: 9878:e637273e2ae6
user: Steve Borho <steve at borho.org>
date: Tue Mar 24 15:31:05 2015 -0500
description:
slicetype: spleling
diffstat:
source/common/param.cpp | 2 +-
source/common/x86/asm-primitives.cpp | 17 +
source/common/x86/intrapred.h | 9 +
source/common/x86/intrapred8.asm | 244 +++++++++++++++++++++++++
source/common/x86/pixel-a.asm | 332 ++++++++++++++++++++++++++++++++++-
source/common/x86/pixel.h | 7 +
source/common/x86/ssd-a.asm | 29 +++
source/encoder/analysis.cpp | 4 +-
source/encoder/slicetype.cpp | 4 +-
9 files changed, 642 insertions(+), 6 deletions(-)
diffs (truncated from 783 to 300 lines):
diff -r 7b66c36ed9ef -r e637273e2ae6 source/common/param.cpp
--- a/source/common/param.cpp Mon Mar 23 19:55:02 2015 -0500
+++ b/source/common/param.cpp Tue Mar 24 15:31:05 2015 -0500
@@ -1055,7 +1055,7 @@ int x265_check_params(x265_param* param)
"RD Level is out of range");
CHECK(param->rdoqLevel < 0 || param->rdoqLevel > 2,
"RDOQ Level is out of range");
- CHECK(param->bframes >= param->lookaheadDepth && !param->rc.bStatRead,
+ CHECK(param->bframes && param->bframes >= param->lookaheadDepth && !param->rc.bStatRead,
"Lookahead depth must be greater than the max consecutive bframe count");
CHECK(param->bframes < 0,
"bframe count should be greater than zero");
diff -r 7b66c36ed9ef -r e637273e2ae6 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Mon Mar 23 19:55:02 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Tue Mar 24 15:31:05 2015 -0500
@@ -1196,6 +1196,15 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_sse2;
p.cu[BLOCK_32x32].intra_pred[PLANAR_IDX] = x265_intra_pred_planar32_sse2;
+ p.cu[BLOCK_4x4].intra_pred[2] = x265_intra_pred_ang4_2_sse2;
+ p.cu[BLOCK_4x4].intra_pred[3] = x265_intra_pred_ang4_3_sse2;
+ p.cu[BLOCK_4x4].intra_pred[4] = x265_intra_pred_ang4_4_sse2;
+ p.cu[BLOCK_4x4].intra_pred[5] = x265_intra_pred_ang4_5_sse2;
+ p.cu[BLOCK_4x4].intra_pred[6] = x265_intra_pred_ang4_6_sse2;
+ p.cu[BLOCK_4x4].intra_pred[7] = x265_intra_pred_ang4_7_sse2;
+ p.cu[BLOCK_4x4].intra_pred[8] = x265_intra_pred_ang4_8_sse2;
+ p.cu[BLOCK_4x4].intra_pred[9] = x265_intra_pred_ang4_9_sse2;
+
p.cu[BLOCK_4x4].calcresidual = x265_getResidual4_sse2;
p.cu[BLOCK_8x8].calcresidual = x265_getResidual8_sse2;
@@ -1417,6 +1426,12 @@ void setupAssemblyPrimitives(EncoderPrim
#if X86_64
if (cpuMask & X265_CPU_AVX2)
{
+ p.cu[BLOCK_4x4].psy_cost_pp = x265_psyCost_pp_4x4_avx2;
+ p.cu[BLOCK_8x8].psy_cost_pp = x265_psyCost_pp_8x8_avx2;
+ p.cu[BLOCK_16x16].psy_cost_pp = x265_psyCost_pp_16x16_avx2;
+ p.cu[BLOCK_32x32].psy_cost_pp = x265_psyCost_pp_32x32_avx2;
+ p.cu[BLOCK_64x64].psy_cost_pp = x265_psyCost_pp_64x64_avx2;
+
p.pu[LUMA_8x4].addAvg = x265_addAvg_8x4_avx2;
p.pu[LUMA_8x8].addAvg = x265_addAvg_8x8_avx2;
p.pu[LUMA_8x16].addAvg = x265_addAvg_8x16_avx2;
@@ -1519,6 +1534,8 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_16x32].sad_x4 = x265_pixel_sad_x4_16x32_avx2;
p.cu[BLOCK_16x16].sse_pp = x265_pixel_ssd_16x16_avx2;
+
+ p.cu[BLOCK_16x16].ssd_s = x265_pixel_ssd_s_16_avx2;
p.cu[BLOCK_32x32].ssd_s = x265_pixel_ssd_s_32_avx2;
p.cu[BLOCK_8x8].copy_cnt = x265_copy_cnt_8_avx2;
diff -r 7b66c36ed9ef -r e637273e2ae6 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Mon Mar 23 19:55:02 2015 -0500
+++ b/source/common/x86/intrapred.h Tue Mar 24 15:31:05 2015 -0500
@@ -47,6 +47,15 @@ void x265_intra_pred_planar32_sse4(pixel
#define DECL_ANG(bsize, mode, cpu) \
void x265_intra_pred_ang ## bsize ## _ ## mode ## _ ## cpu(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+DECL_ANG(4, 2, sse2);
+DECL_ANG(4, 3, sse2);
+DECL_ANG(4, 4, sse2);
+DECL_ANG(4, 5, sse2);
+DECL_ANG(4, 6, sse2);
+DECL_ANG(4, 7, sse2);
+DECL_ANG(4, 8, sse2);
+DECL_ANG(4, 9, sse2);
+
DECL_ANG(4, 2, ssse3);
DECL_ANG(4, 3, sse4);
DECL_ANG(4, 4, sse4);
diff -r 7b66c36ed9ef -r e637273e2ae6 source/common/x86/intrapred8.asm
--- a/source/common/x86/intrapred8.asm Mon Mar 23 19:55:02 2015 -0500
+++ b/source/common/x86/intrapred8.asm Tue Mar 24 15:31:05 2015 -0500
@@ -267,6 +267,13 @@ const ang_table
%assign x x+1
%endrep
+const pw_ang_table
+%assign x 0
+%rep 32
+ times 4 dw (32-x), x
+%assign x x+1
+%endrep
+
SECTION .text
cextern pw_2
@@ -1109,6 +1116,243 @@ cglobal intra_pred_planar32, 3,3,8,0-(4*
%endif ; end ARCH_X86_32
+;-----------------------------------------------------------------------------------------
+; void intraPredAng4(pixel* dst, intptr_t dstStride, pixel* src, int dirMode, int bFilter)
+;-----------------------------------------------------------------------------------------
+INIT_XMM sse2
+cglobal intra_pred_ang4_2, 3,5,3
+ lea r4, [r2 + 2]
+ add r2, 10
+ cmp r3m, byte 34
+ cmove r2, r4
+
+ movh m0, [r2]
+ movd [r0], m0
+ mova m1, m0
+ psrldq m1, 1
+ movd [r0 + r1], m1
+ mova m2, m0
+ psrldq m2, 2
+ movd [r0 + r1 * 2], m2
+ lea r1, [r1 * 3]
+ psrldq m0, 3
+ movd [r0 + r1], m0
+ RET
+
+INIT_XMM sse2
+cglobal intra_pred_ang4_3, 3,5,8
+ mov r4, 1
+ cmp r3m, byte 33
+ mov r3, 9
+ cmove r3, r4
+
+ movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
+ punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
+ mova m1, m0
+ psrldq m1, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
+ mova m2, m0
+ psrldq m2, 4 ; [x x x x x x x x 7 6 6 5 5 4 4 3]
+ mova m3, m0
+ psrldq m3, 6 ; [x x x x x x x x 8 7 7 6 6 5 5 4]
+ punpcklqdq m0, m1
+ punpcklqdq m2, m3
+
+ lea r3, [pw_ang_table + 20 * 16]
+ mova m4, [r3 + 6 * 16] ; [26]
+ mova m5, [r3] ; [20]
+ mova m6, [r3 - 6 * 16] ; [14]
+ mova m7, [r3 - 12 * 16] ; [ 8]
+ jmp .do_filter4x4
+
+ ; NOTE: share path, input is m0=[1 0], m2=[3 2], m3,m4=coef, flag_z=no_transpose
+ALIGN 16
+.do_filter4x4:
+ pxor m1, m1
+ pxor m3, m3
+ punpckhbw m3, m0
+ psrlw m3, 8
+ pmaddwd m3, m5
+ punpcklbw m0, m1
+ pmaddwd m0, m4
+ packssdw m0, m3
+ paddw m0, [pw_16]
+ psraw m0, 5
+ pxor m3, m3
+ punpckhbw m3, m2
+ psrlw m3, 8
+ pmaddwd m3, m7
+ punpcklbw m2, m1
+ pmaddwd m2, m6
+ packssdw m2, m3
+ paddw m2, [pw_16]
+ psraw m2, 5
+
+ ; NOTE: mode 33 doesn't reorder, UNSAFE but I don't use any instruction that affect eflag register before
+ jz .store
+
+ ; transpose 4x4 c_trans_4x4 db 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
+ pshufd m0, m0, 0xD8
+ pshufd m1, m2, 0xD8
+ pshuflw m0, m0, 0xD8
+ pshuflw m1, m1, 0xD8
+ pshufhw m0, m0, 0xD8
+ pshufhw m1, m1, 0xD8
+ mova m2, m0
+ punpckldq m0, m1
+ punpckhdq m2, m1
+
+.store:
+ packuswb m0, m2
+ movd [r0], m0
+ pshufd m0, m0, 0x39
+ movd [r0 + r1], m0
+ pshufd m0, m0, 0x39
+ movd [r0 + r1 * 2], m0
+ lea r1, [r1 * 3]
+ pshufd m0, m0, 0x39
+ movd [r0 + r1], m0
+ RET
+
+cglobal intra_pred_ang4_4, 3,5,8
+ xor r4, r4
+ inc r4
+ cmp r3m, byte 32
+ mov r3, 9
+ cmove r3, r4
+
+ movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
+ punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
+ mova m1, m0
+ psrldq m1, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
+ mova m3, m0
+ psrldq m3, 4 ; [x x x x x x x x 7 6 6 5 5 4 4 3]
+ punpcklqdq m0, m1
+ punpcklqdq m2, m1, m3
+
+ lea r3, [pw_ang_table + 18 * 16]
+ mova m4, [r3 + 3 * 16] ; [21]
+ mova m5, [r3 - 8 * 16] ; [10]
+ mova m6, [r3 + 13 * 16] ; [31]
+ mova m7, [r3 + 2 * 16] ; [20]
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
+
+cglobal intra_pred_ang4_5, 3,5,8
+ xor r4, r4
+ inc r4
+ cmp r3m, byte 31
+ mov r3, 9
+ cmove r3, r4
+
+ movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
+ punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
+ mova m1, m0
+ psrldq m1, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
+ mova m3, m0
+ psrldq m3, 4 ; [x x x x x x x x 7 6 6 5 5 4 4 3]
+ punpcklqdq m0, m1
+ punpcklqdq m2, m1, m3
+
+ lea r3, [pw_ang_table + 10 * 16]
+ mova m4, [r3 + 7 * 16] ; [17]
+ mova m5, [r3 - 8 * 16] ; [ 2]
+ mova m6, [r3 + 9 * 16] ; [19]
+ mova m7, [r3 - 6 * 16] ; [ 4]
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
+
+cglobal intra_pred_ang4_6, 3,5,8
+ xor r4, r4
+ inc r4
+ cmp r3m, byte 30
+ mov r3, 9
+ cmove r3, r4
+
+ movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
+ punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
+ mova m2, m0
+ psrldq m2, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
+ punpcklqdq m0, m0
+ punpcklqdq m2, m2
+
+ lea r3, [pw_ang_table + 19 * 16]
+ mova m4, [r3 - 6 * 16] ; [13]
+ mova m5, [r3 + 7 * 16] ; [26]
+ mova m6, [r3 - 12 * 16] ; [ 7]
+ mova m7, [r3 + 1 * 16] ; [20]
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
+
+cglobal intra_pred_ang4_7, 3,5,8
+ xor r4, r4
+ inc r4
+ cmp r3m, byte 29
+ mov r3, 9
+ cmove r3, r4
+
+ movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
+ punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
+ mova m3, m0
+ psrldq m3, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
+ punpcklqdq m2, m0, m3
+ punpcklqdq m0, m0
+
+ lea r3, [pw_ang_table + 20 * 16]
+ mova m4, [r3 - 11 * 16] ; [ 9]
+ mova m5, [r3 - 2 * 16] ; [18]
+ mova m6, [r3 + 7 * 16] ; [27]
+ mova m7, [r3 - 16 * 16] ; [ 4]
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
+
+cglobal intra_pred_ang4_8, 3,5,8
+ xor r4, r4
+ inc r4
+ cmp r3m, byte 28
+ mov r3, 9
+ cmove r3, r4
+
+ movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
+ punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
+ punpcklqdq m0, m0