[x265] [PATCH 3 of 7] asm: intra_pred_ang4_5_sse2 improved ~2.5% 642.50 -> 627.50 with nits and tweaks
dave
dtyx265 at gmail.com
Wed Apr 1 23:16:03 CEST 2015
Please disregard this one. The correct version has been sent.
On 04/01/2015 11:52 AM, dtyx265 at gmail.com wrote:
> # HG changeset patch
> # User David T Yuen <dtyx265 at gmail.com>
> # Date 1427912500 25200
> # Node ID fc6b5f8bbcc8283e5b4fd88d41b8c313b002a198
> # Parent fc902b84fc7f8dadf56431766adab8eda3520596
> asm: intra_pred_ang4_5_sse2 improved ~2.5% 642.50 -> 627.50 with nits and tweaks
>
> Changed r3 and r4 to r3d and r4d
> tweaked unpacking for performance
>
> diff -r fc902b84fc7f -r fc6b5f8bbcc8 source/common/x86/intrapred8.asm
> --- a/source/common/x86/intrapred8.asm Wed Apr 01 06:01:22 2015 -0700
> +++ b/source/common/x86/intrapred8.asm Wed Apr 01 11:21:40 2015 -0700
> @@ -1436,22 +1436,21 @@
> jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
>
> cglobal intra_pred_ang4_5, 3,5,8
> - xor r4, r4
> - inc r4
> - cmp r3m, byte 31
> - mov r3, 9
> - cmove r3, r4
> + xor r4d, r4d
> + inc r4d
> + cmp r3d, byte 31
> + mov r3d, 9
> + cmove r3d, r4d
>
> movh m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
> - mova m1, m0
> - psrldq m1, 1 ; [x 8 7 6 5 4 3 2]
> - punpcklbw m0, m1 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
> - mova m1, m0
> - psrldq m1, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
> + punpcklbw m0, m0 ; [x 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1]
> + psrldq m0, 1
> + mova m2, m0
> + psrldq m2, 2 ; [x x x x x x x x 6 5 5 4 4 3 3 2]
> mova m3, m0
> psrldq m3, 4 ; [x x x x x x x x 7 6 6 5 5 4 4 3]
> - punpcklqdq m0, m1
> - punpcklqdq m2, m1, m3
> + punpcklqdq m0, m2
> + punpcklqdq m2, m3
>
> lea r3, [pw_ang_table + 10 * 16]
> mova m4, [r3 + 7 * 16] ; [17]
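
For anyone checking the unpacking tweak in the quoted hunk, here is a minimal sketch that models the old and new instruction sequences on plain byte lists (index 0 is the lowest byte) and compares the resulting m0 and m2. It is only a model: the helper names mirror the mnemonics but are hypothetical Python functions, and movh is assumed to zero the upper 8 bytes. It covers the shuffle only, not the r3m/r3d change.

```python
# Byte-level model of the SSE2 shuffles in the hunk; a register is a list of
# 16 ints with index 0 as the least significant byte. Illustrative only.

def movh(src8):
    # Assumption: loads 8 bytes into the low half and zeroes the high half.
    return list(src8) + [0] * 8

def psrldq(reg, n):
    # Shift the whole register right by n bytes, filling with zeros.
    return reg[n:] + [0] * n

def punpcklbw(a, b):
    # Interleave the low 8 bytes of a and b: a0 b0 a1 b1 ...
    out = []
    for i in range(8):
        out += [a[i], b[i]]
    return out

def punpcklqdq(a, b):
    # Low 8 bytes of a followed by low 8 bytes of b.
    return a[:8] + b[:8]

def old_sequence(src8):
    m0 = movh(src8)
    m1 = psrldq(m0, 1)
    m0 = punpcklbw(m0, m1)
    m1 = psrldq(m0, 2)
    m3 = psrldq(m0, 4)
    m2 = punpcklqdq(m1, m3)      # three-operand form: m2 = m1 low | m3 low
    m0 = punpcklqdq(m0, m1)
    return m0, m2

def new_sequence(src8):
    m0 = movh(src8)
    m0 = punpcklbw(m0, m0)       # duplicate each byte
    m0 = psrldq(m0, 1)           # drop the first duplicate
    m2 = psrldq(m0, 2)
    m3 = psrldq(m0, 4)
    m0 = punpcklqdq(m0, m2)
    m2 = punpcklqdq(m2, m3)
    return m0, m2

if __name__ == "__main__":
    pixels = [1, 2, 3, 4, 5, 6, 7, 8]
    print(old_sequence(pixels) == new_sequence(pixels))
```

Under this model both sequences give the same m0 and m2 for any 8 input bytes, so the tweak saves a register copy without changing the shuffled data.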