[x265] [PATCH] blockfill_s_16x16 avx2 asm code, performance improved 389.21 cycles -> 204.38 cycles

chen chenm003 at 163.com
Mon Sep 29 17:50:54 CEST 2014


 

At 2014-09-29 16:47:45,praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1411980445 -19800
># Node ID 9a8552ea378500baa21b89b24d8aec99acf7cce2
># Parent  32f50df7fa7672f4c1818ddf3165b4bd243e0b10
>blockfill_s_16x16 avx2 asm code, performance improved 389.21 cycles -> 204.38 cycles
>
>diff -r 32f50df7fa76 -r 9a8552ea3785 source/common/x86/blockcopy8.asm
>--- a/source/common/x86/blockcopy8.asm	Fri Sep 26 17:33:09 2014 -0500
>+++ b/source/common/x86/blockcopy8.asm	Mon Sep 29 14:17:25 2014 +0530
>@@ -1826,6 +1826,38 @@
> 
> BLOCKFILL_S_W16_H8 16, 16
> 
>+INIT_YMM avx2
>+cglobal blockfill_s_16x16, 3, 4, 1
>+add        r1, r1
>+lea        r3, [3 * r1]
>+
>+movd       xm0, r2d
>+pshuflw    xm0, xm0, 0
>+pshufd     xm0, xm0, 0
>+
>+vinserti128 m0, m0, xm0, 1

You could use vpbroadcastd here.
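
A minimal sketch of how that might look, assuming the point is to replace the pshufd + vinserti128 pair with a single AVX2 broadcast. Register names are taken from the quoted patch; this has not been run through the testbench.

```asm
; Sketch only (assumption: the broadcast replaces pshufd + vinserti128).
movd         xm0, r2d        ; fill value in the low word of xm0
pshuflw      xm0, xm0, 0     ; replicate that word across the low 64 bits
vpbroadcastd m0, xm0         ; copy the low dword (two fill words) to all 8 dwords of m0
```

If only the low word matters, vpbroadcastw m0, xm0 right after the movd should give the same result and would make the pshuflw unnecessary as well.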
 
 