[x265] [PATCH] arm: Implement planecopy_cp NEON

chen chenm003 at 163.com
Thu Mar 3 08:28:06 CET 2016



At 2016-03-03 15:02:55,ramya at multicorewareinc.com wrote:
># HG changeset patch
># User Ramya Sriraman<ramya at multicorewareinc.com>
># Date 1456985538 -19800
>#      Thu Mar 03 11:42:18 2016 +0530
># Node ID dbccf88be30776f12c7f8c52b9da67d4607abcab
># Parent  9cc9920bf82be1b43efd2a3628e28a3a78ab3b2f
>arm: Implement planecopy_cp NEON
>
>diff -r 9cc9920bf82b -r dbccf88be307 source/common/arm/asm-primitives.cpp
>--- a/source/common/arm/asm-primitives.cpp	Wed Mar 02 17:26:11 2016 +0530
>+++ b/source/common/arm/asm-primitives.cpp	Thu Mar 03 11:42:18 2016 +0530
>@@ -142,6 +142,9 @@
>         p.pu[LUMA_64x48].copy_pp = PFX(blockcopy_pp_64x48_neon);
>         p.pu[LUMA_64x64].copy_pp = PFX(blockcopy_pp_64x64_neon);
> 
>+        // planecopy
>+        p.planecopy_cp = PFX(pixel_planecopy_cp_neon);
>+
>         // sad
>         p.pu[LUMA_8x4].sad    = PFX(pixel_sad_8x4_neon);
>         p.pu[LUMA_8x8].sad    = PFX(pixel_sad_8x8_neon);
>diff -r 9cc9920bf82b -r dbccf88be307 source/common/arm/pixel-util.S
>--- a/source/common/arm/pixel-util.S	Wed Mar 02 17:26:11 2016 +0530
>+++ b/source/common/arm/pixel-util.S	Thu Mar 03 11:42:18 2016 +0530
>@@ -626,3 +626,56 @@
>     pop             {r4, r5}
>     bx              lr
> endfunc
>+
>+function x265_pixel_planecopy_cp_neon
>+    push            {r4, r5, r6}
>+    ldr             r4, [sp, #12]
#12 -> #4*3; it means we reserved 3 dwords on the stack (the pushed r4, r5, r6).

>+    ldr             r5, [sp, #16]
#16 -> #4*3 + 4

>+    ldr             r12, [sp, #20]
>+    vdup.8          q2, r12
>+    sub             r5, #1
>+
>+.loop_h:
>+    eor             r6, r6
>+    eor             r12, r12
>+.loop_w:
>+    add             r12, r0, r6
Why is r0 constant in loop_w?

>+    vld1.u8         {q0}, [r12]
>+    vshl.u8         q0, q0, q2
>+    add             r12, r2, r6
Maybe combine this with the vld1.

>+    vst1.u8         {q0}, [r12]
>+
>+    add             r6, #16
>+    cmp             r6, r4
>+    blt             .loop_w
>+

Finally, this function is only called under the condition (X265_DEPTH > 8), so an 8-bit to 8-bit copy function is unnecessary.
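
To illustrate the last point, here is a minimal C sketch of what the copy would need to do when the destination is deeper than 8 bits. It is not the x265 reference code: the name planecopy_cp_ref, the uint16_t destination type, and strides counted in elements are all assumptions made for illustration. Each 8-bit source sample is widened before the shift, so no high bits are lost, whereas an 8-bit lane shift such as vshl.u8 would truncate them.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference, NOT the x265 implementation.
 * Assumes the destination plane holds 16-bit samples (X265_DEPTH > 8)
 * and that both strides are counted in elements, not bytes. */
static void planecopy_cp_ref(const uint8_t *src, ptrdiff_t src_stride,
                             uint16_t *dst, ptrdiff_t dst_stride,
                             int width, int height, int shiftl)
{
    for (int r = 0; r < height; r++) {
        for (int c = 0; c < width; c++) {
            /* src[c] is promoted to int before the shift, so the
             * result keeps the bits an 8-bit lane would drop. */
            dst[c] = (uint16_t)(src[c] << shiftl);
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```

With shiftl = 2, a source sample of 255 becomes 1020 here; the same shift confined to an 8-bit lane would leave 252.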