<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">this version looks good, thanks<br><div></div><div id="divNeteaseMailCard"></div><br>At 2016-03-04 17:29:40,"Ramya Sriraman" <ramya@multicorewareinc.com> wrote:<br> <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid"><div dir="ltr">Thanks for the improvements min. Pls find the modified patch below.<br><br># HG changeset patch<br># User Ramya Sriraman<<a href="mailto:ramya@multicorewareinc.com">ramya@multicorewareinc.com</a>><br># Date 1456985538 -19800<br>#      Thu Mar 03 11:42:18 2016 +0530<br># Node ID 75a3948f28b6bd8f2b3536cf18e17cc8573be444<br># Parent  9cc9920bf82be1b43efd2a3628e28a3a78ab3b2f<br>arm: Implement planecopy_cp NEON<br><br>diff -r 9cc9920bf82b -r 75a3948f28b6 source/common/arm/asm-primitives.cpp<br>--- a/source/common/arm/asm-primitives.cpp    Wed Mar 02 17:26:11 2016 +0530<br>+++ b/source/common/arm/asm-primitives.cpp    Thu Mar 03 11:42:18 2016 +0530<br>@@ -33,6 +33,7 @@<br> #include "blockcopy8.h"<br> #include "pixel.h"<br> #include "pixel-util.h"<br>+#include "ipfilter8.h"<br> }<br> <br> namespace X265_NS {<br>@@ -142,6 +143,9 @@<br>         p.pu[LUMA_64x48].copy_pp = PFX(blockcopy_pp_64x48_neon);<br>         p.pu[LUMA_64x64].copy_pp = PFX(blockcopy_pp_64x64_neon);<br> <br>+        // planecopy<br>+        p.planecopy_cp = PFX(pixel_planecopy_cp_neon);<br>+<br>         // sad<br>         p.pu[LUMA_8x4].sad    = PFX(pixel_sad_8x4_neon);<br>         p.pu[LUMA_8x8].sad    = PFX(pixel_sad_8x8_neon);<br>diff -r 9cc9920bf82b -r 75a3948f28b6 source/common/arm/pixel-util.S<br>--- a/source/common/arm/pixel-util.S    Wed Mar 02 17:26:11 2016 +0530<br>+++ b/source/common/arm/pixel-util.S    Thu Mar 03 11:42:18 2016 +0530<br>@@ -626,3 +626,55 @@<br>     pop             {r4, r5}<br>     bx              lr<br> endfunc<br>+<br>+function x265_pixel_planecopy_cp_neon<br>+    push            {r4, r5, r6, r7}<br>+    ldr       
      r4, [sp, #4 * 4]<br>+    ldr             r5, [sp, #4 * 4 + 4]<br>+    ldr             r12, [sp, #4 * 4 + 8]<br>+    vdup.8          q2, r12<br>+    sub             r5, #1<br>+<br>+.loop_h:<br>+    mov             r6, r0<br>+    mov             r12, r2<br>+    eor             r7, r7<br>+.loop_w:<br>+    vld1.u8         {q0}, [r6]!<br>+    vshl.u8         q0, q0, q2<br>+    vst1.u8         {q0}, [r12]!<br>+<br>+    add             r7, #16<br>+    cmp             r7, r4<br>+    blt             .loop_w<br>+<br>+    add             r0, r1<br>+    add             r2, r3<br>+<br>+    subs             r5, #1<br>+    bgt             .loop_h<br>+<br>+// handle last row<br>+    mov             r5, r4<br>+    lsr             r5, #3<br>+<br>+.loopW8:<br>+    vld1.u8         d0, [r0]!<br>+    vshl.u8         d0, d0, d4<br>+    vst1.u8         d0, [r2]!<br>+    subs            r4, r4, #8<br>+    subs            r5, #1<br>+    bgt             .loopW8<br>+<br>+    mov             r5,#8<br>+    sub             r5, r4<br>+    sub             r0, r5<br>+    sub             r2, r5<br>+    vld1.u8         d0, [r0]<br>+    vshl.u8         d0, d0, d4<br>+    vst1.u8         d0, [r2]<br>+<br>+    pop             {r4, r5, r6, r7}<br>+    bx              lr<br>+endfunc<br>+<br>diff -r 9cc9920bf82b -r 75a3948f28b6 source/common/arm/pixel.h<br>--- a/source/common/arm/pixel.h    Wed Mar 02 17:26:11 2016 +0530<br>+++ b/source/common/arm/pixel.h    Thu Mar 03 11:42:18 2016 +0530<br>@@ -163,4 +163,6 @@<br> void x265_pixel_add_ps_16x16_neon(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);<br> void x265_pixel_add_ps_32x32_neon(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);<br> void x265_pixel_add_ps_64x64_neon(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);<br>+<br>+void x265_pixel_planecopy_cp_neon(const uint8_t* src, intptr_t 
srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);<br> #endif // ifndef X265_I386_PIXEL_ARM_H<br><br></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div><span style="color:rgb(56,118,29)"><br></span></div><div><span style="color:rgb(56,118,29)">Thank you<br></span></div><span style="color:rgb(56,118,29)">Regards<br></span></div><span style="color:rgb(56,118,29)">Ramya</span><br></div></div></div></div></div>
<br><div class="gmail_quote">On Fri, Mar 4, 2016 at 2:18 PM, Ramya Sriraman <span dir="ltr"><<a href="mailto:ramya@multicorewareinc.com" target="_blank">ramya@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span class=""># HG changeset patch<br># User Ramya Sriraman<<a href="mailto:ramya@multicorewareinc.com" target="_blank">ramya@multicorewareinc.com</a>><br># Date 1456985538 -19800<br>#      Thu Mar 03 11:42:18 2016 +0530<br></span># Node ID 299caedec2f38b9d9b658aace5c74ace36b6b324<span class=""><br># Parent  9cc9920bf82be1b43efd2a3628e28a3a78ab3b2f<br>arm: Implement planecopy_cp NEON<br><br></span>diff -r 9cc9920bf82b -r 299caedec2f3 source/common/arm/asm-primitives.cpp<span class=""><br>--- a/source/common/arm/asm-primitives.cpp    Wed Mar 02 17:26:11 2016 +0530<br>+++ b/source/common/arm/asm-primitives.cpp    Thu Mar 03 11:42:18 2016 +0530<br></span>@@ -33,6 +33,7 @@<br> #include "blockcopy8.h"<br> #include "pixel.h"<br> #include "pixel-util.h"<br>+#include "ipfilter8.h"<br> }<br> <br> namespace X265_NS {<br>@@ -142,6 +143,9 @@<span class=""><br>         p.pu[LUMA_64x48].copy_pp = PFX(blockcopy_pp_64x48_neon);<br>         p.pu[LUMA_64x64].copy_pp = PFX(blockcopy_pp_64x64_neon);<br> <br>+        // planecopy<br>+        p.planecopy_cp = PFX(pixel_planecopy_cp_neon);<br>+<br>         // sad<br>         p.pu[LUMA_8x4].sad    = PFX(pixel_sad_8x4_neon);<br>         p.pu[LUMA_8x8].sad    = PFX(pixel_sad_8x8_neon);<br></span>diff -r 9cc9920bf82b -r 299caedec2f3 source/common/arm/pixel-util.S<span class=""><br>--- a/source/common/arm/pixel-util.S    Wed Mar 02 17:26:11 2016 +0530<br>+++ b/source/common/arm/pixel-util.S    Thu Mar 03 11:42:18 2016 +0530<br></span>@@ -626,3 +626,57 @@<span class=""><br>     pop             {r4, r5}<br>     bx              lr<br> endfunc<br>+<br>+function x265_pixel_planecopy_cp_neon<br></span>+    push            {r4, r5, r6, 
r7}<br>+    ldr             r4, [sp, #4 * 4]<br>+    ldr             r5, [sp, #4 * 4 + 4]<br>+    ldr             r12, [sp, #4 * 4 + 8]<span class=""><br>+    vdup.8          q2, r12<br>+    sub             r5, #1<br>+<br>+.loop_h:<br></span>+    mov             r6, r0<br>+    mov             r12, r2<br>+    eor             r7, r7<br>+.loop_w:<br>+    vld1.u8         {q0}, [r6]<span class=""><br>+    vshl.u8         q0, q0, q2<br></span><span class="">+    vst1.u8         {q0}, [r12]<br>+<br></span>+    add             r12, #16<br>+    add             r6, #16<br>+    add             r7, #16<br>+    cmp             r7, r4<br>+    blt             .loop_w<br>+<div><div class="h5"><br>+    add             r0, r1<br>+    add             r2, r3<br>+<br>+    subs             r5, #1<br>+    bgt             .loop_h<br>+<br>+// handle last row<br>+    mov             r5, r4<br>+    lsr             r5, #3<br>+<br>+.loopW8:<br>+    vld1.u8         d0, [r0]!<br>+    vshl.u8         d0, d0, d4<br>+    vst1.u8         d0, [r2]!<br>+    subs            r4, r4, #8<br>+    subs            r5, #1<br>+    bgt             .loopW8<br>+<br>+    mov             r5,#8<br>+    sub             r5, r4<br>+    sub             r0, r5<br>+    sub             r2, r5<br>+    vld1.u8         d0, [r0]<br>+    vshl.u8         d0, d0, d4<br>+    vst1.u8         d0, [r2]<br>+<br></div></div>+    pop             {r4, r5, r6, r7}<span class=""><br>+    bx              lr<br>+endfunc<br>+<br></span>diff -r 9cc9920bf82b -r 299caedec2f3 source/common/arm/pixel.h<br>--- a/source/common/arm/pixel.h    Wed Mar 02 17:26:11 2016 +0530<br>+++ b/source/common/arm/pixel.h    Thu Mar 03 11:42:18 2016 +0530<span class=""><br>@@ -163,4 +163,6 @@<br> void x265_pixel_add_ps_16x16_neon(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);<br> void x265_pixel_add_ps_32x32_neon(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t 
sstride1);<br> void x265_pixel_add_ps_64x64_neon(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);<br>+<br>+void x265_pixel_planecopy_cp_neon(const uint8_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);<br> #endif // ifndef X265_I386_PIXEL_ARM_H<br><br></span></div><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr"><div><div dir="ltr"><div><div><span style="color:rgb(56,118,29)"><br></span></div><div><span style="color:rgb(56,118,29)">Thank you<br></span></div><span style="color:rgb(56,118,29)">Regards<span class="HOEnZb"><font color="#888888"><br></font></span></span></div><span class="HOEnZb"><font color="#888888"><span style="color:rgb(56,118,29)">Ramya</span><br></font></span></div></div></div></div></div><div><div class="h5">
<br><div class="gmail_quote">On Fri, Mar 4, 2016 at 11:42 AM, Ramya Sriraman <span dir="ltr"><<a href="mailto:ramya@multicorewareinc.com" target="_blank">ramya@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><div dir="ltr"><div dir="ltr"><div><div><div>Hi min,<br></div>I made the #12 -> #4*3 correction. <br>R0
is left unchanged inside the inner loop because, if I let vld1.u8 post-increment it by the number of bytes loaded, then at the end of the loop, when I add r1, it
 would be r0+number_of_bytes+r1 and not the intended r0+r1.<br></div>Also, this is basically an upShift primitive, so it might be useful for the 8-bit build as well. <br></div>I will mail the modified patch to the mailing list based on your response. <br></div><br clear="all"></div></span><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr"><div><div dir="ltr"><div><div><span style="color:rgb(56,118,29)"><br></span></div><div><span style="color:rgb(56,118,29)">Thank you<br></span></div><span style="color:rgb(56,118,29)">Regards<br></span></div><span style="color:rgb(56,118,29)">Ramya</span><br></div></div></div></div></div>
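<div>For reference, the pointer handling described above can be modelled in scalar C. This is only a sketch, not the x265 reference code: the name planecopy_cp_ref is hypothetical, and pixel is assumed to be uint8_t (an 8-bit build). The per-row cursors s and d play the role of r6 and r12, so the row bases (r0 and r2) advance by the strides alone.</div>

```c
#include <stdint.h>

typedef uint8_t pixel; /* assumption: 8-bit build */

/* Hypothetical scalar model of the NEON routine: copy a plane while
 * left-shifting every sample by 'shift'. */
void planecopy_cp_ref(const uint8_t* src, intptr_t srcStride,
                      pixel* dst, intptr_t dstStride,
                      int width, int height, int shift)
{
    for (int y = 0; y < height; y++)
    {
        const uint8_t* s = src; /* per-row read cursor (r6 in the patch)   */
        pixel* d = dst;         /* per-row write cursor (r12 in the patch) */

        for (int x = 0; x < width; x++)
            *d++ = (pixel)(*s++ << shift);

        /* The row bases move by the stride only. Had the cursors been the
         * bases themselves, this would give base + width + stride. */
        src += srcStride;       /* r0 += r1 */
        dst += dstStride;       /* r2 += r3 */
    }
}
```

<div>With a stride larger than the width, the padding bytes of each destination row are left untouched, which is a quick way to check that the row bases are not drifting.</div>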
<br><div class="gmail_quote">On Fri, Mar 4, 2016 at 11:41 AM, Min Chen <span dir="ltr"><<a href="mailto:min.chen@multicorewareinc.com" target="_blank">min.chen@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br></div>
</blockquote></div><br></div>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div>
</blockquote></div>