<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">Most of it looks fine; I have added some comments inline below.<br><div></div><div id="divNeteaseMailCard"></div><br>At 2016-04-25 20:03:59,"Ramya Sriraman" <ramya@multicorewareinc.com> wrote:<br> <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid"><div dir="ltr">Ignore above patch. Modified one below.<br><br># HG changeset patch<br># User Ramya Sriraman<<a href="mailto:ramya@multicorewareinc.com">ramya@multicorewareinc.com</a>><br># Date 1461158053 -19800<br>#      Wed Apr 20 18:44:13 2016 +0530<br># Node ID c26f9a4dc9173b0cbfb609a984c57607d129f011<br># Parent  4f83d465d11b3baa46e6089f73b0929266d4b722<br>arm: Implement quant<br><br>diff -r 4f83d465d11b -r c26f9a4dc917 source/common/arm/asm-primitives.cpp<br>--- a/source/common/arm/asm-primitives.cpp    Wed Mar 30 17:29:13 2016 +0530<br>+++ b/source/common/arm/asm-primitives.cpp    Wed Apr 20 18:44:13 2016 +0530<br>@@ -820,6 +820,8 @@<br>         p.chroma[X265_CSP_I444].pu[LUMA_24x32].filter_vsp = PFX(interp_4tap_vert_sp_24x32_neon);<br>         p.chroma[X265_CSP_I444].pu[LUMA_48x64].filter_vsp = PFX(interp_4tap_vert_sp_48x64_neon);<br> <br>+        // quant<br>+        p.quant = PFX(quant_neon);<br>     }<br>     if (cpuMask & X265_CPU_ARMV6)<br>     {<br>diff -r 4f83d465d11b -r c26f9a4dc917 source/common/arm/pixel-util.S<br>--- a/source/common/arm/pixel-util.S    Wed Mar 30 17:29:13 2016 +0530<br>+++ b/source/common/arm/pixel-util.S    Wed Apr 20 18:44:13 2016 +0530<br>@@ -1962,3 +1962,60 @@<br>     bx              lr<br> endfunc<br> <br>+function x265_quant_neon<br>+    push            {r4-r6}<br>+    ldr             r4, [sp, #3* 4]         //qbits<br>+    vdup.s32        q8, r4<br>+    mov             r12, #8<br>+    sub             r12, r12, r4<br>+    vdup.s32        q10, r12                // -(qbits- 8) = 8- qbits<br>+    ldr             r4, [sp, #3* 4 + 4]     // add<br>+    vdup.s32        
q9, r4<br>+    ldr             r4, [sp, #3* 4 + 8]     // numcoeff<br>+<br>+    lsr             r4, r4 ,#2<br>+    eor             r5, r5<br>+    eor             r6, r6<br>+<br>+.loop_quant:<br>+<br>+    vld1.s16        d0, [r0]!<br>+    vmovl.s16       q1, d0                  // coef[blockpos]<br>+<br>+    vclt.s32        q4, q1, #0<br>+<br>+    vabs.s32        q1, q1                  // q1=level=abs(coef[blockpos])<br>+    vld1.s32        {q0}, [r1]!             // quantCoeff[blockpos]<br>+    vmul.i32        q0, q0, q1              // q0=tmplevel = abs(level) * quantCoeff[blockpos];<br>+<br>+    vadd.s32        q1, q0, q9              // q1= tmplevel+add<br>+    vneg.s32        q12, q8<br>+    vshl.s32        q1, q1, q12             // q1= level =(tmplevel+add) >> qbits<br>+<br>+    vshl.s32        q3, q1, q8              // q3 = level << qBits<br>+    vsub.s32        q13, q0, q3             // q8= tmplevel - (level << qBits)<br>The VSHL + VSUB pair could be replaced by VMLS.<br><br>+    vshl.s32        q13, q13, q10           // q3= ((tmplevel - (level << qBits)) >> qBits8)<br>+    vst1.s32        {q13}, [r2]!            
// store deltaU<br>+<br>+    // numsig<br>+    vclz.s32        q2, q1<br>VCLZ (Vector Count Leading Zeros) counts the number of consecutive zero bits, starting from the most significant bit.</div><div dir="ltr">Did you mean VCLT?<br><br>+    vshr.u32        q2, #5<br>+    vadd.u32        d4, d5<br>+    vpadd.u32       d4, d4<br>+    vmov.32         r12, d4[0]<br>+    add             r5, r12<br>+    add             r6, #4<br>The code block above could be replaced by 'v<span style="line-height: 1.7;">add.s32 q2, q1', with the sum done after the loop.</span></div><div dir="ltr"><br>+<br>+    veor.s32        q2, q1, q4<br>+    vsub.s32        q2, q2, q4<br>+    vqmovn.s32      d0, q2<br>+    vst1.s16        d0, [r3]!<br>+<br>+    subs            r4, #1<br>+    bne             .loop_quant<br>+<br>+    sub             r0, r6, r5<br>+    pop             {r4-r6}<br>+    bx              lr<br>+endfunc<br>+<br>diff -r 4f83d465d11b -r c26f9a4dc917 source/common/arm/pixel-util.h<br>--- a/source/common/arm/pixel-util.h    Wed Mar 30 17:29:13 2016 +0530<br>+++ b/source/common/arm/pixel-util.h    Wed Apr 20 18:44:13 2016 +0530<br>@@ -78,4 +78,6 @@<br> int x265_pixel_sa8d_32x32_neon(const pixel* pix1, intptr_t i_pix1, const pixel* pix2, intptr_t i_pix2);<br> int x265_pixel_sa8d_32x64_neon(const pixel* pix1, intptr_t i_pix1, const pixel* pix2, intptr_t i_pix2);<br> int x265_pixel_sa8d_64x64_neon(const pixel* pix1, intptr_t i_pix1, const pixel* pix2, intptr_t i_pix2);<br>+<br>+uint32_t x265_quant_neon(const int16_t* coef, const int32_t* quantCoeff, int32_t* deltaU, int16_t* qCoef, int qBits, int add, int numCoeff);<br> #endif // ifndef X265_PIXEL_UTIL_ARM_H<br><br></div><div class="gmail_extra"><br></div>
</blockquote></div>
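<div>For reference while reading the NEON loop above, here is a minimal scalar sketch in C of what the routine appears to compute, reconstructed only from the comments in the patch and the prototype in pixel-util.h. The name quant_ref is hypothetical and is not claimed to be the x265 C primitive. The sketch assumes arithmetic right shift for negative values (implementation-defined in C) and saturates to int16_t the way vqmovn.s32 does. It counts nonzero levels directly, whereas the patch appears to count zero lanes (vclz followed by vshr #5) and return the number of processed coefficients minus that count.</div>

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar sketch of the quantization loop (hypothetical helper, not x265 code).
 * Parameter names follow the x265_quant_neon prototype quoted in the patch. */
static uint32_t quant_ref(const int16_t *coef, const int32_t *quantCoeff,
                          int32_t *deltaU, int16_t *qCoef,
                          int qBits, int add, int numCoeff)
{
    const int qBits8 = qBits - 8;   /* the asm keeps 8 - qBits and shifts left by it */
    uint32_t numSig = 0;

    for (int i = 0; i < numCoeff; i++)
    {
        int32_t c = coef[i];                          /* vmovl.s16 */
        int32_t tmplevel = abs(c) * quantCoeff[i];    /* vabs.s32 + vmul.i32 */
        int32_t level = (tmplevel + add) >> qBits;    /* vadd.s32 + vshl by -qBits */

        /* Remainder scaled down by qBits - 8; assumes arithmetic shift
         * if the difference is negative. */
        deltaU[i] = (tmplevel - (level << qBits)) >> qBits8;

        if (level != 0)
            numSig++;                                 /* count significant coefficients */

        if (c < 0)
            level = -level;                           /* restore sign (veor + vsub) */

        /* Saturate to the int16_t range, as vqmovn.s32 does. */
        if (level > INT16_MAX) level = INT16_MAX;
        if (level < INT16_MIN) level = INT16_MIN;
        qCoef[i] = (int16_t)level;
    }
    return numSig;
}
```

<div>A sketch like this can be used to compare outputs against the assembly on a few small blocks, for example with numCoeff set to a multiple of 4, since the NEON loop processes four coefficients per iteration.</div>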