<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div>I have reviewed his patches. After they are pushed, I may send a new version.</div>
<div></div>
<div id="divNeteaseMailCard"></div>
<div><br></div>At 2014-09-04 12:47:24, "Deepthi Nandakumar" <deepthi@multicorewareinc.com> wrote:<br>
<blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<div dir="ltr">
<div>
<div>
<div>Min,<br><br></div>Praveen has sent a number of patches that change the entire quant interface so that the coefficients are now 16-bit instead of 32-bit. Do your patches still assume they are 32-bit? <br><br></div>Can you review all of his patches (8-10 patches) and see whether we're moving in the right direction?<br><br></div>Thanks,<br>Deepthi<br>
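<div>The following minimal C++ sketch shows why the 16-bit question concerns the interface rather than the values. In the patched loop, <code>packssdw m2, m3</code> saturates every quantized level to int16 before <code>pmovsxwd</code> sign-extends it back into the int32 <code>qCoef</code> buffer. As a result, every value that buffer receives already lies in [-32768, 32767]. The names <code>packssdwLane</code>, <code>pmovsxwdLane</code> and <code>storeLevels</code> are hypothetical scalar models of single SIMD lanes, not x265 functions.</div>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one PACKSSDW lane: signed saturation from
// int32 to int16, clamping anything outside [-32768, 32767].
static inline int16_t packssdwLane(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(v);
}

// Hypothetical scalar model of one PMOVSXWD lane: sign-extend int16 to int32.
static inline int32_t pmovsxwdLane(int16_t v)
{
    return static_cast<int32_t>(v);
}

// Hypothetical model of the store path of the patched loop: each int32 level
// is saturated to int16 and sign-extended back before it is written to the
// int32 output. out16 receives what a 16-bit qCoef buffer would hold instead.
static void storeLevels(const int32_t* level, int32_t* out32, int16_t* out16, int n)
{
    for (int i = 0; i < n; i++)
    {
        int16_t s = packssdwLane(level[i]);
        out16[i] = s;
        out32[i] = pmovsxwdLane(s);
        assert(out32[i] == out16[i]); // the int16 copy loses nothing
    }
}
```

<div>For example, a level of 100000 comes out as 32767 in both the int32 and the int16 buffer. Switching the output type to <code>int16_t</code> therefore would not change the values this asm stores.</div>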
<div>
<div><br></div></div></div>
<div class="gmail_extra"><br><br>
<div class="gmail_quote">On Thu, Sep 4, 2014 at 5:07 AM, Min Chen <span dir="ltr"><<a href="mailto:chenm003@163.com" target="_blank">chenm003@163.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid"># HG changeset patch<br># User Min Chen <<a href="mailto:chenm003@163.com">chenm003@163.com</a>><br># Date 1409787419 25200<br># Node ID 4ca9e972f48cb4530ca7181ad7cec351568a99b3<br># Parent 94bd00d1af5d8c5f6f26f97c50a727588a860714<br>asm: optimize nquant by PSIGND, improve 13k cycles -> 11k cycles<br><br>diff -r 94bd00d1af5d -r 4ca9e972f48c source/common/dct.cpp<br>--- a/source/common/dct.cpp Wed Sep 03 16:36:44 2014 -0700<br>+++ b/source/common/dct.cpp Wed Sep 03 16:36:59 2014 -0700<br>@@ -801,6 +801,10 @@<br> {<br> uint32_t numSig = 0;<br><br>+ X265_CHECK((numCoeff % 16) == 0, "number of quant coeff is not multiple of 4x4\n");<br>+ X265_CHECK((uint32_t)add < ((uint32_t)1 << qBits), "2 ^ qBits less than add\n");<br>+ X265_CHECK(((intptr_t)quantCoeff & 15) == 0, "quantCoeff buffer not aligned\n");<br>+<br> for (int blockpos = 0; blockpos < numCoeff; blockpos++)<br> {<br> int level = coef[blockpos];<br>diff -r 94bd00d1af5d -r 4ca9e972f48c source/common/x86/pixel-util8.asm<br>--- a/source/common/x86/pixel-util8.asm Wed Sep 03 16:36:44 2014 -0700<br>+++ b/source/common/x86/pixel-util8.asm Wed Sep 03 16:36:59 2014 -0700<br>@@ -941,55 +941,47 @@<br> ; uint32_t nquant(int32_t *coef, int32_t *quantCoeff, int32_t *qCoef, int qBits, int add, int numCoeff);<br> ;-----------------------------------------------------------------------------<br> INIT_XMM sse4<br>-cglobal nquant, 4,5,8<br>+cglobal nquant, 3,5,8<br> movd m6, r4m<br> mov r4d, r5m<br> pxor m7, m7 ; m7 = numZero<br>- movd m5, r3d ; m5 = qbits<br>+ movd m5, r3m ; m5 = qbits<br> pshufd m6, m6, 0 ; m6 = add<br> mov r3d, r4d ; r3 = numCoeff<br> shr r4d, 3<br>+<br> .loop:<br> movu m0, [r0] ; m0 = level<br> movu m1, [r0 + 16] ; m1 = level<br>- movu m2, [r1] ; m2 = qcoeff<br>- movu m3, [r1 + 16] ; m3 = qcoeff<br>+<br>+ pabsd m2, m0<br>+ pmulld m2, [r1] ; m2 = tmpLevel1<br>+ paddd m2, m6<br>+ psrad m2, m5 ; m2 = level1<br>
+ psignd m2, m0 ; restore sign<br>+<br>+ pabsd m3, m1<br>+ pmulld m3, [r1 + 16] ; m3 = tmpLevel1<br>+ paddd m3, m6<br>+ psrad m3, m5 ; m3 = level1<br>+ psignd m3, m1 ; restore sign<br> add r0, 32<br> add r1, 32<br><br>- pxor m4, m4<br>- pcmpgtd m4, m0 ; m4 = sign<br>- pabsd m0, m0<br>- pmulld m0, m2 ; m0 = tmpLevel1<br>- paddd m0, m6<br>- psrad m0, m5 ; m0 = level1<br>- pxor m0, m4<br>- psubd m0, m4<br>-<br>- pxor m4, m4<br>- pcmpgtd m4, m1 ; m4 = sign<br>- pabsd m1, m1<br>- pmulld m1, m3 ; m1 = tmpLevel1<br>- paddd m1, m6<br>- psrad m1, m5 ; m1 = level1<br>- pxor m1, m4<br>- psubd m1, m4<br>-<br>- packssdw m0, m0<br>- packssdw m1, m1<br>- pmovsxwd m0, m0<br>+ packssdw m2, m3<br>+ pmovsxwd m0, m2<br>+ movhlps m1, m2<br> pmovsxwd m1, m1<br><br>- movu [r2], m0<br>+ movu [r2 ], m0<br> movu [r2 + 16], m1<br> add r2, 32<br>+<br>+ pxor m4, m4<br>+ pcmpeqw m2, m4<br>+ psubw m7, m2<br>+<br> dec r4d<br>-<br>- packssdw m0, m1<br>- pxor m4, m4<br>- pcmpeqw m0, m4<br>- psubw m7, m0<br>-<br> jnz .loop<br><br> packuswb m7, m7<br>@@ -997,10 +989,8 @@<br> mov eax, r3d<br> movd r4d, m7<br> sub eax, r4d ; numSig<br>-<br> RET<br><br>-<br> ;-----------------------------------------------------------------------------<br> ; void dequant_normal(const int32_t* quantCoef, int32_t* coef, int num, int scale, int shift)<br> ;-----------------------------------------------------------------------------<br><br>_______________________________________________<br>x265-devel mailing list<br><a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br><a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br></blockquote></div><br></div></blockquote></div>
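<div>The following minimal C++ sketch models one lane of the patched nquant loop as scalar code, so the PSIGND sign restore can be compared with the removed pcmpgtd/pxor/psubd sequence. <code>psigndLane</code>, <code>maskNegateLane</code>, <code>clampInt16</code> and <code>nquantSketch</code> are hypothetical helpers, not x265 code. The sketch assumes <code>|coef[i]| * quantCoeff[i] + add</code> fits in int32. The two sign-restore paths differ only when the level is 0. PSIGND forces 0, while the old mask path keeps <code>add >> qBits</code>. That difference disappears when <code>add < (1 << qBits)</code>, which may be why the patch adds that X265_CHECK. The zero count mirrors <code>pcmpeqw</code>/<code>psubw</code>, and the function returns numCoeff minus that count as numSig.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of one PSIGND lane: negate dst when src < 0,
// zero it when src == 0, keep it when src > 0.
static inline int32_t psigndLane(int32_t dst, int32_t src)
{
    if (src < 0) return -dst;
    if (src == 0) return 0;
    return dst;
}

// Hypothetical model of the removed sequence: pcmpgtd against zero builds an
// all-ones mask for negative levels, then pxor + psubd negate conditionally.
static inline int32_t maskNegateLane(int32_t dst, int32_t src)
{
    int32_t sign = (0 > src) ? -1 : 0; // pxor m4, m4 / pcmpgtd m4, m0
    return (dst ^ sign) - sign;        // pxor m0, m4 / psubd m0, m4
}

// Signed saturation to the int16 range, as packssdw does per lane.
static inline int32_t clampInt16(int32_t v)
{
    return v > INT16_MAX ? INT16_MAX : (v < INT16_MIN ? INT16_MIN : v);
}

// Hypothetical scalar reference for the patched loop (PSIGND path).
// Assumes |coef[i]| * quantCoeff[i] + add fits in int32.
static uint32_t nquantSketch(const int32_t* coef, const int32_t* quantCoeff, int32_t* qCoef,
                             int qBits, int add, int numCoeff)
{
    // Mirrors the first two X265_CHECKs added to dct.cpp; the 16-byte
    // alignment check is specific to the pmulld memory operand in the asm.
    assert(numCoeff % 16 == 0);
    assert((uint32_t)add < ((uint32_t)1 << qBits));

    uint32_t numZero = 0;
    for (int i = 0; i < numCoeff; i++)
    {
        int32_t level = coef[i];
        int32_t tmp = std::abs(level) * quantCoeff[i];     // pabsd + pmulld
        int32_t mag = (tmp + add) >> qBits;               // paddd + psrad
        int32_t out = clampInt16(psigndLane(mag, level)); // psignd + packssdw
        qCoef[i] = out;                                   // pmovsxwd + movu
        numZero += (out == 0) ? 1u : 0u;                  // pcmpeqw + psubw
    }
    return (uint32_t)numCoeff - numZero; // numSig
}
```

<div>For example, with qBits = 6 and add = 32, a level of 100 with quantCoeff 16 gives (1600 + 32) >> 6 = 25, and a level of 0 stays 0 on both paths. With add = 64, which the new check rejects, the mask path would produce 1 for a zero level while PSIGND produces 0.</div>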