<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div> </div><pre><br>At 2014-09-02 22:08:04, praveen@multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1408951177 -19800
># Node ID 380a796052afc62cac7e480fde70e3766a940246
># Parent c5624effb73c74e63fd2e42d2a48ea4490074dce
>count_nonzero primitive optimization, downscaling quantCoef from int32_t* to int16_t*
>
>diff -r c5624effb73c -r 380a796052af source/common/x86/pixel-util8.asm
>--- a/source/common/x86/pixel-util8.asm Mon Sep 01 14:13:37 2014 +0530
>+++ b/source/common/x86/pixel-util8.asm Mon Aug 25 12:49:37 2014 +0530
>@@ -1051,10 +1051,10 @@
>
>
> ;-----------------------------------------------------------------------------
>-; int count_nonzero(const int32_t *quantCoeff, int numCoeff);
>+; int count_nonzero(const int16_t *quantCoeff, int numCoeff);
> ;-----------------------------------------------------------------------------
> INIT_XMM ssse3
>-cglobal count_nonzero, 2,2,5
>+cglobal count_nonzero, 2,2,4
> pxor m0, m0
> shr r1d, 4
> movd m1, r1d
>@@ -1063,12 +1063,8 @@
> .loop:
> mova m2, [r0 + 0]
> mova m3, [r0 + 16]
>- packssdw m2, m3
>- mova m3, [r0 + 32]
>- mova m4, [r0 + 48]
>- add r0, 64
>- packssdw m3, m4
> packsswb m2, m3
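</pre><pre>The address is aligned, so the second load can be folded into the pack instruction: 'packsswb m2, [r0 + 16]' replaces 'mova m3, [r0 + 16]' plus 'packsswb m2, m3' and reduces code size.</pre></div>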
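A minimal C sketch of the int16_t loop, written with SSE2 intrinsics so it can be checked against a scalar reference. It assumes quantCoeff is 16-byte aligned and numCoeff is a multiple of 16 no larger than 1024 (a 32x32 block), so each per-byte counter fits in a byte. The function names count_nonzero_ref, count_nonzero_sse2 and check_count_nonzero are hypothetical, not x265's C primitives, and the counter setup and final horizontal sum are assumptions, since the quoted hunk is truncated. Each iteration packs 8 + 8 words into 16 bytes (packsswb, `_mm_packs_epi16`), compares against zero (pcmpeqb) and adds the -1 masks into counters that start at numCoeff / 16; psadbw then sums the lanes. Signed saturation maps every nonzero int16_t to a nonzero byte, whereas unsigned saturation (packuswb) would clamp negative coefficients to 0 and undercount.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Scalar reference: number of nonzero int16_t coefficients. */
static int count_nonzero_ref(const int16_t *quantCoeff, int numCoeff)
{
    int count = 0;
    for (int i = 0; i < numCoeff; i++)
        count += (quantCoeff[i] != 0);
    return count;
}

/* SSE2 sketch (hypothetical name). Assumes quantCoeff is 16-byte aligned
 * and numCoeff is a multiple of 16, at most 1024, so each per-byte
 * counter (at most numCoeff / 16 = 64) fits in a byte. */
static int count_nonzero_sse2(const int16_t *quantCoeff, int numCoeff)
{
    const __m128i zero = _mm_setzero_si128();
    /* Every byte lane starts at the iteration count (numCoeff >> 4). */
    __m128i cnt = _mm_set1_epi8((char)(numCoeff >> 4));

    for (int i = 0; i < numCoeff; i += 16) {
        __m128i lo = _mm_load_si128((const __m128i *)(quantCoeff + i));
        /* The aligned second load feeds the pack directly; a compiler may
         * fold it into packsswb's memory operand. */
        __m128i packed = _mm_packs_epi16(lo,
            _mm_load_si128((const __m128i *)(quantCoeff + i + 8)));
        /* pcmpeqb gives -1 for each zero byte; adding it decrements that
         * lane, leaving the number of nonzero coefficients in the lane. */
        cnt = _mm_add_epi8(cnt, _mm_cmpeq_epi8(packed, zero));
    }

    /* psadbw against zero sums the 16 byte lanes into two 64-bit halves. */
    __m128i sad = _mm_sad_epu8(cnt, zero);
    return _mm_cvtsi128_si32(sad) + _mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
}

/* Usage check: both versions must agree on valid input. */
static void check_count_nonzero(const int16_t *quantCoeff, int numCoeff)
{
    assert(count_nonzero_sse2(quantCoeff, numCoeff) ==
           count_nonzero_ref(quantCoeff, numCoeff));
}
```

Counting down from numCoeff / 16 avoids a separate subtraction at the end: after the loop each byte already holds the nonzero count for its lane.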