<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>At 2014-02-21 10:13:56, "Satoshi Nakagawa" &lt;nakagawa424@oki.com&gt; wrote:<BR>># HG changeset patch<BR>># User Satoshi Nakagawa &lt;nakagawa424@oki.com&gt;<BR>># Date 1392948676 -32400<BR>># Fri Feb 21 11:11:16 2014 +0900<BR>># Node ID 66d8cb6573f27b29a9dc92ec480c635f0de48c03<BR>># Parent 894bde574bc1678471e0c23ceb381a806768ea95<BR>>asm: update count_nonzero, add testbench<BR>><BR>>diff -r 894bde574bc1 -r 66d8cb6573f2 source/common/x86/pixel-util8.asm<BR>>--- a/source/common/x86/pixel-util8.asm Thu Feb 20 17:18:42 2014 -0600<BR>>+++ b/source/common/x86/pixel-util8.asm Fri Feb 21 11:11:16 2014 +0900<BR>>@@ -1240,11 +1240,12 @@<BR>> ; int count_nonzero(const int32_t *quantCoeff, int numCoeff);<BR>> ;-----------------------------------------------------------------------------<BR>> INIT_XMM sse2<BR>>-cglobal count_nonzero, 2,3,4<BR>>+cglobal count_nonzero, 2,2,4<BR>> pxor m0, m0<BR>>- pxor m1, m1<BR>>- mov r2d, r1d<BR>> shr r1d, 3<BR>>+ movd m1, r1d<BR>>+ pshufd m1, m1, 0<BR>>+ packssdw m1, m1</DIV>
<DIV>packssdw is an expensive instruction. To broadcast the count into all eight words of m1, pshuflw + punpcklqdq is better.<BR>> <BR>> .loop<BR>> mova m2, [r0]</DIV>
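<DIV>The following is a minimal C sketch of the two broadcasts being compared. It uses SSE2 intrinsics from emmintrin.h for illustration only, since the patch itself is x86 assembly. broadcast_pack mirrors the patch's movd/pshufd/packssdw sequence, and broadcast_shuf mirrors the suggested movd/pshuflw/punpcklqdq sequence. Both function names are hypothetical.</DIV>

```c
#include <emmintrin.h>

/* Patch version: movd + pshufd + packssdw.
 * The dword count is copied to all four dwords, then packed
 * (with signed saturation) into eight identical words. */
static __m128i broadcast_pack(int count)
{
    __m128i v = _mm_cvtsi32_si128(count); /* movd     m1, r1d   */
    v = _mm_shuffle_epi32(v, 0);          /* pshufd   m1, m1, 0 */
    return _mm_packs_epi32(v, v);         /* packssdw m1, m1    */
}

/* Suggested version: movd + pshuflw + punpcklqdq.
 * The low word is copied to the low four words, then the low
 * qword is duplicated into the high qword. */
static __m128i broadcast_shuf(int count)
{
    __m128i v = _mm_cvtsi32_si128(count); /* movd       m1, r1d   */
    v = _mm_shufflelo_epi16(v, 0);        /* pshuflw    m1, m1, 0 */
    return _mm_unpacklo_epi64(v, v);      /* punpcklqdq m1, m1    */
}
```

<DIV>Both sequences yield the same eight words for any count in the signed 16-bit range, which covers r1d = numCoeff >> 3 here. They differ only above 32767: packssdw saturates to 32767, while pshuflw keeps the low 16 bits. The swap therefore changes only the instruction choice, not the result.</DIV>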
<DIV>>@@ -1252,16 +1253,13 @@<BR>> add r0, 32<BR>> packssdw m2, m3<BR>> pcmpeqw m2, m0<BR>>- psrlw m2, 15<BR>>- packsswb m2, m2<BR>>- psadbw m2, m0<BR>>- paddd m1, m2<BR>>+ paddw m1, m2<BR>> dec r1d<BR>>- jnz .loop<BR>>-<BR>>- movd r1d, m1<BR>>- sub r2d, r1d<BR>>- mov eax, r2d<BR>>+ jnz .loop<BR>>+<BR>>+ packuswb m1, m1<BR>>+ psadbw m1, m0<BR>>+ movd eax, m1<BR>> <BR>> RET<BR></DIV></div>
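<DIV>For reference, the patched routine can be transcribed into SSE2 intrinsics roughly as follows. This is a sketch, not the x265 code, and it rests on several assumptions:</DIV>
<DIV>- The function name count_nonzero_sse2 is hypothetical.</DIV>
<DIV>- The second load (m3 from [r0 + 16]) is elided from the quoted hunks and is assumed here.</DIV>
<DIV>- Unaligned loads stand in for mova.</DIV>
<DIV>- numCoeff is a multiple of 8, and numCoeff / 8 is at most 255, so packuswb never saturates a lane.</DIV>

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical intrinsics transcription of the patched count_nonzero.
 * Assumes numCoeff % 8 == 0 and numCoeff / 8 <= 255. */
static int count_nonzero_sse2(const int32_t *quantCoeff, int numCoeff)
{
    const __m128i zero = _mm_setzero_si128();   /* pxor m0, m0 */
    int iters = numCoeff >> 3;                  /* shr r1d, 3  */

    /* Every word lane starts at the iteration count (movd/pshufd/packssdw). */
    __m128i count = _mm_cvtsi32_si128(iters);
    count = _mm_shuffle_epi32(count, 0);
    count = _mm_packs_epi32(count, count);

    for (int i = 0; i < iters; i++) {
        /* The asm uses aligned mova; loadu avoids an alignment requirement here. */
        __m128i a = _mm_loadu_si128((const __m128i *)quantCoeff);       /* mova m2, [r0]                */
        __m128i b = _mm_loadu_si128((const __m128i *)(quantCoeff + 4)); /* m3 from [r0 + 16] (assumed)  */
        quantCoeff += 8;                                                /* add r0, 32                   */

        /* Signed saturation keeps every nonzero int32 nonzero as an int16. */
        __m128i w = _mm_packs_epi32(a, b);                              /* packssdw m2, m3 */

        /* 0xFFFF (-1) in each lane whose coefficient is zero, 0 otherwise. */
        __m128i isZero = _mm_cmpeq_epi16(w, zero);                      /* pcmpeqw m2, m0 */

        /* Adding -1 per zero turns each lane into "iterations minus zeros". */
        count = _mm_add_epi16(count, isZero);                           /* paddw m1, m2 */
    }

    /* Lanes are in [0, iters], so the unsigned byte pack is lossless;
     * psadbw against zero then sums the eight bytes of the low qword. */
    count = _mm_packus_epi16(count, count);                             /* packuswb m1, m1 */
    count = _mm_sad_epu8(count, zero);                                  /* psadbw m1, m0   */
    return _mm_cvtsi128_si32(count);                                    /* movd eax, m1    */
}
```

<DIV>Each word lane starts at numCoeff / 8, and pcmpeqw contributes -1 for every zero coefficient, so after the loop the lanes sum to the number of nonzero coefficients. That is why the old per-iteration psrlw/packsswb/psadbw/paddd sequence and the final subtraction from r2d are no longer needed, and why r2 is no longer reserved (cglobal count_nonzero, 2,2,4).</DIV>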