[x265] [PATCH] count_nonzero primitive optimization, downscaling quantCoef from int32_t* to int16_t*

Wed Sep 3 17:25:27 CEST 2014

 

At 2014-09-02 22:08:04,praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1408951177 -19800
># Node ID 380a796052afc62cac7e480fde70e3766a940246
># Parent  c5624effb73c74e63fd2e42d2a48ea4490074dce
>count_nonzero primitive optimization, downscaling quantCoef from int32_t* to int16_t*
>
>diff -r c5624effb73c -r 380a796052af source/common/x86/pixel-util8.asm
>--- a/source/common/x86/pixel-util8.asm	Mon Sep 01 14:13:37 2014 +0530
>+++ b/source/common/x86/pixel-util8.asm	Mon Aug 25 12:49:37 2014 +0530
>@@ -1051,10 +1051,10 @@
> 
> 
> ;-----------------------------------------------------------------------------
>-; int count_nonzero(const int32_t *quantCoeff, int numCoeff);
>+; int count_nonzero(const int16_t *quantCoeff, int numCoeff);
> ;-----------------------------------------------------------------------------
> INIT_XMM ssse3
>-cglobal count_nonzero, 2,2,5
>+cglobal count_nonzero, 2,2,4
>     pxor        m0, m0
>     shr         r1d, 4
>     movd        m1, r1d
>@@ -1063,12 +1063,8 @@
> .loop:
>     mova        m2, [r0 +  0]
>     mova        m3, [r0 + 16]
>-    packssdw    m2, m3
>-    mova        m3, [r0 + 32]
>-    mova        m4, [r0 + 48]
>-    add         r0, 64
>-    packssdw    m3, m4
>     packsswb    m2, m3

The address is aligned, so 'packsswb m2, [r0 + 16]' (a memory operand in place of the separate mova into m3) can reduce code size.
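
For readers who want to check the int16_t behaviour outside the assembly, here is a minimal C sketch of the same idea. It is not the x265 implementation: the function names count_nonzero_ref and count_nonzero_simd and the helper popcount16 are hypothetical, and the SIMD path assumes a 16-byte-aligned buffer and a coefficient count that is a multiple of 16. It relies on signed saturation when packing words to bytes, so a nonzero coefficient (including a negative one, or one whose low byte is zero such as 256) stays nonzero after the pack; an unsigned-saturating pack would clamp negative coefficients to 0 and undercount.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: count the nonzero 16-bit coefficients. */
static int count_nonzero_ref(const int16_t *coef, int num)
{
    int count = 0;
    for (int i = 0; i < num; i++)
        count += (coef[i] != 0);
    return count;
}

#if defined(__SSE2__)
/* Portable bit count for the 16-bit mask returned by movemask. */
static int popcount16(unsigned v)
{
    int n = 0;
    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Hypothetical SSE2 sketch. Assumes coef is 16-byte aligned and
 * num is a multiple of 16 (both are assumptions of this example). */
static int count_nonzero_simd(const int16_t *coef, int num)
{
    const __m128i zero = _mm_setzero_si128();
    int count = 0;

    assert((num & 15) == 0);
    for (int i = 0; i < num; i += 16)
    {
        /* Aligned loads; a compiler may fold the second one into the
         * pack instruction as a memory operand. */
        __m128i lo = _mm_load_si128((const __m128i *)(coef + i));
        __m128i hi = _mm_load_si128((const __m128i *)(coef + i + 8));

        /* packsswb: signed saturation keeps nonzero words nonzero. */
        __m128i bytes = _mm_packs_epi16(lo, hi);

        /* One mask bit per byte that equals zero. */
        unsigned zero_mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(bytes, zero));
        count += 16 - popcount16(zero_mask);
    }
    return count;
}
#else
/* Without SSE2, fall back to the scalar reference. */
static int count_nonzero_simd(const int16_t *coef, int num)
{
    return count_nonzero_ref(coef, num);
}
#endif
```

With int16_t input, 16 coefficients fit in two 16-byte loads, so a single word-to-byte pack per iteration is enough and the two packssdw steps needed for int32_t input go away.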
 