[x265] [PATCH] count_nonzero primitive optimization, downscaling quantCoef from int32_t* to int16_t*
chen
chenm003 at 163.com
Wed Sep 3 17:25:27 CEST 2014
At 2014-09-02 22:08:04, praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1408951177 -19800
># Node ID 380a796052afc62cac7e480fde70e3766a940246
># Parent c5624effb73c74e63fd2e42d2a48ea4490074dce
>count_nonzero primitive optimization, downscaling quantCoef from int32_t* to int16_t*
>
>diff -r c5624effb73c -r 380a796052af source/common/x86/pixel-util8.asm
>--- a/source/common/x86/pixel-util8.asm Mon Sep 01 14:13:37 2014 +0530
>+++ b/source/common/x86/pixel-util8.asm Mon Aug 25 12:49:37 2014 +0530
>@@ -1051,10 +1051,10 @@
>
>
> ;-----------------------------------------------------------------------------
>-; int count_nonzero(const int32_t *quantCoeff, int numCoeff);
>+; int count_nonzero(const int16_t *quantCoeff, int numCoeff);
> ;-----------------------------------------------------------------------------
> INIT_XMM ssse3
>-cglobal count_nonzero, 2,2,5
>+cglobal count_nonzero, 2,2,4
> pxor m0, m0
> shr r1d, 4
> movd m1, r1d
>@@ -1063,12 +1063,8 @@
> .loop:
> mova m2, [r0 + 0]
> mova m3, [r0 + 16]
>- packssdw m2, m3
>- mova m3, [r0 + 32]
>- mova m4, [r0 + 48]
>- add r0, 64
>- packssdw m3, m4
> packsswb m2, m3
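[r0 + 16] is an aligned address, so 'packsswb m2, [r0 + 16]' can take it directly as a memory operand. That drops the separate 'mova m3, [r0 + 16]' and reduces code size. (Keep the signed pack here: packuswb would saturate negative coefficients to zero.)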
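For reference, a minimal scalar C sketch of what the int16_t version is expected to compute, and of why the signed saturating pack is safe for this count. This is only an illustration, not the x265 C primitive: the helper names (count_nonzero_ref, sat_s8, sat_u8, count_nonzero_packed) are hypothetical, and it assumes the rest of the SIMD loop, which is not shown in the quoted hunk, tests the packed bytes against zero.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: number of coefficients that are not zero. */
static int count_nonzero_ref(const int16_t *quantCoeff, int numCoeff)
{
    assert(numCoeff >= 0);
    int count = 0;
    for (int i = 0; i < numCoeff; i++)
        count += (quantCoeff[i] != 0);
    return count;
}

/* One lane of a signed saturating pack (packsswb): int16 -> int8.
 * Any nonzero input stays nonzero after saturation. */
static int8_t sat_s8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* One lane of an unsigned saturating pack (packuswb): int16 -> uint8.
 * Negative inputs clamp to 0, so they would be miscounted as zero. */
static uint8_t sat_u8(int16_t v)
{
    if (v > UINT8_MAX) return UINT8_MAX;
    if (v < 0) return 0;
    return (uint8_t)v;
}

/* Count on the packed bytes (assumed shape of the SIMD loop body). */
static int count_nonzero_packed(const int16_t *quantCoeff, int numCoeff)
{
    int count = 0;
    for (int i = 0; i < numCoeff; i++)
        count += (sat_s8(quantCoeff[i]) != 0);
    return count;
}
```

With int16_t input, one signed pack per 16 coefficients is enough, which is why the two packssdw steps and the extra loads disappear from the loop.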