[x265] [PATCH 1 of 2] improve count_nonzero by SSSE3
chen
chenm003 at 163.com
Fri Jun 27 19:08:24 CEST 2014
At 2014-06-28 01:02:27,"Derek Buitenhuis" <derek.buitenhuis at gmail.com> wrote:
>On 6/27/2014 4:05 PM, chen wrote:
>> I don't understand what you mean. Could you tell me more?
>>
>> I use some SSSE3 instructions and process 16 pixels per loop iteration.
>
>I meant keeping both the SSE2 and SSSE3 variants. I'm not sure whether the
>x86inc.asm macros help with this or not.
>
I use the SSSE3 instruction PSHUFB to replace three SSE2 instructions, and the x86inc macros can't handle that substitution.
With the patch, this function is ~20% faster and codeCoeffNxN gets a ~7% speedup, so I'm not worried about performance on older CPUs.
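
For illustration, here is a minimal sketch in portable C of what keeping both variants behind a runtime CPU check could look like. The flag names, function names, and the int16_t coefficient type are assumptions made up for this example, not x265's actual code, and the 16-per-iteration function is a plain C stand-in for the SSSE3 assembly (it does not use PSHUFB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU capability flags (illustrative names only). */
enum { CPU_SSE2 = 1 << 0, CPU_SSSE3 = 1 << 1 };

typedef int (*count_nonzero_fn)(const int16_t *coeff, size_t n);

/* Scalar reference: counts the coefficients that are not zero. */
static int count_nonzero_c(const int16_t *coeff, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        count += (coeff[i] != 0);
    return count;
}

/* Stand-in for a variant that consumes 16 values per outer loop
 * iteration. Assumes n is a multiple of 16. */
static int count_nonzero_x16(const int16_t *coeff, size_t n)
{
    assert(n % 16 == 0);
    int count = 0;
    for (size_t i = 0; i < n; i += 16)
        for (size_t j = 0; j < 16; j++)
            count += (coeff[i + j] != 0);
    return count;
}

/* Pick the widest variant the CPU supports; fall back otherwise. */
static count_nonzero_fn select_count_nonzero(int cpu_flags)
{
    if (cpu_flags & CPU_SSSE3)
        return count_nonzero_x16;
    return count_nonzero_c;
}
```

With a table like this, a CPU without SSSE3 would still get the older path, at the cost of maintaining two implementations.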