[x265] [PATCH 1 of 2] improve count_nonzero by SSSE3
Derek Buitenhuis
derek.buitenhuis at gmail.com
Fri Jun 27 19:44:15 CEST 2014
On 6/27/2014 6:08 PM, chen wrote:
> I use the SSSE3 instruction PSHUFB to replace 3 SSE2 instructions; the x86inc macro can't handle it.
>
> After the patch, this function is ~20% faster and codeCoeffNxN gets a ~7% speedup, so I'm not worried about performance on old CPUs.
I guess SSSE3 is very prevalent nowadays -- though I am still not a fan
of throwing away variants, I guess it's reasonable in this case.
- Derek