[x265] [PATCH 1 of 2] improve count_nonzero by SSSE3

Derek Buitenhuis derek.buitenhuis at gmail.com
Fri Jun 27 19:44:15 CEST 2014


On 6/27/2014 6:08 PM, chen wrote:
> I use the SSSE3 instruction PSHUFB to replace three SSE2 instructions; the x86inc macro can't handle it.
> 
> After the patch, this function is ~20% faster and codeCoeffNxN is ~7% faster, so I'm not worried about performance on old CPUs.
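
For readers without the patch at hand, here is a rough scalar sketch in C of the two ideas mentioned above. It is not the patch's code. pshufb_model is a hypothetical helper that models what PSHUFB does on one 16-byte register, and count_nonzero_ref is a hypothetical reference with an assumed signature for what a count_nonzero routine computes. One common case where a single PSHUFB stands in for several SSE2 shuffles is broadcasting one byte to all 16 lanes with an all-zero mask; whether the patch uses it that way is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB on one 128-bit register (hypothetical helper).
 * Each output byte is 0 if the mask byte has its high bit set; otherwise
 * it is the source byte selected by the low 4 bits of the mask byte. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    uint8_t tmp[16]; /* temporary so dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, tmp, sizeof tmp);
}

/* Scalar reference: number of nonzero coefficients in a block.
 * The name, coefficient type and signature are assumptions made
 * for illustration only. */
static int count_nonzero_ref(const int32_t *coeff, int n)
{
    assert(coeff != NULL && n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (coeff[i] != 0);
    return count;
}
```

With an all-zero mask, pshufb_model copies src[0] into every output byte, which is the broadcast case. A reference like count_nonzero_ref is the kind of thing an optimized variant would be checked against.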

I guess SSSE3 is very prevalent nowadays -- though I am still not a fan
of throwing away variants, I guess it's reasonable in this case.

- Derek

