[x265] [PATCH] refactorization of the transform/quant path

Steve Borho steve at borho.org
Wed Nov 19 06:48:00 CET 2014


On 11/18, dave wrote:
> I have been working on an sse2 idct8 assembler primitive.  Currently it
> performs only a little better than the intrinsic.  It is based on the gcc
> assembler output of the intrinsic.
> 
> FYI, at first I simply converted the ssse3 idct8 assembler primitive to sse2,
> since it uses only 3 ssse3 instructions in 5 places and no sse3 instructions,
> but that performed poorly compared to the intrinsic version.  While the two
> are similar, it seems the intrinsic algorithm performs better than the
> multiple-function and macro setup of the idct8 ssse3 primitive.
> 
> I can submit what I have but there is probably more room for improvement.

If you send what you have, and it is as fast as or faster than the intrinsic
version, then I'll gladly take it.  Min can review it and give you hints
on how to optimize it further.

-- 
Steve Borho

