[x265] [PATCH] refactorization of the transform/quant path

dave dtyx265 at gmail.com
Thu Nov 20 04:12:59 CET 2014


On 11/18/2014 09:48 PM, Steve Borho wrote:
> On 11/18, dave wrote:
>> I have been working on an sse2 idct8 assembler primitive.  Currently it
>> only performs a little better than the intrinsic.  It is based on the gcc
>> assembler output of the intrinsic.
>>
>> FYI, at first I simply converted the ssse3 idct8 assembler primitive to sse2
>> since it only uses 3 ssse3 instructions in 5 places and no sse3 instructions
>> but that performed poorly compared to the intrinsic version.  While similar,
>> it seems like the intrinsic algorithm performs better than the multiple
>> function and macro setup of the idct8 ssse3 primitive.
>>
>> I can submit what I have but there is probably more room for improvement.
> If you send what you have, and it is as fast as or faster than the intrinsic
> version then I'll gladly take it.  Min can review it and give you hints
> on how to optimize it further.
>
OK.  I also created another version that is directly based on the idct8 
intrinsic.  As I worked on improving it I realized I was slowly turning 
it into something like GCC's optimized output so I turned my focus back 
to that.  I spent too much time on it.

