[x265] [PATCH] refactorization of the transform/quant path
dave
dtyx265 at gmail.com
Thu Nov 20 04:12:59 CET 2014
On 11/18/2014 09:48 PM, Steve Borho wrote:
> On 11/18, dave wrote:
>> I have been working on an sse2 idct8 assembler primitive. Currently it
>> only performs a little better than the intrinsic version. It is based on
>> the GCC assembler output of the intrinsic.
>>
>> FYI, at first I simply converted the ssse3 idct8 assembler primitive to sse2,
>> since it only uses 3 ssse3 instructions in 5 places and no sse3 instructions,
>> but that performed poorly compared to the intrinsic version. While the two
>> are similar, it seems the intrinsic algorithm performs better than the
>> multiple-function and macro setup of the ssse3 idct8 primitive.
>>
>> I can submit what I have but there is probably more room for improvement.
> If you send what you have, and it is as fast as or faster than the intrinsic
> version, then I'll gladly take it. Min can review it and give you hints
> on how to optimize it further.
>
OK. I also created another version that is directly based on the idct8
intrinsic. As I worked on improving it, I realized I was slowly turning
it into something like GCC's optimized output, so I turned my focus back
to that. I spent too much time on it.
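For checking an sse2 idct8 (assembler or intrinsic) against a scalar baseline, a minimal sketch of an 8x8 inverse transform in the HEVC partial-butterfly style might look like the following. It is not x265's actual C primitive. The assumptions are: the standard HEVC 8-point integer basis, 8-bit depth (first-pass shift 7, second-pass shift 12), 16-bit saturation after each pass, and a contiguous 8x8 layout with no stride argument. The names `inv_butterfly8` and `idct8_ref` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* HEVC 8-point integer DCT basis; row k is basis function k. */
static const int16_t g_t8[8][8] = {
    { 64,  64,  64,  64,  64,  64,  64,  64 },
    { 89,  75,  50,  18, -18, -50, -75, -89 },
    { 83,  36, -36, -83, -83, -36,  36,  83 },
    { 75, -18, -89, -50,  50,  89,  18, -75 },
    { 64, -64, -64,  64,  64, -64, -64,  64 },
    { 50, -89,  18,  75, -75, -18,  89, -50 },
    { 36, -83,  83, -36, -36,  83, -83,  36 },
    { 18, -50,  75, -89,  89, -75,  50, -18 }
};

/* Saturate a 32-bit intermediate to the int16_t range. */
static int16_t clip16(int v)
{
    return (int16_t)(v < -32768 ? -32768 : (v > 32767 ? 32767 : v));
}

/* Hypothetical helper: one 1-D inverse pass over the 8 columns of an
 * 8x8 block. Column j of src becomes row j of dst (the output is
 * transposed), so two passes yield a non-transposed 2-D result. */
static void inv_butterfly8(const int16_t *src, int16_t *dst, int shift)
{
    assert(shift > 0);
    const int add = 1 << (shift - 1);   /* rounding offset */

    for (int j = 0; j < 8; j++) {
        int O[4], E[4], EO[2], EE[2];

        /* Odd part: coefficient rows 1, 3, 5, 7. */
        for (int k = 0; k < 4; k++)
            O[k] = g_t8[1][k] * src[8] + g_t8[3][k] * src[24]
                 + g_t8[5][k] * src[40] + g_t8[7][k] * src[56];

        /* Even part: rows 2, 6 (even-odd) and rows 0, 4 (even-even). */
        EO[0] = g_t8[2][0] * src[16] + g_t8[6][0] * src[48];
        EO[1] = g_t8[2][1] * src[16] + g_t8[6][1] * src[48];
        EE[0] = g_t8[0][0] * src[0] + g_t8[4][0] * src[32];
        EE[1] = g_t8[0][1] * src[0] + g_t8[4][1] * src[32];

        E[0] = EE[0] + EO[0];
        E[3] = EE[0] - EO[0];
        E[1] = EE[1] + EO[1];
        E[2] = EE[1] - EO[1];

        /* Butterfly recombination, rounding, shift, saturation. */
        for (int k = 0; k < 4; k++) {
            dst[k]     = clip16((E[k] + O[k] + add) >> shift);
            dst[k + 4] = clip16((E[3 - k] - O[3 - k] + add) >> shift);
        }
        src++;      /* next column */
        dst += 8;   /* next output row */
    }
}

/* Hypothetical 2-D reference: coefficients in, residual out (8-bit depth). */
void idct8_ref(const int16_t coeff[64], int16_t residual[64])
{
    int16_t tmp[64];
    inv_butterfly8(coeff, tmp, 7);      /* first pass */
    inv_butterfly8(tmp, residual, 12);  /* second pass: 20 - bitDepth */
}
```

As a quick sanity check, a DC-only block with `coeff[0] = 64` and all other coefficients zero should reconstruct to a residual of all 1s. That makes it a cheap first test before comparing the SIMD output against this sketch on random coefficient blocks.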