[x265] [PATCH] dct: Replaced partialButterfly16 vector class function to intrinsic

Steve Borho steve at borho.org
Sat Oct 12 05:47:11 CEST 2013


On Fri, Oct 11, 2013 at 10:39 PM, chen <chenm003 at 163.com> wrote:

>
> At 2013-10-12 03:12:46, "Steve Borho" <steve at borho.org> wrote:
>
>
>
>
> On Fri, Oct 11, 2013 at 3:40 AM, <yuvaraj at multicorewareinc.com> wrote:
>
>> # HG changeset patch
>> # User Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
>> # Date 1381480768 -19800
>> #      Fri Oct 11 14:09:28 2013 +0530
>> # Node ID 46b954edb1c52a557b9d94c4ed380ea0578c1949
>> # Parent  8bb743458331d7cdc1008e217542e406818c5a7a
>> dct: Replaced partialButterfly16 vector class function to intrinsic
>>
>
> For some reason, this new version is 3x slower than the vector version; we
> need to figure out why.  It looks like the code-flow is the same.
>
> Are you using the VS compiler? The _mm_setr_epi32 intrinsic is very slow
> on it; most of the time the vector version builds a constant array instead.
>
>
Yes, indeed.  What should they use instead of _mm_setr_epi32?
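
For reference, one possible alternative along the lines chen describes is to
keep the coefficients in an aligned constant array and load them with
_mm_load_si128. This is only a sketch: the table name kCoeffs, the helper
function names, and the four coefficient values are illustrative assumptions
and are not taken from the patch.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical coefficient row, for illustration only (not from the patch).
alignas(16) static const int32_t kCoeffs[4] = { 90, 87, 80, 70 };

// Variant 1: build the vector from four scalar arguments.
static inline __m128i coeffs_via_setr()
{
    return _mm_setr_epi32(90, 87, 80, 70);
}

// Variant 2: a single aligned 128-bit load from the constant table.
static inline __m128i coeffs_via_load()
{
    return _mm_load_si128(reinterpret_cast<const __m128i*>(kCoeffs));
}

// Returns true when both variants produce identical lanes.
static bool coeffs_match()
{
    __m128i eq = _mm_cmpeq_epi32(coeffs_via_setr(), coeffs_via_load());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Both variants yield the same register contents, so the choice only affects the
generated code. Whether the load is actually faster would need to be measured
with the compiler in question.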

-- 
Steve Borho