[x265] Fwd: Fwd: [PATCH] copy_cnt_4: faster AVX2 code

Praveen Tiwari praveen at multicorewareinc.com
Wed Sep 10 11:49:21 CEST 2014


---------- Forwarded message ----------
From: chen <chenm003 at 163.com>
Date: Wed, Sep 10, 2014 at 12:14 PM
Subject: Re: [x265] Fwd: [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>




At 2014-09-10 09:34:31, "Praveen Tiwari" <praveen at multicorewareinc.com> wrote:


---------- Forwarded message ----------
From: chen <chenm003 at 163.com>
Date: Tue, Sep 9, 2014 at 10:17 AM
Subject: Re: [x265] [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>


>>Most operations are SSE2, with just one movu; why do we need an AVX2 version for 4x4?
What about vinserti128?

>>You want to use vinserti128 to combine two 128-bit values into 256 bits; does that
cost more than two movu?

I tested both the SSE and AVX2 code on a Haswell i5 machine. The AVX2 code seems
a bit faster, so I think we should keep both versions. Here are the results of 3
runs:

SSE version (columns: primitive, speedup over C, optimized time, C time):
        copy_cnt[4x4]  4.21x    110.16          463.86
        copy_cnt[4x4]  4.18x    104.64          437.08
        copy_cnt[4x4]  4.17x    110.23          460.02

AVX2 version (same columns):
        copy_cnt[4x4]  4.71x     99.23          467.63
        copy_cnt[4x4]  4.39x    104.46          458.58
        copy_cnt[4x4]  4.71x     99.27          467.91


_______________________________________________
x265-devel mailing list
x265-devel at videolan.org
https://mailman.videolan.org/listinfo/x265-devel

