[x265] Fwd: Fwd: [PATCH] copy_cnt_4: faster AVX2 code

Wed Sep 10 23:22:38 CEST 2014

At 2014-09-10 17:49:21,"Praveen Tiwari" <praveen at multicorewareinc.com> wrote:

---------- Forwarded message ----------
From: chen<chenm003 at 163.com>
Date: Wed, Sep 10, 2014 at 12:14 PM
Subject: Re: [x265] Fwd: [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>

At 2014-09-10 09:34:31,"Praveen Tiwari" <praveen at multicorewareinc.com> wrote:

---------- Forwarded message ----------
From: chen<chenm003 at 163.com>
Date: Tue, Sep 9, 2014 at 10:17 AM
Subject: Re: [x265] [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>

>>Most of the instructions are SSE2, with just one movu, so why do we need an AVX2 version for 4x4?
What about vinserti128?

>>You want to use vinserti128 to combine two 128-bit halves into 256 bits; isn't that more costly than two movu instructions?

I tested both the SSE and AVX2 code on a Haswell i5 machine. The AVX2 code seems a bit faster, so I think we should keep both versions. Here are the results of 3 runs:

SSE version:
        copy_cnt[4x4]  4.21x    110.16          463.86
        copy_cnt[4x4]  4.18x    104.64          437.08
        copy_cnt[4x4]  4.17x    110.23          460.02

AVX2 version:
        copy_cnt[4x4]  4.71x    99.23           467.63
        copy_cnt[4x4]  4.39x    104.46          458.58
        copy_cnt[4x4]  4.71x    99.27           467.91

There is no real improvement here; the difference is random deviation, so I would prefer to keep the old SSE version.
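
One rough way to check the "random deviation" argument against the six runs quoted above is to compare the apparent AVX2 gain with the run-to-run spread of the C reference timing. The sketch below is only an illustration: it assumes the second numeric column is the optimized routine's time and the third is the C reference time (the C code should be the same in both builds, so its spread approximates measurement noise). The helper name rel_spread is made up for this example, and three runs per version are far too few for a real statistical test.

```python
from statistics import mean

# Optimized-routine timings copied from the runs quoted above
# (assumed to be the second numeric column of the test bench output).
sse = [110.16, 104.64, 110.23]
avx2 = [99.23, 104.46, 99.27]

# C reference timings from all six runs (assumed to be the third column).
# The C code is assumed identical in both builds, so its spread is used
# here as a crude estimate of run-to-run noise.
c_ref = [463.86, 437.08, 460.02, 467.63, 458.58, 467.91]


def rel_spread(samples):
    """Return (max - min) / mean for a list of timings (hypothetical helper)."""
    return (max(samples) - min(samples)) / mean(samples)


# Relative gain of the AVX2 mean over the SSE mean.
gain = (mean(sse) - mean(avx2)) / mean(sse)
# Relative spread of the unchanged C reference across all runs.
noise = rel_spread(c_ref)

print(f"mean SSE time      : {mean(sse):.2f}")
print(f"mean AVX2 time     : {mean(avx2):.2f}")
print(f"apparent AVX2 gain : {gain:.1%}")
print(f"C reference spread : {noise:.1%}")
print(f"SSE spread         : {rel_spread(sse):.1%}")
print(f"AVX2 spread        : {rel_spread(avx2):.1%}")
```

If the apparent gain is about the same size as the spread of the unchanged C reference, the numbers alone cannot separate a real speedup from noise; many more runs would be needed to settle it.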
