[x265] Fwd: Fwd: [PATCH] copy_cnt_4: faster AVX2 code
chen
chenm003 at 163.com
Wed Sep 10 23:22:38 CEST 2014
At 2014-09-10 17:49:21,"Praveen Tiwari" <praveen at multicorewareinc.com> wrote:
---------- Forwarded message ----------
From: chen<chenm003 at 163.com>
Date: Wed, Sep 10, 2014 at 12:14 PM
Subject: Re: [x265] Fwd: [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>
At 2014-09-10 09:34:31,"Praveen Tiwari" <praveen at multicorewareinc.com> wrote:
---------- Forwarded message ----------
From: chen<chenm003 at 163.com>
Date: Tue, Sep 9, 2014 at 10:17 AM
Subject: Re: [x265] [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>
>>Most operations are SSE2 and there is just one movu; why do we need an AVX2 version for 4x4?
What about vinserti128?
>>You want to use vinserti128 to combine two 128-bit halves into 256 bits; does that cost more than two movu?
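The "two movu vs. vinserti128" question can be made concrete with a small C sketch using SSE2/AVX2 intrinsics. It assumes copy_cnt_4 copies a strided 4x4 block of int16 residuals into a packed 16-coefficient buffer and returns the number of nonzero values. The function names and signature are hypothetical, and this is not x265's actual assembly. The packed 4x4 block is 32 bytes. The SSE2 path writes it with two 16-byte stores. The AVX2 path (only compiled when building with -mavx2) merges the two halves with vinserti128 and writes them with one 32-byte store. The nonzero count uses 128-bit SSE2 operations either way.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <immintrin.h>
#endif

/* Scalar reference (assumed semantics): copy a 4x4 block of int16 residuals,
   read with a row stride given in elements, into a packed 16-entry coeff
   buffer and return how many of the copied values are nonzero. */
static int copy_cnt_4_ref(int16_t *coeff, const int16_t *residual, intptr_t stride)
{
    int num_sig = 0;
    assert(stride >= 4);
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int16_t v = residual[y * stride + x];
            coeff[y * 4 + x] = v;
            num_sig += (v != 0);
        }
    }
    return num_sig;
}

/* SIMD sketch with a scalar fallback (hypothetical, not x265's asm). */
static int copy_cnt_4(int16_t *coeff, const int16_t *residual, intptr_t stride)
{
#if defined(__SSE2__)
    /* Each row is 4 x int16 = 8 bytes: one 64-bit load (movq) per row. */
    __m128i r0 = _mm_loadl_epi64((const __m128i *)(residual + 0 * stride));
    __m128i r1 = _mm_loadl_epi64((const __m128i *)(residual + 1 * stride));
    __m128i r2 = _mm_loadl_epi64((const __m128i *)(residual + 2 * stride));
    __m128i r3 = _mm_loadl_epi64((const __m128i *)(residual + 3 * stride));
    __m128i lo = _mm_unpacklo_epi64(r0, r1); /* rows 0-1 */
    __m128i hi = _mm_unpacklo_epi64(r2, r3); /* rows 2-3 */
#if defined(__AVX2__)
    /* AVX2 variant: vinserti128 joins the halves, then one 32-byte store. */
    __m256i both = _mm256_inserti128_si256(_mm256_castsi128_si256(lo), hi, 1);
    _mm256_storeu_si256((__m256i *)coeff, both);
#else
    /* SSE2 variant: two 16-byte unaligned stores (two movdqu). */
    _mm_storeu_si128((__m128i *)coeff, lo);
    _mm_storeu_si128((__m128i *)(coeff + 8), hi);
#endif
    /* pcmpeqw marks zero words as 0xFFFF, packsswb squeezes the 16 word
       flags into 16 bytes, pmovmskb gives one bit per zero coefficient. */
    __m128i zero = _mm_setzero_si128();
    __m128i flags = _mm_packs_epi16(_mm_cmpeq_epi16(lo, zero),
                                    _mm_cmpeq_epi16(hi, zero));
    unsigned mask = (unsigned)_mm_movemask_epi8(flags);
    int zeros = 0;
    while (mask) { /* portable popcount: clear the lowest set bit */
        mask &= mask - 1;
        zeros++;
    }
    return 16 - zeros;
#else
    return copy_cnt_4_ref(coeff, residual, stride);
#endif
}
```

Whether one vinserti128 plus one 32-byte store beats two 16-byte stores depends on the microarchitecture, so only timings on the target CPU can answer it.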
I tested both the SSE and AVX2 code on a Haswell i5 machine. The AVX2 code seems a bit faster, so I think we should keep both versions. Here are the results of 3 runs:
SSE version (columns: speedup, optimized time, C reference time):
copy_cnt[4x4]  4.21x  110.16  463.86
copy_cnt[4x4]  4.18x  104.64  437.08
copy_cnt[4x4]  4.17x  110.23  460.02
AVX2 version (columns: speedup, optimized time, C reference time):
copy_cnt[4x4]  4.71x   99.23  467.63
copy_cnt[4x4]  4.39x  104.46  458.58
copy_cnt[4x4]  4.71x   99.27  467.91
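As a quick cross-check of the benchmark rows, the sketch below hard-codes the six runs. It confirms that each printed speedup equals the C time divided by the optimized time, which is the assumption behind reading the columns as speedup, optimized time and C reference time. It also summarizes the optimized-time column per version. The struct and function names are hypothetical.

```c
#include <assert.h>

/* One row of the quoted testbench output. Column meanings are inferred
   (speedup == c_ref / opt in every row), not documented here. */
typedef struct {
    double speedup; /* printed speedup, e.g. 4.21 for "4.21x" */
    double opt;     /* optimized (SSE or AVX2) time */
    double c_ref;   /* C reference time */
} bench_row;

static const bench_row sse_runs[3] = {
    {4.21, 110.16, 463.86},
    {4.18, 104.64, 437.08},
    {4.17, 110.23, 460.02},
};

static const bench_row avx2_runs[3] = {
    {4.71, 99.23, 467.63},
    {4.39, 104.46, 458.58},
    {4.71, 99.27, 467.91},
};

/* True if c_ref / opt is within two-decimal rounding of the printed speedup. */
static int speedup_matches(const bench_row *r)
{
    double d = r->c_ref / r->opt - r->speedup;
    return d > -0.005 && d < 0.005;
}

/* Mean, minimum and maximum of the optimized-time column. */
static void summarize(const bench_row *rows, int n,
                      double *mean, double *lo, double *hi)
{
    double sum = 0.0;
    assert(n > 0);
    *lo = *hi = rows[0].opt;
    for (int i = 0; i < n; i++) {
        sum += rows[i].opt;
        if (rows[i].opt < *lo) *lo = rows[i].opt;
        if (rows[i].opt > *hi) *hi = rows[i].opt;
    }
    *mean = sum / n;
}
```

On these rows the mean optimized time is about 108.3 for SSE and about 101.0 for AVX2 (roughly 7% apart). Each version's own three runs spread over about 5 to 6 units (104.64 to 110.23 and 99.23 to 104.46). Three runs per version is a small sample for separating a real gain from noise.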
There is no real improvement; the difference is just random run-to-run deviation, so I would like to keep the old SSE version.
More information about the x265-devel
mailing list