[x265] Fwd: Fwd: [PATCH] copy_cnt_4: faster AVX2 code
Praveen Tiwari
praveen at multicorewareinc.com
Wed Sep 10 11:49:21 CEST 2014
---------- Forwarded message ----------
From: chen <chenm003 at 163.com>
Date: Wed, Sep 10, 2014 at 12:14 PM
Subject: Re: [x265] Fwd: [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>
At 2014-09-10 09:34:31,"Praveen Tiwari" <praveen at multicorewareinc.com>
wrote:
---------- Forwarded message ----------
From: chen <chenm003 at 163.com>
Date: Tue, Sep 9, 2014 at 10:17 AM
Subject: Re: [x265] [PATCH] copy_cnt_4: faster AVX2 code
To: Development for x265 <x265-devel at videolan.org>
>>Most of the operations are SSE2, with just one movu. Why do we need an AVX2 version for 4x4?
What about vinserti128?
>>You want to use vinserti128 to combine two 128-bit halves into 256 bits. Is that more
costly than two movu instructions?
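For readers following the thread, here is a minimal scalar sketch of what a 4x4 copy-and-count primitive does. It assumes copy_cnt_4 copies a 4x4 block of 16-bit values from a strided source into a contiguous destination and returns the number of nonzero values. The function name copy_cnt_4x4_ref and its signature are hypothetical, chosen for illustration, and are not taken from the patch.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scalar reference, not the x265 source.
 * Assumption: the primitive copies a 4x4 block of int16_t from a strided
 * source into a contiguous 16-element destination and returns how many
 * of the copied values are nonzero. */
uint32_t copy_cnt_4x4_ref(int16_t *dst, const int16_t *src, ptrdiff_t src_stride)
{
    uint32_t nonzero = 0;
    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            int16_t v = src[row * src_stride + col];
            dst[row * 4 + col] = v;   /* destination rows are packed back to back */
            nonzero += (v != 0);      /* count nonzero values */
        }
    }
    return nonzero;
}
```

Under that assumption the packed destination is 4 rows of 8 bytes, 32 bytes in total. That is exactly one 256-bit store or two 128-bit stores, which is the trade-off being asked about here.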
I tested both the SSE and AVX2 code on a Haswell i5 machine. The AVX2 code seems a
bit faster, so I think we should keep both versions. Here are the results of 3
runs:
*SSE VERSION:-*
copy_cnt[4x4] 4.21x 110.16 463.86
copy_cnt[4x4] 4.18x 104.64 437.08
copy_cnt[4x4] 4.17x 110.23 460.02
*AVX2 VERSION:-*
copy_cnt[4x4] 4.71x 99.23 467.63
copy_cnt[4x4] 4.39x 104.46 458.58
copy_cnt[4x4] 4.71x 99.27 467.91
_______________________________________________
x265-devel mailing list
x265-devel at videolan.org
https://mailman.videolan.org/listinfo/x265-devel