[x265] [PATCH] filterHorizontal_p_p vector portion replaced with intrinsic code
Steve Borho
steve at borho.org
Mon Jul 29 19:09:35 CEST 2013
On Mon, Jul 29, 2013 at 5:18 AM, <praveen at multicorewareinc.com> wrote:
> # HG changeset patch
> # User praveentiwari
> # Date 1375093073 -19800
> # Node ID 5c64a7b6b636dbe9d64daad435047e1f29330406
> # Parent 9fb0dd3a7460acfeb55424a658ae1a40af12d85d
> filterHorizontal_p_p vector portion replaced with intrinsic code
>
> diff -r 9fb0dd3a7460 -r 5c64a7b6b636 source/common/vec/ipfilter8.inc
> --- a/source/common/vec/ipfilter8.inc Mon Jul 29 15:41:24 2013 +0530
> +++ b/source/common/vec/ipfilter8.inc Mon Jul 29 15:47:53 2013 +0530
> @@ -760,22 +760,16 @@
>
> for (; col < width; col++) // Remaining
> iterations
> {
> - Vec8s vec_sum_low, vec_zero(0);
> - Vec16uc vec_src0, vec_sum;
> - Vec8s vec_c;
> - vec_c.load(coeff);
> + __m128i NewSrc = _mm_loadl_epi64((__m128i*)(src + col));
> + __m128i T00 = _mm_maddubs_epi16(NewSrc, T10);
> + __m128i add = _mm_hadd_epi16(T00, T00);
> + short sum = _mm_extract_epi16(add, 0);
>
> if (N == 8)
> {
> - vec_src0.load(src + col);
> + add = _mm_hadd_epi16(add, add);
> + sum = _mm_extract_epi16(add, 0);
>
there were some extra spaces here that I removed before pushing
> }
> - else
> - {
> - vec_src0 = load_partial_by_i<4>(src + col);
> - }
> - // Assuming that there is no overflow (Everywhere in this
> function!)
> - vec_sum_low = extend_low(vec_src0) * vec_c;
> - short sum = horizontal_add(vec_sum_low);
> short val = (short)(sum + offset) >> headRoom;
> val = (val < 0) ? 0 : val;
> val = (val > maxVal) ? maxVal : val;
> _______________________________________________
> x265-devel mailing list
> x265-devel at videolan.org
> http://mailman.videolan.org/listinfo/x265-devel
>
--
Steve Borho
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.videolan.org/private/x265-devel/attachments/20130729/13a712f6/attachment-0001.html>
More information about the x265-devel
mailing list