[x265] [PATCH] replace block_copy_p_p vector class function with intrinsic code

dnyaneshwar at multicorewareinc.com dnyaneshwar at multicorewareinc.com
Fri Oct 4 12:57:58 CEST 2013


# HG changeset patch
# User Dnyaneshwar
# Date 1380884222 -19800
#      Fri Oct 04 16:27:02 2013 +0530
# Node ID 69943bfd02a2feea711da586eb15c7ac77fa700d
# Parent  bf14f75b8cf99806c75cdc1a50b28b6cf265e3bd
Replace the block_copy_p_p vector class function with intrinsic code.
Performance is almost the same as that of the vector class function.

diff -r bf14f75b8cf9 -r 69943bfd02a2 source/common/vec/blockcopy-sse3.cpp
--- a/source/common/vec/blockcopy-sse3.cpp	Fri Oct 04 01:39:22 2013 -0500
+++ b/source/common/vec/blockcopy-sse3.cpp	Fri Oct 04 16:27:02 2013 +0530
@@ -76,9 +76,8 @@
         {
             for (int x = 0; x < bx; x += 16)
             {
-                Vec16c word;
-                word.load_a(src + x);
-                word.store_a(dst + x);
+                __m128i word0 = _mm_load_si128((__m128i const*)(src + x)); // load 16 bytes from src (16-byte aligned)
+                _mm_store_si128((__m128i*)&dst[x], word0); // store the 16 bytes to dst (16-byte aligned)
             }
 
             src += sstride;

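For readers without the x265 tree at hand, the aligned copy loop in the hunk above can be sketched as a standalone function. This is only an illustration, not the x265 code: the names copy_block_aligned and copy_block_self_check and the parameter types are hypothetical. It assumes that src and dst are 16-byte aligned, that both strides are multiples of 16, and that bx is a multiple of 16, since _mm_load_si128 and _mm_store_si128 require aligned addresses.

```cpp
#include <emmintrin.h> // SSE2 intrinsics: _mm_load_si128, _mm_store_si128
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical standalone helper (not the x265 function itself).
// Assumptions: src and dst are 16-byte aligned, sstride and dstride are
// multiples of 16, and bx is a multiple of 16.
static void copy_block_aligned(int bx, int by,
                               uint8_t* dst, intptr_t dstride,
                               const uint8_t* src, intptr_t sstride)
{
    for (int y = 0; y < by; y++)
    {
        for (int x = 0; x < bx; x += 16)
        {
            // aligned 16-byte load from the source row
            __m128i word = _mm_load_si128(reinterpret_cast<const __m128i*>(src + x));
            // aligned 16-byte store to the destination row
            _mm_store_si128(reinterpret_cast<__m128i*>(dst + x), word);
        }

        src += sstride;
        dst += dstride;
    }
}

// Hypothetical self-check: copies a 32x4 block between two aligned buffers
// and reports whether the destination matches the source byte for byte.
static bool copy_block_self_check()
{
    const int width = 32, height = 4;
    alignas(16) uint8_t src[width * height];
    alignas(16) uint8_t dst[width * height];

    for (int i = 0; i < width * height; i++)
    {
        src[i] = static_cast<uint8_t>(i & 0xff);
        dst[i] = 0;
    }

    copy_block_aligned(width, height, dst, width, src, width);
    return std::memcmp(src, dst, sizeof(src)) == 0;
}
```

If the alignment assumptions cannot be guaranteed, the unaligned variants _mm_loadu_si128 and _mm_storeu_si128 take the same arguments and can be substituted, possibly at some cost in speed on older CPUs.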
