[x265-commits] [x265] pixel: stop building 16x16, 16x8, and 8x16 intrinsic primitives
Steve Borho
steve at borho.org
Sun Oct 6 07:39:14 CEST 2013
details: http://hg.videolan.org/x265/rev/73f14d5ca8a9
branches:
changeset: 4228:73f14d5ca8a9
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 19:46:23 2013 -0500
description:
pixel: stop building 16x16, 16x8, and 8x16 intrinsic primitives
Subject: [x265] asm: quit instantiating functions which are not necessary
details: http://hg.videolan.org/x265/rev/5c27d330da43
branches:
changeset: 4229:5c27d330da43
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 20:35:45 2013 -0500
description:
asm: quit instantiating functions which are not necessary
Re-order functions for more clarity
Subject: [x265] pixelharness: report sad, sad_x3, and sad_x4 scores together
details: http://hg.videolan.org/x265/rev/4089b17f33ed
branches:
changeset: 4230:4089b17f33ed
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 20:37:27 2013 -0500
description:
pixelharness: report sad, sad_x3, and sad_x4 scores together
Subject: [x265] primitives: move small block sa8d_inter setup to primitives.cpp
details: http://hg.videolan.org/x265/rev/83ae910874e3
branches:
changeset: 4231:83ae910874e3
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 20:50:11 2013 -0500
description:
primitives: move small block sa8d_inter setup to primitives.cpp
This hack didn't belong in the assembly setup function
Subject: [x265] asm: use x265_pixel_satd_8x4_xop for p.satd[PARTITION_16x4] for 32 bit builds
details: http://hg.videolan.org/x265/rev/4f837e3ebd26
branches:
changeset: 4232:4f837e3ebd26
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 21:17:12 2013 -0500
description:
asm: use x265_pixel_satd_8x4_xop for p.satd[PARTITION_16x4] for 32 bit builds
On 64bit builds, we have native sse2 functions
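For context, a 16x4 SATD score can be assembled from two 8x4 SATD calls, since SATD is additive over disjoint sub-blocks. The sketch below illustrates the idea only; the actual wrapper in asm-primitives.cpp is not part of the truncated diff, and the typedef and helper name here are assumptions.

    #include <cstdint>

    // satd_t mirrors the usual pixel-comparison signature; satd8x4 stands in
    // for x265_pixel_satd_8x4_xop.
    typedef int (*satd_t)(const uint8_t *pix1, intptr_t stride1,
                          const uint8_t *pix2, intptr_t stride2);

    // Hypothetical helper: cover a 16x4 partition with two side-by-side 8x4 calls.
    static int satd_16x4_from_8x4(satd_t satd8x4,
                                  const uint8_t *pix1, intptr_t stride1,
                                  const uint8_t *pix2, intptr_t stride2)
    {
        return satd8x4(pix1,     stride1, pix2,     stride2) +
               satd8x4(pix1 + 8, stride1, pix2 + 8, stride2);
    }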
Subject: [x265] primitives: setup square sa8d_inter function pointers from sa8d block pointers
details: http://hg.videolan.org/x265/rev/58bacc9ae3d1
branches:
changeset: 4233:58bacc9ae3d1
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 21:18:08 2013 -0500
description:
primitives: setup square sa8d_inter function pointers from sa8d block pointers
Subject: [x265] primitives: fixup 12x16 and 16x12 sa8d_inter pointers
details: http://hg.videolan.org/x265/rev/884016c98502
branches:
changeset: 4234:884016c98502
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 21:19:51 2013 -0500
description:
primitives: fixup 12x16 and 16x12 sa8d_inter pointers
32x12 isn't used but 12x16 and 16x12 are (for AMP)
Subject: [x265] primitives: fix off-by-one initialization of primitives
details: http://hg.videolan.org/x265/rev/6e46fabdef40
branches:
changeset: 4235:6e46fabdef40
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 21:20:17 2013 -0500
description:
primitives: fix off-by-one initialization of primitives
Subject: [x265] pixel: add back intrinsics for sad_x3_4x16 and sad_x4_4x16
details: http://hg.videolan.org/x265/rev/2e8d7b261880
branches:
changeset: 4236:2e8d7b261880
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 21:30:10 2013 -0500
description:
pixel: add back intrinsics for sad_x3_4x16 and sad_x4_4x16
These routines do not yet have assembly code
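As background for these primitives, sad_x3 scores one encode block against three reference candidates in a single call (sad_x4 against four). Below is a plain scalar sketch for the 4x16 shape; the fixed encode-buffer stride and the exact argument order are assumptions based on the usual x264/x265 convention, not taken from this patch.

    #include <cstdint>
    #include <cstdlib>

    static const intptr_t FENC_STRIDE_ASSUMED = 64; // assumed fixed encode-buffer stride

    // Scalar sketch of sad_x3 for a 4x16 block: three SADs computed in one pass.
    static void sad_x3_4x16_c(const uint8_t *fenc,
                              const uint8_t *fref0, const uint8_t *fref1,
                              const uint8_t *fref2, intptr_t frefstride,
                              int32_t *res)
    {
        res[0] = res[1] = res[2] = 0;
        for (int y = 0; y < 16; y++)
        {
            for (int x = 0; x < 4; x++)
            {
                res[0] += abs(fenc[x] - fref0[x]);
                res[1] += abs(fenc[x] - fref1[x]);
                res[2] += abs(fenc[x] - fref2[x]);
            }
            fenc  += FENC_STRIDE_ASSUMED;
            fref0 += frefstride;
            fref1 += frefstride;
            fref2 += frefstride;
        }
    }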
Subject: [x265] testbench: fix off-by-one initialization of primitives
details: http://hg.videolan.org/x265/rev/e352d1f1a7c6
branches:
changeset: 4237:e352d1f1a7c6
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 21:45:17 2013 -0500
description:
testbench: fix off-by-one initialization of primitives
Subject: [x265] asm: simplify generation of sa8d_inter functions from 8x8 and 16x16 blocks
details: http://hg.videolan.org/x265/rev/276f98fe1c59
branches:
changeset: 4238:276f98fe1c59
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 22:01:24 2013 -0500
description:
asm: simplify generation of sa8d_inter functions from 8x8 and 16x16 blocks
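The simplification relies on sa8d being additive over 8x8 tiles, so any rectangle whose sides are multiples of 8 can be scored by iterating a single 8x8 (or 16x16) block primitive. A minimal sketch of that tiling, with assumed names and the 8-bit pixel type:

    #include <cstdint>

    typedef uint8_t pixel; // uint16_t in HIGH_BIT_DEPTH builds

    typedef int (*sa8d_block_t)(const pixel *pix1, intptr_t stride1,
                                const pixel *pix2, intptr_t stride2);

    // Sketch: accumulate an lx-by-ly sa8d_inter cost by tiling with an 8x8
    // sa8d block primitive; lx and ly are assumed to be multiples of 8.
    static int sa8d_inter_tiled(sa8d_block_t sa8d_8x8, int lx, int ly,
                                const pixel *pix1, intptr_t stride1,
                                const pixel *pix2, intptr_t stride2)
    {
        int cost = 0;
        for (int y = 0; y < ly; y += 8)
            for (int x = 0; x < lx; x += 8)
                cost += sa8d_8x8(pix1 + y * stride1 + x, stride1,
                                 pix2 + y * stride2 + x, stride2);
        return cost;
    }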
Subject: [x265] asm: cleanup the assignment of SSD primitives
details: http://hg.videolan.org/x265/rev/dc74d9932a3f
branches:
changeset: 4239:dc74d9932a3f
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 22:15:45 2013 -0500
description:
asm: cleanup the assignment of SSD primitives
Subject: [x265] pixel: drop SSE primitives that have assembly
details: http://hg.videolan.org/x265/rev/08b4bb1e5dbe
branches:
changeset: 4240:08b4bb1e5dbe
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 22:24:57 2013 -0500
description:
pixel: drop SSE primitives that have assembly
Subject: [x265] asm: don't build wrappers for functions with intrinsic implementations
details: http://hg.videolan.org/x265/rev/da37cd44a77c
branches:
changeset: 4241:da37cd44a77c
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 22:36:20 2013 -0500
description:
asm: don't build wrappers for functions with intrinsic implementations
Subject: [x265] pixel: add missing sse_pp_12x16, untemplatize others
details: http://hg.videolan.org/x265/rev/017aab1983dd
branches:
changeset: 4242:017aab1983dd
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 22:41:58 2013 -0500
description:
pixel: add missing sse_pp_12x16, untemplatize others
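sse_pp is the pixel-versus-pixel sum of squared differences. A scalar reference for the newly covered 12x16 shape looks like the sketch below; the SSE4.1 code vectorizes the same computation, and the typedef here is an assumption for the 8-bit case.

    #include <cstdint>

    typedef uint8_t pixel; // uint16_t in HIGH_BIT_DEPTH builds

    // Scalar sketch of sse_pp for a 12x16 block.
    static int sse_pp_12x16_c(const pixel *pix1, intptr_t stride1,
                              const pixel *pix2, intptr_t stride2)
    {
        int sum = 0;
        for (int y = 0; y < 16; y++)
        {
            for (int x = 0; x < 12; x++)
            {
                int d = pix1[x] - pix2[x];
                sum += d * d;
            }
            pix1 += stride1;
            pix2 += stride2;
        }
        return sum;
    }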
Subject: [x265] pixel: fix HIGH_BIT_DEPTH builds
details: http://hg.videolan.org/x265/rev/bc3d1a8ebc89
branches:
changeset: 4243:bc3d1a8ebc89
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 22:59:45 2013 -0500
description:
pixel: fix HIGH_BIT_DEPTH builds
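For readers following the build fix: HIGH_BIT_DEPTH widens the pixel type from 8 to 16 bits, so byte-oriented intrinsics such as _mm_sad_epu8 cannot be applied unchanged. The convention is along these lines (a sketch; the defining header is not part of this diff):

    #if HIGH_BIT_DEPTH
    typedef uint16_t pixel; // deeper-than-8-bit samples stored in 16-bit words
    #else
    typedef uint8_t  pixel; // 8-bit samples
    #endif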
Subject: [x265] pixel: simplify sad_16 to make it easier to maintain
details: http://hg.videolan.org/x265/rev/bf5852bbf75f
branches:
changeset: 4244:bf5852bbf75f
user: Steve Borho <steve at borho.org>
date: Sat Oct 05 23:30:38 2013 -0500
description:
pixel: simplify sad_16 to make it easier to maintain
Subject: [x265] pixel: simplify sad_x3_16 and sad_x4_16 to make them easier to maintain
details: http://hg.videolan.org/x265/rev/d27d01ffa4f0
branches:
changeset: 4245:d27d01ffa4f0
user: Steve Borho <steve at borho.org>
date: Sun Oct 06 00:07:09 2013 -0500
description:
pixel: simplify sad_x3_16 and sad_x4_16 to make them easier to maintain
Subject: [x265] asm: simplify setup of HEVC partitions for SATD primitives
details: http://hg.videolan.org/x265/rev/484d1d98710b
branches:
changeset: 4246:484d1d98710b
user: Steve Borho <steve at borho.org>
date: Sun Oct 06 00:36:00 2013 -0500
description:
asm: simplify setup of HEVC partitions for SATD primitives
Subject: [x265] pixel: fix eoln damage to pixel-avx2.cpp
details: http://hg.videolan.org/x265/rev/2190f2f036a1
branches:
changeset: 4247:2190f2f036a1
user: Steve Borho <steve at borho.org>
date: Sun Oct 06 00:36:24 2013 -0500
description:
pixel: fix eoln damage to pixel-avx2.cpp
diffstat:
source/common/primitives.cpp | 16 +-
source/common/vec/pixel-avx2.cpp | 28 +-
source/common/vec/pixel-sse41.cpp | 2898 ++++++++++++---------------------
source/common/x86/asm-primitives.cpp | 565 +-----
source/test/pixelharness.cpp | 24 +-
source/test/testbench.cpp | 2 +-
6 files changed, 1232 insertions(+), 2301 deletions(-)
diffs (truncated from 3813 to 300 lines):
diff -r 19b319c9a6aa -r 2190f2f036a1 source/common/primitives.cpp
--- a/source/common/primitives.cpp Sat Oct 05 19:43:50 2013 -0500
+++ b/source/common/primitives.cpp Sun Oct 06 00:36:24 2013 -0500
@@ -128,7 +128,7 @@ void x265_setup_primitives(x265_param_t
Setup_C_Primitives(primitives);
- for (int i = 2; i < cpuid; i++)
+ for (int i = 2; i <= cpuid; i++)
{
#if ENABLE_VECTOR_PRIMITIVES
Setup_Vector_Primitives(primitives, 1 << i);
@@ -138,6 +138,20 @@ void x265_setup_primitives(x265_param_t
#endif
}
+ primitives.sa8d_inter[PARTITION_8x8] = primitives.sa8d[BLOCK_8x8];
+ primitives.sa8d_inter[PARTITION_16x16] = primitives.sa8d[BLOCK_16x16];
+ primitives.sa8d_inter[PARTITION_32x32] = primitives.sa8d[BLOCK_32x32];
+ primitives.sa8d_inter[PARTITION_64x64] = primitives.sa8d[BLOCK_64x64];
+
+ // SA8D devolves to SATD for blocks not even multiples of 8x8
+ primitives.sa8d_inter[PARTITION_4x4] = primitives.satd[PARTITION_4x4];
+ primitives.sa8d_inter[PARTITION_4x8] = primitives.satd[PARTITION_4x8];
+ primitives.sa8d_inter[PARTITION_4x16] = primitives.satd[PARTITION_4x16];
+ primitives.sa8d_inter[PARTITION_8x4] = primitives.satd[PARTITION_8x4];
+ primitives.sa8d_inter[PARTITION_16x4] = primitives.satd[PARTITION_16x4];
+ primitives.sa8d_inter[PARTITION_16x12] = primitives.satd[PARTITION_16x12];
+ primitives.sa8d_inter[PARTITION_12x16] = primitives.satd[PARTITION_12x16];
+
#if ENABLE_VECTOR_PRIMITIVES
if (param->logLevel >= X265_LOG_INFO) fprintf(stderr, " intrinsic");
#endif
diff -r 19b319c9a6aa -r 2190f2f036a1 source/common/vec/pixel-avx2.cpp
--- a/source/common/vec/pixel-avx2.cpp Sat Oct 05 19:43:50 2013 -0500
+++ b/source/common/vec/pixel-avx2.cpp Sun Oct 06 00:36:24 2013 -0500
@@ -448,22 +448,22 @@ namespace x265 {
void Setup_Vec_PixelPrimitives_avx2(EncoderPrimitives &p)
{
p.sad[0] = p.sad[0];
-#define SET_SADS(W, H) \
- p.sad[PARTITION_##W##x##H] = sad_avx2_##W<H>; \
- p.sad_x3[PARTITION_##W##x##H] = sad_avx2_x3_##W<H>; \
- p.sad_x4[PARTITION_##W##x##H] = sad_avx2_x4_##W<H>; \
-
+#define SET_SADS(W, H) \
+ p.sad[PARTITION_##W##x##H] = sad_avx2_##W<H>; \
+ p.sad_x3[PARTITION_##W##x##H] = sad_avx2_x3_##W<H>; \
+ p.sad_x4[PARTITION_##W##x##H] = sad_avx2_x4_##W<H>; \
+
#if !HIGH_BIT_DEPTH
#if (defined(__GNUC__) || defined(__INTEL_COMPILER))
- SET_SADS(32, 8);
- SET_SADS(32, 16);
- SET_SADS(32, 24);
- SET_SADS(32, 32);
- SET_SADS(32, 64);
- SET_SADS(64, 16);
- SET_SADS(64, 32);
- SET_SADS(64, 48);
- SET_SADS(64, 64);
+ SET_SADS(32, 8);
+ SET_SADS(32, 16);
+ SET_SADS(32, 24);
+ SET_SADS(32, 32);
+ SET_SADS(32, 64);
+ SET_SADS(64, 16);
+ SET_SADS(64, 32);
+ SET_SADS(64, 48);
+ SET_SADS(64, 64);
#endif
#endif
}
diff -r 19b319c9a6aa -r 2190f2f036a1 source/common/vec/pixel-sse41.cpp
--- a/source/common/vec/pixel-sse41.cpp Sat Oct 05 19:43:50 2013 -0500
+++ b/source/common/vec/pixel-sse41.cpp Sun Oct 06 00:36:24 2013 -0500
@@ -334,228 +334,49 @@ int sad_12(pixel *fenc, intptr_t fencstr
template<int ly>
int sad_16(pixel * fenc, intptr_t fencstride, pixel * fref, intptr_t frefstride)
{
- assert((ly % 4) == 0);
-
__m128i sum0 = _mm_setzero_si128();
__m128i sum1 = _mm_setzero_si128();
__m128i T00, T01, T02, T03;
__m128i T10, T11, T12, T13;
__m128i T20, T21, T22, T23;
- if (ly == 4)
+#define PROCESS_16x4(BASE)\
+ T00 = _mm_load_si128((__m128i*)(fenc + (BASE + 0) * fencstride)); \
+ T01 = _mm_load_si128((__m128i*)(fenc + (BASE + 1) * fencstride)); \
+ T02 = _mm_load_si128((__m128i*)(fenc + (BASE + 2) * fencstride)); \
+ T03 = _mm_load_si128((__m128i*)(fenc + (BASE + 3) * fencstride)); \
+ T10 = _mm_loadu_si128((__m128i*)(fref + (BASE + 0) * frefstride)); \
+ T11 = _mm_loadu_si128((__m128i*)(fref + (BASE + 1) * frefstride)); \
+ T12 = _mm_loadu_si128((__m128i*)(fref + (BASE + 2) * frefstride)); \
+ T13 = _mm_loadu_si128((__m128i*)(fref + (BASE + 3) * frefstride)); \
+ T20 = _mm_sad_epu8(T00, T10); \
+ T21 = _mm_sad_epu8(T01, T11); \
+ T22 = _mm_sad_epu8(T02, T12); \
+ T23 = _mm_sad_epu8(T03, T13); \
+ sum0 = _mm_add_epi16(sum0, T20); \
+ sum0 = _mm_add_epi16(sum0, T21); \
+ sum0 = _mm_add_epi16(sum0, T22); \
+ sum0 = _mm_add_epi16(sum0, T23)
+
+ PROCESS_16x4(0);
+ if (ly >= 8)
{
- T00 = _mm_load_si128((__m128i*)(fenc + (0) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (1) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (2) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (3) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (0) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (1) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (2) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (3) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
+ PROCESS_16x4(4);
}
- else if (ly == 8)
+ if (ly >= 12)
{
- T00 = _mm_load_si128((__m128i*)(fenc + (0) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (1) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (2) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (3) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (0) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (1) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (2) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (3) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
-
- T00 = _mm_load_si128((__m128i*)(fenc + (4) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (5) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (6) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (7) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (4) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (5) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (6) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (7) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
+ PROCESS_16x4(8);
}
- else if (ly == 16)
+ if (ly >= 16)
{
- T00 = _mm_load_si128((__m128i*)(fenc + (0) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (1) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (2) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (3) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (0) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (1) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (2) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (3) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
-
- T00 = _mm_load_si128((__m128i*)(fenc + (4) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (5) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (6) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (7) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (4) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (5) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (6) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (7) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
-
- T00 = _mm_load_si128((__m128i*)(fenc + (8) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (9) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (10) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (11) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (8) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (9) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (10) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (11) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
-
- T00 = _mm_load_si128((__m128i*)(fenc + (12) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (13) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (14) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (15) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (12) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (13) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (14) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (15) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
+ PROCESS_16x4(12);
}
- else if ((ly % 8) == 0)
+ if (ly > 16)
{
- for (int i = 0; i < ly; i += 8)
+ for (int i = 16; i < ly; i += 8)
{
- T00 = _mm_load_si128((__m128i*)(fenc + (i + 0) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (i + 1) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (i + 2) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (i + 3) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (i + 0) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (i + 1) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (i + 2) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (i + 3) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi16(sum0, T20);
- sum0 = _mm_add_epi16(sum0, T21);
- sum0 = _mm_add_epi16(sum0, T22);
- sum0 = _mm_add_epi16(sum0, T23);
-
- T00 = _mm_load_si128((__m128i*)(fenc + (i + 4) * fencstride));
- T01 = _mm_load_si128((__m128i*)(fenc + (i + 5) * fencstride));
- T02 = _mm_load_si128((__m128i*)(fenc + (i + 6) * fencstride));
- T03 = _mm_load_si128((__m128i*)(fenc + (i + 7) * fencstride));
-
- T10 = _mm_loadu_si128((__m128i*)(fref + (i + 4) * frefstride));
- T11 = _mm_loadu_si128((__m128i*)(fref + (i + 5) * frefstride));
- T12 = _mm_loadu_si128((__m128i*)(fref + (i + 6) * frefstride));
- T13 = _mm_loadu_si128((__m128i*)(fref + (i + 7) * frefstride));
-
- T20 = _mm_sad_epu8(T00, T10);
- T21 = _mm_sad_epu8(T01, T11);
- T22 = _mm_sad_epu8(T02, T12);
- T23 = _mm_sad_epu8(T03, T13);
-