[x265-commits] [x265] asm: code for scale2D_64to32 routine
Murugan Vairavel
murugan at multicorewareinc.com
Thu Nov 28 01:11:10 CET 2013
details: http://hg.videolan.org/x265/rev/78c171e33865
branches:
changeset: 5333:78c171e33865
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Mon Nov 18 19:19:30 2013 +0550
description:
asm: code for scale2D_64to32 routine
Subject: [x265] asm: fix the alignment issues occurred in sse_ss
details: http://hg.videolan.org/x265/rev/9c60abb71cf6
branches:
changeset: 5334:9c60abb71cf6
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Wed Nov 27 13:40:51 2013 +0550
description:
asm: fix the alignment issues occurred in sse_ss
Subject: [x265] asm: assembly code for intra_pred_planar[16x16]
details: http://hg.videolan.org/x265/rev/09b5e8f592ac
branches:
changeset: 5335:09b5e8f592ac
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Wed Nov 27 12:43:16 2013 +0550
description:
asm: assembly code for intra_pred_planar[16x16]
Subject: [x265] asm: code for pixel_var_32x32 and 64x64 blocks
details: http://hg.videolan.org/x265/rev/8846d37b3d9c
branches:
changeset: 5336:8846d37b3d9c
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Wed Nov 27 17:49:13 2013 +0550
description:
asm: code for pixel_var_32x32 and 64x64 blocks
Subject: [x265] asm: code for pixel_sse_sp_32xN
details: http://hg.videolan.org/x265/rev/aeb1c93c69d2
branches:
changeset: 5337:aeb1c93c69d2
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Wed Nov 27 18:29:23 2013 +0550
description:
asm: code for pixel_sse_sp_32xN
Subject: [x265] asm: code for pixel_sse_sp_48x64
details: http://hg.videolan.org/x265/rev/6051967b60cd
branches:
changeset: 5338:6051967b60cd
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Wed Nov 27 18:51:08 2013 +0550
description:
asm: code for pixel_sse_sp_48x64
Subject: [x265] asm: code for pixel_sse_sp_64xN
details: http://hg.videolan.org/x265/rev/248a56faff0a
branches:
changeset: 5339:248a56faff0a
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Wed Nov 27 18:53:05 2013 +0550
description:
asm: code for pixel_sse_sp_64xN
Subject: [x265] asm: pixel_sse_ss_24x32 assembly routine
details: http://hg.videolan.org/x265/rev/8edf6fa32a74
branches:
changeset: 5340:8edf6fa32a74
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Wed Nov 27 18:25:55 2013 +0550
description:
asm: pixel_sse_ss_24x32 assembly routine
Subject: [x265] asm: pixel_sse_ss_48x64 assembly routine
details: http://hg.videolan.org/x265/rev/45ce09834506
branches:
changeset: 5341:45ce09834506
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Wed Nov 27 19:02:56 2013 +0550
description:
asm: pixel_sse_ss_48x64 assembly routine
Subject: [x265] asm: pixel_sse_ss_64xN assembly routine
details: http://hg.videolan.org/x265/rev/bf7cf2555571
branches:
changeset: 5342:bf7cf2555571
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Wed Nov 27 19:39:28 2013 +0550
description:
asm: pixel_sse_ss_64xN assembly routine
Subject: [x265] Adding constant tables used for idct4 asm routine
details: http://hg.videolan.org/x265/rev/a49c0228e06e
branches:
changeset: 5343:a49c0228e06e
user: Nabajit Deka <nabajit at multicorewareinc.com>
date: Wed Nov 27 20:49:18 2013 +0550
description:
Adding constant tables used for idct4 asm routine
Subject: [x265] asm: Adding asm routine for idct4
details: http://hg.videolan.org/x265/rev/e463501f8a20
branches:
changeset: 5344:e463501f8a20
user: Nabajit Deka <nabajit at multicorewareinc.com>
date: Wed Nov 27 21:08:12 2013 +0550
description:
asm: Adding asm routine for idct4
Subject: [x265] Enable the idct4 asm routine.
details: http://hg.videolan.org/x265/rev/7dbe6495ebb8
branches:
changeset: 5345:7dbe6495ebb8
user: Nabajit Deka <nabajit at multicorewareinc.com>
date: Wed Nov 27 21:14:14 2013 +0550
description:
Enable the idct4 asm routine.
Subject: [x265] asm: code for pixel_sse_sp_8xN
details: http://hg.videolan.org/x265/rev/54ba57708276
branches:
changeset: 5346:54ba57708276
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Wed Nov 27 21:41:33 2013 +0550
description:
asm: code for pixel_sse_sp_8xN
Subject: [x265] asm: code for pixel_sse_sp_24x32
details: http://hg.videolan.org/x265/rev/cf5c2f982353
branches:
changeset: 5347:cf5c2f982353
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Wed Nov 27 21:42:49 2013 +0550
description:
asm: code for pixel_sse_sp_24x32
Subject: [x265] Merge
details: http://hg.videolan.org/x265/rev/ec904fab863a
branches:
changeset: 5348:ec904fab863a
user: Steve Borho <steve at borho.org>
date: Wed Nov 27 16:14:27 2013 -0600
description:
Merge
Subject: [x265] intra: fix yasm warning about redefined macro
details: http://hg.videolan.org/x265/rev/04811b42aa6b
branches:
changeset: 5349:04811b42aa6b
user: Steve Borho <steve at borho.org>
date: Wed Nov 27 16:46:17 2013 -0600
description:
intra: fix yasm warning about redefined macro
Subject: [x265] vec: remove scale2D_64to32, we have asm coverage
details: http://hg.videolan.org/x265/rev/c5efe0603b61
branches:
changeset: 5350:c5efe0603b61
user: Steve Borho <steve at borho.org>
date: Wed Nov 27 16:47:48 2013 -0600
description:
vec: remove scale2D_64to32, we have asm coverage
Subject: [x265] vec: remove intra_pred_planar16_sse4, we have asm coverage
details: http://hg.videolan.org/x265/rev/892addcb1c94
branches:
changeset: 5351:892addcb1c94
user: Steve Borho <steve at borho.org>
date: Wed Nov 27 16:48:47 2013 -0600
description:
vec: remove intra_pred_planar16_sse4, we have asm coverage
Subject: [x265] cmake: detect inttypes.h and use for uint64_t printfs
details: http://hg.videolan.org/x265/rev/e4baf53cefe8
branches: stable
changeset: 5352:e4baf53cefe8
user: Steve Borho <steve at borho.org>
date: Wed Nov 27 17:53:09 2013 -0600
description:
cmake: detect inttypes.h and use for uint64_t printfs
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/949f85337789
branches:
changeset: 5353:949f85337789
user: Steve Borho <steve at borho.org>
date: Wed Nov 27 18:10:14 2013 -0600
description:
Merge with stable
diffstat:
source/common/pixel.cpp | 2 +
source/common/vec/dct-sse3.cpp | 1 -
source/common/vec/intra-sse41.cpp | 68 -
source/common/vec/pixel-ssse3.cpp | 42 -
source/common/x86/asm-primitives.cpp | 31 +-
source/common/x86/const-a.asm | 2 +
source/common/x86/dct8.asm | 91 ++
source/common/x86/dct8.h | 1 +
source/common/x86/intrapred.asm | 92 ++-
source/common/x86/intrapred.h | 1 +
source/common/x86/pixel-a.asm | 1249 +++++++++++++++++++++++++++++----
source/common/x86/pixel.h | 20 +-
source/encoder/CMakeLists.txt | 5 +
source/encoder/encoder.cpp | 16 +-
14 files changed, 1315 insertions(+), 306 deletions(-)
diffs (truncated from 1992 to 300 lines):
diff -r b09b6fa7e89a -r 949f85337789 source/common/pixel.cpp
--- a/source/common/pixel.cpp Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/pixel.cpp Wed Nov 27 18:10:14 2013 -0600
@@ -985,6 +985,8 @@ void Setup_C_PixelPrimitives(EncoderPrim
p.var[BLOCK_8x8] = pixel_var<8>;
p.var[BLOCK_16x16] = pixel_var<16>;
+ p.var[BLOCK_32x32] = pixel_var<32>;
+ p.var[BLOCK_64x64] = pixel_var<64>;
p.plane_copy_deinterleave_c = plane_copy_deinterleave_chroma;
}
}
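[Editor's note: for context, pixel_var<size> is the C reference primitive that the new BLOCK_32x32/BLOCK_64x64 entries instantiate; it accumulates the pixel sum and the sum of squared pixels over a size x size block. A minimal scalar sketch, assuming the x264-style convention of packing the sum in the low 32 bits and the sum of squares in the high 32 bits (the packing convention and the name pixel_var_ref are assumptions):]

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch of a pixel_var<size> primitive: returns the
// pixel sum (low 32 bits) and sum of squared pixels (high 32 bits) of a
// size x size block. The packing convention is assumed, not confirmed.
template <int size>
uint64_t pixel_var_ref(const uint8_t* pix, intptr_t stride)
{
    uint32_t sum = 0, sqr = 0;
    for (int y = 0; y < size; y++, pix += stride)
        for (int x = 0; x < size; x++)
        {
            sum += pix[x];
            sqr += pix[x] * pix[x];
        }
    return sum + ((uint64_t)sqr << 32);
}
```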
diff -r b09b6fa7e89a -r 949f85337789 source/common/vec/dct-sse3.cpp
--- a/source/common/vec/dct-sse3.cpp Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/vec/dct-sse3.cpp Wed Nov 27 18:10:14 2013 -0600
@@ -1656,7 +1656,6 @@ namespace x265 {
void Setup_Vec_DCTPrimitives_sse3(EncoderPrimitives &p)
{
#if !HIGH_BIT_DEPTH
- p.idct[IDCT_4x4] = idct4;
p.idct[IDCT_8x8] = idct8;
p.idct[IDCT_16x16] = idct16;
p.idct[IDCT_32x32] = idct32;
diff -r b09b6fa7e89a -r 949f85337789 source/common/vec/intra-sse41.cpp
--- a/source/common/vec/intra-sse41.cpp Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/vec/intra-sse41.cpp Wed Nov 27 18:10:14 2013 -0600
@@ -54,73 +54,6 @@ static void initFileStaticVars()
v_multi_2Row = _mm_setr_epi16(1, 2, 3, 4, 1, 2, 3, 4);
}
-void intra_pred_planar16_sse4(pixel* above, pixel* left, pixel* dst, intptr_t dstStride)
-{
- pixel bottomLeft, topRight;
- __m128i v_topRow[2];
- __m128i v_bottomRow[2];
-
- // Get left and above reference column and row
- __m128i im0 = _mm_loadu_si128((__m128i*)above); // topRow
-
- v_topRow[0] = _mm_unpacklo_epi8(im0, _mm_setzero_si128());
- v_topRow[1] = _mm_unpackhi_epi8(im0, _mm_setzero_si128());
-
- // Prepare intermediate variables used in interpolation
- bottomLeft = left[16];
- topRight = above[16];
-
- __m128i v_bottomLeft = _mm_set1_epi16(bottomLeft);
-
- v_bottomRow[0] = _mm_sub_epi16(v_bottomLeft, v_topRow[0]);
- v_bottomRow[1] = _mm_sub_epi16(v_bottomLeft, v_topRow[1]);
-
- v_topRow[0] = _mm_slli_epi16(v_topRow[0], 4);
- v_topRow[1] = _mm_slli_epi16(v_topRow[1], 4);
-
- __m128i v_horPred, v_horPredN[2], v_rightColumnN[2];
- __m128i v_im4L, v_im4H;
- __m128i v_im5;
-
-#define COMP_PRED_PLANAR_ROW(Y) { \
- v_horPred = _mm_cvtsi32_si128((left[(Y)] << 4) + 16); \
- v_horPred = _mm_shufflelo_epi16(v_horPred, 0); \
- v_horPred = _mm_shuffle_epi32(v_horPred, 0); \
- __m128i _tmp = _mm_cvtsi32_si128(topRight - left[(Y)]); \
- _tmp = _mm_shufflelo_epi16(_tmp, 0); \
- _tmp = _mm_shuffle_epi32(_tmp, 0); \
- v_rightColumnN[0] = _mm_mullo_epi16(_tmp, v_multiL); \
- v_rightColumnN[1] = _mm_mullo_epi16(_tmp, v_multiH); \
- v_horPredN[0] = _mm_add_epi16(v_horPred, v_rightColumnN[0]); \
- v_horPredN[1] = _mm_add_epi16(v_horPred, v_rightColumnN[1]); \
- v_topRow[0] = _mm_add_epi16(v_topRow[0], v_bottomRow[0]); \
- v_topRow[1] = _mm_add_epi16(v_topRow[1], v_bottomRow[1]); \
- v_im4L = _mm_srai_epi16(_mm_add_epi16(v_horPredN[0], v_topRow[0]), 5); \
- v_im4H = _mm_srai_epi16(_mm_add_epi16(v_horPredN[1], v_topRow[1]), 5); \
- v_im5 = _mm_packus_epi16(v_im4L, v_im4H); \
- _mm_storeu_si128((__m128i*)&dst[(Y)*dstStride], v_im5); \
-}
-
- COMP_PRED_PLANAR_ROW(0)
- COMP_PRED_PLANAR_ROW(1)
- COMP_PRED_PLANAR_ROW(2)
- COMP_PRED_PLANAR_ROW(3)
- COMP_PRED_PLANAR_ROW(4)
- COMP_PRED_PLANAR_ROW(5)
- COMP_PRED_PLANAR_ROW(6)
- COMP_PRED_PLANAR_ROW(7)
- COMP_PRED_PLANAR_ROW(8)
- COMP_PRED_PLANAR_ROW(9)
- COMP_PRED_PLANAR_ROW(10)
- COMP_PRED_PLANAR_ROW(11)
- COMP_PRED_PLANAR_ROW(12)
- COMP_PRED_PLANAR_ROW(13)
- COMP_PRED_PLANAR_ROW(14)
- COMP_PRED_PLANAR_ROW(15)
-
-#undef COMP_PRED_PLANAR_ROW
-}
-
void intra_pred_planar32_sse4(pixel* above, pixel* left, pixel* dst, intptr_t dstStride)
{
pixel bottomLeft, topRight;
@@ -8390,7 +8323,6 @@ void Setup_Vec_IPredPrimitives_sse41(Enc
#else
initFileStaticVars();
- p.intra_pred_planar[BLOCK_16x16] = intra_pred_planar16_sse4;
p.intra_pred_planar[BLOCK_32x32] = intra_pred_planar32_sse4;
p.intra_pred_planar[BLOCK_64x64] = intra_pred_planar64_sse4;
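[Editor's note: the removed intra_pred_planar16_sse4 above vectorizes standard HEVC planar intra prediction, now covered by asm. As a reference for what the intrinsics compute, a plain scalar version of 16x16 planar prediction following the HEVC formula (the function name is illustrative; above[16] and left[16] supply the top-right and bottom-left samples, as in the removed code):]

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel;

// Scalar reference for 16x16 planar intra prediction (HEVC formula):
// each output is a bilinear blend of the left/above reference samples
// with the top-right and bottom-left corners, rounded and shifted by 5.
static void intra_pred_planar16_ref(const pixel* above, const pixel* left,
                                    pixel* dst, intptr_t dstStride)
{
    const int nT = 16;                // block size
    const int topRight   = above[nT];
    const int bottomLeft = left[nT];
    for (int y = 0; y < nT; y++)
        for (int x = 0; x < nT; x++)
            dst[y * dstStride + x] = (pixel)(
                ((nT - 1 - x) * left[y] + (x + 1) * topRight +
                 (nT - 1 - y) * above[x] + (y + 1) * bottomLeft + nT) >> 5);
}
```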
diff -r b09b6fa7e89a -r 949f85337789 source/common/vec/pixel-ssse3.cpp
--- a/source/common/vec/pixel-ssse3.cpp Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/vec/pixel-ssse3.cpp Wed Nov 27 18:10:14 2013 -0600
@@ -49,53 +49,11 @@ void convert16to32_shl(int32_t *dst, int
}
}
}
-
-#if !HIGH_BIT_DEPTH
-void scale2D_64to32(pixel *dst, pixel *src, intptr_t stride)
-{
- int i;
- const __m128i c8_1 = _mm_set1_epi32(0x01010101);
- const __m128i c16_2 = _mm_set1_epi32(0x00020002);
-
- for (i = 0; i < 64; i += 2)
- {
- __m128i T00 = _mm_loadu_si128((__m128i*)&src[(i + 0) * stride + 0]);
- __m128i T01 = _mm_loadu_si128((__m128i*)&src[(i + 0) * stride + 16]);
- __m128i T02 = _mm_loadu_si128((__m128i*)&src[(i + 0) * stride + 32]);
- __m128i T03 = _mm_loadu_si128((__m128i*)&src[(i + 0) * stride + 48]);
- __m128i T10 = _mm_loadu_si128((__m128i*)&src[(i + 1) * stride + 0]);
- __m128i T11 = _mm_loadu_si128((__m128i*)&src[(i + 1) * stride + 16]);
- __m128i T12 = _mm_loadu_si128((__m128i*)&src[(i + 1) * stride + 32]);
- __m128i T13 = _mm_loadu_si128((__m128i*)&src[(i + 1) * stride + 48]);
-
- __m128i S00 = _mm_maddubs_epi16(T00, c8_1);
- __m128i S01 = _mm_maddubs_epi16(T01, c8_1);
- __m128i S02 = _mm_maddubs_epi16(T02, c8_1);
- __m128i S03 = _mm_maddubs_epi16(T03, c8_1);
- __m128i S10 = _mm_maddubs_epi16(T10, c8_1);
- __m128i S11 = _mm_maddubs_epi16(T11, c8_1);
- __m128i S12 = _mm_maddubs_epi16(T12, c8_1);
- __m128i S13 = _mm_maddubs_epi16(T13, c8_1);
-
- __m128i S20 = _mm_srli_epi16(_mm_add_epi16(_mm_add_epi16(S00, S10), c16_2), 2);
- __m128i S21 = _mm_srli_epi16(_mm_add_epi16(_mm_add_epi16(S01, S11), c16_2), 2);
- __m128i S22 = _mm_srli_epi16(_mm_add_epi16(_mm_add_epi16(S02, S12), c16_2), 2);
- __m128i S23 = _mm_srli_epi16(_mm_add_epi16(_mm_add_epi16(S03, S13), c16_2), 2);
-
- _mm_storeu_si128((__m128i*)&dst[(i >> 1) * 32 + 0], _mm_packus_epi16(S20, S21));
- _mm_storeu_si128((__m128i*)&dst[(i >> 1) * 32 + 16], _mm_packus_epi16(S22, S23));
- }
-}
-#endif // if !HIGH_BIT_DEPTH
}
namespace x265 {
void Setup_Vec_PixelPrimitives_ssse3(EncoderPrimitives &p)
{
p.cvt16to32_shl = convert16to32_shl;
-
-#if !HIGH_BIT_DEPTH
- p.scale2D_64to32 = scale2D_64to32;
-#endif
}
}
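[Editor's note: the deleted scale2D_64to32 intrinsics implement a simple 2:1 downscale: each destination pixel is the rounded average of a 2x2 source neighborhood (the maddubs-by-one step sums horizontal byte pairs, the adds merge the two rows, and the +2 >> 2 rounds). A scalar equivalent, for reference:]

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel;

// Scalar reference for scale2D_64to32: downscale a 64x64 block to 32x32
// by averaging each 2x2 source neighborhood with rounding.
static void scale2D_64to32_ref(pixel* dst, const pixel* src, intptr_t stride)
{
    for (int y = 0; y < 32; y++)
        for (int x = 0; x < 32; x++)
        {
            const pixel* s = src + 2 * y * stride + 2 * x;
            dst[y * 32 + x] = (pixel)((s[0] + s[1] +
                                       s[stride] + s[stride + 1] + 2) >> 2);
        }
}
```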
diff -r b09b6fa7e89a -r 949f85337789 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/x86/asm-primitives.cpp Wed Nov 27 18:10:14 2013 -0600
@@ -104,11 +104,17 @@ extern "C" {
p.sse_ss[LUMA_16x16] = x265_pixel_ssd_ss_16x16_ ## cpu; \
p.sse_ss[LUMA_16x32] = x265_pixel_ssd_ss_16x32_ ## cpu; \
p.sse_ss[LUMA_16x64] = x265_pixel_ssd_ss_16x64_ ## cpu; \
+ p.sse_ss[LUMA_24x32] = x265_pixel_ssd_ss_24x32_ ## cpu; \
p.sse_ss[LUMA_32x8] = x265_pixel_ssd_ss_32x8_ ## cpu; \
p.sse_ss[LUMA_32x16] = x265_pixel_ssd_ss_32x16_ ## cpu; \
p.sse_ss[LUMA_32x24] = x265_pixel_ssd_ss_32x24_ ## cpu; \
p.sse_ss[LUMA_32x32] = x265_pixel_ssd_ss_32x32_ ## cpu; \
- p.sse_ss[LUMA_32x64] = x265_pixel_ssd_ss_32x64_ ## cpu;
+ p.sse_ss[LUMA_32x64] = x265_pixel_ssd_ss_32x64_ ## cpu; \
+ p.sse_ss[LUMA_48x64] = x265_pixel_ssd_ss_48x64_ ## cpu; \
+ p.sse_ss[LUMA_64x16] = x265_pixel_ssd_ss_64x16_ ## cpu; \
+ p.sse_ss[LUMA_64x32] = x265_pixel_ssd_ss_64x32_ ## cpu; \
+ p.sse_ss[LUMA_64x48] = x265_pixel_ssd_ss_64x48_ ## cpu; \
+ p.sse_ss[LUMA_64x64] = x265_pixel_ssd_ss_64x64_ ## cpu;
#define SA8D_INTER_FROM_BLOCK(cpu) \
p.sa8d_inter[LUMA_4x8] = x265_pixel_satd_4x8_ ## cpu; \
@@ -440,7 +446,9 @@ extern "C" {
#define LUMA_VAR(cpu) \
SETUP_PIXEL_VAR_DEF(8, 8, cpu); \
- SETUP_PIXEL_VAR_DEF(16, 16, cpu);
+ SETUP_PIXEL_VAR_DEF(16, 16, cpu); \
+ SETUP_PIXEL_VAR_DEF(32, 32, cpu); \
+ SETUP_PIXEL_VAR_DEF(64, 64, cpu);
namespace x265 {
// private x265 namespace
@@ -496,7 +504,6 @@ void Setup_Assembly_Primitives(EncoderPr
p.sad[LUMA_12x16] = x265_pixel_sad_12x16_sse2;
ASSGN_SSE(sse2);
- ASSGN_SSE_SS(sse2);
INIT2(sad, _sse2);
INIT2(sad_x3, _sse2);
INIT2(sad_x4, _sse2);
@@ -564,6 +571,7 @@ void Setup_Assembly_Primitives(EncoderPr
p.ssim_4x4x2_core = x265_pixel_ssim_4x4x2_core_sse2;
p.ssim_end_4 = x265_pixel_ssim_end4_sse2;
p.dct[DCT_4x4] = x265_dct4_sse2;
+ p.idct[IDCT_4x4] = x265_idct4_sse2;
}
if (cpuMask & X265_CPU_SSSE3)
{
@@ -575,6 +583,7 @@ void Setup_Assembly_Primitives(EncoderPr
PIXEL_AVG_W4(ssse3);
p.scale1D_128to64 = x265_scale1D_128to64_ssse3;
+ p.scale2D_64to32 = x265_scale2D_64to32_ssse3;
p.sad_x4[LUMA_8x4] = x265_pixel_sad_x4_8x4_ssse3;
p.sad_x4[LUMA_8x8] = x265_pixel_sad_x4_8x8_ssse3;
@@ -639,12 +648,27 @@ void Setup_Assembly_Primitives(EncoderPr
p.sse_pp[LUMA_64x48] = x265_pixel_ssd_64x48_sse4;
p.sse_pp[LUMA_64x64] = x265_pixel_ssd_64x64_sse4;
+ p.sse_sp[LUMA_8x4] = x265_pixel_ssd_sp_8x4_sse4;
+ p.sse_sp[LUMA_8x8] = x265_pixel_ssd_sp_8x8_sse4;
+ p.sse_sp[LUMA_8x16] = x265_pixel_ssd_sp_8x16_sse4;
+ p.sse_sp[LUMA_8x32] = x265_pixel_ssd_sp_8x32_sse4;
p.sse_sp[LUMA_16x4] = x265_pixel_ssd_sp_16x4_sse4;
p.sse_sp[LUMA_16x8] = x265_pixel_ssd_sp_16x8_sse4;
p.sse_sp[LUMA_16x12] = x265_pixel_ssd_sp_16x12_sse4;
p.sse_sp[LUMA_16x16] = x265_pixel_ssd_sp_16x16_sse4;
p.sse_sp[LUMA_16x32] = x265_pixel_ssd_sp_16x32_sse4;
p.sse_sp[LUMA_16x64] = x265_pixel_ssd_sp_16x64_sse4;
+ p.sse_sp[LUMA_24x32] = x265_pixel_ssd_sp_24x32_sse4;
+ p.sse_sp[LUMA_32x8] = x265_pixel_ssd_sp_32x8_sse4;
+ p.sse_sp[LUMA_32x16] = x265_pixel_ssd_sp_32x16_sse4;
+ p.sse_sp[LUMA_32x24] = x265_pixel_ssd_sp_32x24_sse4;
+ p.sse_sp[LUMA_32x32] = x265_pixel_ssd_sp_32x32_sse4;
+ p.sse_sp[LUMA_32x64] = x265_pixel_ssd_sp_32x64_sse4;
+ p.sse_sp[LUMA_48x64] = x265_pixel_ssd_sp_48x64_sse4;
+ p.sse_sp[LUMA_64x16] = x265_pixel_ssd_sp_64x16_sse4;
+ p.sse_sp[LUMA_64x32] = x265_pixel_ssd_sp_64x32_sse4;
+ p.sse_sp[LUMA_64x48] = x265_pixel_ssd_sp_64x48_sse4;
+ p.sse_sp[LUMA_64x64] = x265_pixel_ssd_sp_64x64_sse4;
CHROMA_PIXELSUB_PS(_sse4);
@@ -674,6 +698,7 @@ void Setup_Assembly_Primitives(EncoderPr
p.weight_sp = x265_weight_sp_sse4;
p.intra_pred_planar[BLOCK_4x4] = x265_intra_pred_planar4_sse4;
p.intra_pred_planar[BLOCK_8x8] = x265_intra_pred_planar8_sse4;
+ p.intra_pred_planar[BLOCK_16x16] = x265_intra_pred_planar16_sse4;
}
if (cpuMask & X265_CPU_AVX)
{
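[Editor's note: the sse_sp (short vs. pixel) primitives registered above compute a sum of squared differences between an int16_t plane and a uint8_t pixel plane, one entry per LUMA_WxH partition. A scalar sketch of what an NxM sse_sp primitive computes; the exact x265 signature and argument order are assumptions:]

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel;

// Scalar sketch of an sse_sp block primitive: sum of squared differences
// between an int16_t ("short") plane and a pixel plane. Width/height are
// template parameters mirroring the LUMA_WxH table entries above.
template <int W, int H>
int sse_sp_ref(const int16_t* a, intptr_t strideA,
               const pixel* b, intptr_t strideB)
{
    int sum = 0;
    for (int y = 0; y < H; y++, a += strideA, b += strideB)
        for (int x = 0; x < W; x++)
        {
            int d = a[x] - b[x];
            sum += d * d;
        }
    return sum;
}
```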
diff -r b09b6fa7e89a -r 949f85337789 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/x86/const-a.asm Wed Nov 27 18:10:14 2013 -0600
@@ -65,8 +65,10 @@ const pw_pmpmpmpm, dw 1,-1,1,-1,1,-1,1,-
const pw_pmmpzzzz, dw 1,-1,-1,1,0,0,0,0
const pd_32, times 4 dd 32
+const pd_64, times 4 dd 64
const pd_128, times 4 dd 128
const pd_1024, times 4 dd 1024
+const pd_2048, times 4 dd 2048
const pd_ffff, times 4 dd 0xffff
const pw_ff00, times 8 dw 0xff00
diff -r b09b6fa7e89a -r 949f85337789 source/common/x86/dct8.asm
--- a/source/common/x86/dct8.asm Tue Nov 26 12:24:24 2013 -0600
+++ b/source/common/x86/dct8.asm Wed Nov 27 18:10:14 2013 -0600
@@ -21,6 +21,8 @@
;* For more information, contact us at licensing at multicorewareinc.com.
;*****************************************************************************/
+;TO-DO : Further optimize the routines.
+
%include "x86inc.asm"
%include "x86util.asm"
@@ -34,7 +36,9 @@ tab_dct4: times 4 dw 64, 64
SECTION .text
cextern pd_1
+cextern pd_64
cextern pd_128
+cextern pd_2048
;------------------------------------------------------
;void dct4(int16_t *src, int32_t *dst, intptr_t stride)
@@ -128,3 +132,90 @@ cglobal dct4, 3, 4, 8
movu [r1 + 3 * 16], m2
RET
+
+;-------------------------------------------------------
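[Editor's note: the idct4 asm body is cut off by the diff truncation above. For context, a scalar reference of the HEVC 4x4 inverse DCT (partial butterfly) that such a routine implements; note the pd_64/pd_2048 constants added in const-a.asm match the rounding offsets of the two passes (shift 7, then shift 12 at 8-bit depth). The signature is illustrative and clipping is omitted:]

```cpp
#include <cassert>
#include <cstdint>

// One 1-D pass of the HEVC 4x4 inverse DCT (partial butterfly).
// Reads column i of src (row-major 4x4), writes row i of dst, so two
// passes produce the full 2-D inverse transform.
static void inversePass4(const int32_t* src, int32_t* dst, int shift)
{
    const int add = 1 << (shift - 1);   // 64 for shift 7, 2048 for shift 12
    for (int i = 0; i < 4; i++)
    {
        int O0 = 83 * src[4 + i] + 36 * src[12 + i];
        int O1 = 36 * src[4 + i] - 83 * src[12 + i];
        int E0 = 64 * (src[0 + i] + src[8 + i]);
        int E1 = 64 * (src[0 + i] - src[8 + i]);
        dst[i * 4 + 0] = (E0 + O0 + add) >> shift;
        dst[i * 4 + 1] = (E1 + O1 + add) >> shift;
        dst[i * 4 + 2] = (E1 - O1 + add) >> shift;
        dst[i * 4 + 3] = (E0 - O0 + add) >> shift;
    }
}

// Full 4x4 inverse transform: two passes with the HEVC rounding shifts.
static void idct4_ref(const int32_t* coeff, int32_t* block)
{
    int32_t tmp[16];
    inversePass4(coeff, tmp, 7);
    inversePass4(tmp, block, 12);
}
```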
More information about the x265-commits mailing list