[x265-commits] [x265] ratecontrol parameters: add documentation for qcomp
Deepthi Nandakumar
deepthi at multicorewareinc.com
Fri Dec 6 19:57:55 CET 2013
details: http://hg.videolan.org/x265/rev/608874dc84ab
branches:
changeset: 5554:608874dc84ab
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Dec 06 15:26:30 2013 +0530
description:
ratecontrol parameters: add documentation for qcomp
Subject: [x265] ratecontrol params: documentation for rateTolerance
details: http://hg.videolan.org/x265/rev/49288da0ee3e
branches:
changeset: 5555:49288da0ee3e
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Dec 06 15:37:06 2013 +0530
description:
ratecontrol params: documentation for rateTolerance
Subject: [x265] rc params: documentation on i/p/bfactor, qpstep, crf
details: http://hg.videolan.org/x265/rev/c5e91abfeb05
branches:
changeset: 5556:c5e91abfeb05
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Dec 06 16:39:29 2013 +0530
description:
rc params: documentation on i/p/bfactor, qpstep, crf
Subject: [x265] x265: remove obsolete R-D enums
details: http://hg.videolan.org/x265/rev/56a17500909e
branches:
changeset: 5557:56a17500909e
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Dec 06 16:41:00 2013 +0530
description:
x265: remove obsolete R-D enums
Subject: [x265] asm: cleanup garbage after function declare
details: http://hg.videolan.org/x265/rev/ad08db06a7c6
branches:
changeset: 5558:ad08db06a7c6
user: Min Chen <chenm003 at 163.com>
date: Fri Dec 06 13:21:07 2013 +0800
description:
asm: cleanup garbage after function declare
Subject: [x265] 16bpp: assembly code for intra_pred_dc4
details: http://hg.videolan.org/x265/rev/9e24dcae2ebf
branches:
changeset: 5559:9e24dcae2ebf
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Dec 06 14:32:54 2013 +0550
description:
16bpp: assembly code for intra_pred_dc4
Subject: [x265] 16bpp: assembly code for intra_pred_dc8
details: http://hg.videolan.org/x265/rev/110d716e67a7
branches:
changeset: 5560:110d716e67a7
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Dec 06 17:06:06 2013 +0550
description:
16bpp: assembly code for intra_pred_dc8
Subject: [x265] 16bpp: assembly code for intra_pred_dc16
details: http://hg.videolan.org/x265/rev/6d2d7c2a5d79
branches:
changeset: 5561:6d2d7c2a5d79
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Dec 06 17:33:57 2013 +0550
description:
16bpp: assembly code for intra_pred_dc16
Subject: [x265] 16bpp: assembly code for intra_pred_dc32
details: http://hg.videolan.org/x265/rev/d36fb6852698
branches:
changeset: 5562:d36fb6852698
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Dec 06 17:52:40 2013 +0550
description:
16bpp: assembly code for intra_pred_dc32
Subject: [x265] asm: 10bpp code of pixel_sub for 16xN, 24x32, 32xN,48x64 and 64xN
details: http://hg.videolan.org/x265/rev/f27fb7c2676a
branches:
changeset: 5563:f27fb7c2676a
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Dec 06 20:53:12 2013 +0550
description:
asm: 10bpp code of pixel_sub for 16xN, 24x32, 32xN,48x64 and 64xN
Subject: [x265] 10bpp: testbench code for pixel_add_ps
details: http://hg.videolan.org/x265/rev/13314db77cf8
branches:
changeset: 5564:13314db77cf8
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Dec 06 21:57:37 2013 +0550
description:
10bpp: testbench code for pixel_add_ps
Subject: [x265] asm: 10bpp code for pixel_add_ps_2xN
details: http://hg.videolan.org/x265/rev/967297338b27
branches:
changeset: 5565:967297338b27
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Dec 06 22:25:55 2013 +0550
description:
asm: 10bpp code for pixel_add_ps_2xN
Subject: [x265] rename IntraPred.cpp to intrapred.cpp
details: http://hg.videolan.org/x265/rev/f4166f824c2b
branches:
changeset: 5566:f4166f824c2b
user: Min Chen <chenm003 at 163.com>
date: Fri Dec 06 23:25:32 2013 +0800
description:
rename IntraPred.cpp to intrapred.cpp
Subject: [x265] cleanup: merge Intra Pred PLANAR mode into intra_pred[]
details: http://hg.videolan.org/x265/rev/c093e7847025
branches:
changeset: 5567:c093e7847025
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Dec 06 22:41:10 2013 +0550
description:
cleanup: merge Intra Pred PLANAR mode into intra_pred[]
Subject: [x265] Merge
details: http://hg.videolan.org/x265/rev/a482cf5de173
branches:
changeset: 5568:a482cf5de173
user: Steve Borho <steve at borho.org>
date: Fri Dec 06 12:57:17 2013 -0600
description:
Merge
diffstat:
source/Lib/TLibCommon/TComPrediction.cpp | 4 +-
source/Lib/TLibEncoder/TEncSearch.cpp | 2 +-
source/common/CMakeLists.txt | 6 +-
source/common/intrapred.cpp | 15 +-
source/common/primitives.h | 1 -
source/common/x86/asm-primitives.cpp | 63 ++-
source/common/x86/const-a.asm | 1 +
source/common/x86/intrapred.h | 8 +-
source/common/x86/intrapred16.asm | 400 +++++++++++++++++++++++++
source/common/x86/intrapred8.asm | 128 ++++---
source/common/x86/pixel-util.h | 1 +
source/common/x86/pixel-util8.asm | 489 +++++++++++++++++++++++-------
source/common/x86/pixeladd8.asm | 82 +++++-
source/encoder/compress.cpp | 2 +-
source/encoder/slicetype.cpp | 2 +-
source/test/intrapredharness.cpp | 22 +-
source/test/intrapredharness.h | 2 +-
source/test/pixelharness.cpp | 21 +-
source/x265.h | 24 +-
19 files changed, 1026 insertions(+), 247 deletions(-)
diffs (truncated from 2003 to 300 lines):
diff -r d5dc48e6cd16 -r a482cf5de173 source/Lib/TLibCommon/TComPrediction.cpp
--- a/source/Lib/TLibCommon/TComPrediction.cpp Thu Dec 05 22:46:25 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.cpp Fri Dec 06 12:57:17 2013 -0600
@@ -159,7 +159,7 @@ void TComPrediction::predIntraLumaAng(ui
// Create the prediction
if (dirMode == PLANAR_IDX)
{
- primitives.intra_pred_planar[log2BlkSize - 2](refAbv + 1, refLft + 1, dst, stride);
+ primitives.intra_pred[log2BlkSize - 2][PLANAR_IDX](dst, stride, refLft, refAbv, dirMode, 0);
}
else
{
@@ -186,7 +186,7 @@ void TComPrediction::predIntraChromaAng(
// get starting pixel in block
if (dirMode == PLANAR_IDX)
{
- primitives.intra_pred_planar[log2BlkSize](refAbv + width - 1 + 1, refLft + width - 1 + 1, dst, stride);
+ primitives.intra_pred[log2BlkSize][dirMode](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
}
else
{
diff -r d5dc48e6cd16 -r a482cf5de173 source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp Thu Dec 05 22:46:25 2013 -0600
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp Fri Dec 06 12:57:17 2013 -0600
@@ -1634,7 +1634,7 @@ void TEncSearch::estIntraPredQT(TComData
}
// PLANAR
- primitives.intra_pred_planar[log2SizeMinus2](abovePlanar + 1, leftPlanar + 1, tmp, scaleStride);
+ primitives.intra_pred[log2SizeMinus2][PLANAR_IDX](tmp, scaleStride,leftPlanar, abovePlanar, 0, 0);
modeCosts[PLANAR_IDX] = costMultiplier * sa8d(fenc, scaleStride, tmp, scaleStride);
// Transpose NxN
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/CMakeLists.txt
--- a/source/common/CMakeLists.txt Thu Dec 05 22:46:25 2013 -0600
+++ b/source/common/CMakeLists.txt Fri Dec 06 12:57:17 2013 -0600
@@ -109,12 +109,12 @@ source_group(Intrinsics FILES ${VEC_PRIM
if(ENABLE_ASSEMBLY)
set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h)
set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm ssd-a.asm mc-a.asm
- mc-a2.asm ipfilter8.asm pixel-util8.asm blockcopy8.asm intrapred8.asm
+ mc-a2.asm ipfilter8.asm pixel-util8.asm blockcopy8.asm
pixeladd8.asm dct8.asm)
if(HIGH_BIT_DEPTH)
- set(A_SRCS ${A_SRCS} sad16-a.asm)
+ set(A_SRCS ${A_SRCS} sad16-a.asm intrapred16.asm)
else()
- set(A_SRCS ${A_SRCS} sad-a.asm)
+ set(A_SRCS ${A_SRCS} sad-a.asm intrapred8.asm)
endif()
if (NOT X64)
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/intrapred.cpp
--- a/source/common/intrapred.cpp Thu Dec 05 22:46:25 2013 -0600
+++ b/source/common/intrapred.cpp Fri Dec 06 12:57:17 2013 -0600
@@ -102,8 +102,10 @@ void intra_pred_dc_c(pixel* dst, intptr_
}
template<int width>
-void planad_pred_c(pixel* above, pixel* left, pixel* dst, intptr_t dstStride)
+void planad_pred_c(pixel* dst, intptr_t dstStride, pixel* left, pixel* above, int /*dirMode*/, int /*bFilter*/)
{
+ above += 1;
+ left += 1;
int k, l;
pixel bottomLeft, topRight;
int horPred;
@@ -293,13 +295,10 @@ namespace x265 {
void Setup_C_IPredPrimitives(EncoderPrimitives& p)
{
- p.intra_pred_planar[BLOCK_4x4] = planad_pred_c<4>;
- p.intra_pred_planar[BLOCK_8x8] = planad_pred_c<8>;
- p.intra_pred_planar[BLOCK_16x16] = planad_pred_c<16>;
- p.intra_pred_planar[BLOCK_32x32] = planad_pred_c<32>;
-
- // TODO: Fill Planar mode
- p.intra_pred[BLOCK_4x4][0] = NULL;
+ p.intra_pred[BLOCK_4x4][0] = planad_pred_c<4>;
+ p.intra_pred[BLOCK_8x8][0] = planad_pred_c<8>;
+ p.intra_pred[BLOCK_16x16][0] = planad_pred_c<16>;
+ p.intra_pred[BLOCK_32x32][0] = planad_pred_c<32>;
// Intra Prediction DC
p.intra_pred[BLOCK_4x4][1] = intra_pred_dc_c<4>;
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/primitives.h
--- a/source/common/primitives.h Thu Dec 05 22:46:25 2013 -0600
+++ b/source/common/primitives.h Fri Dec 06 12:57:17 2013 -0600
@@ -250,7 +250,6 @@ struct EncoderPrimitives
pixeladd_ss_t pixeladd_ss;
pixelavg_pp_t pixelavg_pp[NUM_LUMA_PARTITIONS];
- intra_planar_t intra_pred_planar[NUM_SQUARE_BLOCKS-1]; // no 64x64 intra predictions
intra_pred_t intra_pred[NUM_SQUARE_BLOCKS - 1][NUM_INTRA_MODE];
intra_allangs_t intra_pred_allangs[NUM_SQUARE_BLOCKS-1];
scale_t scale1D_128to64;
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Thu Dec 05 22:46:25 2013 -0600
+++ b/source/common/x86/asm-primitives.cpp Fri Dec 06 12:57:17 2013 -0600
@@ -300,9 +300,11 @@ extern "C" {
p.luma_vpp[LUMA_ ## W ## x ## H] = x265_interp_8tap_vert_pp_ ## W ## x ## H ## cpu; \
p.luma_vps[LUMA_ ## W ## x ## H] = x265_interp_8tap_vert_ps_ ## W ## x ## H ## cpu; \
p.luma_copy_ps[LUMA_ ## W ## x ## H] = x265_blockcopy_ps_ ## W ## x ## H ## cpu; \
- p.luma_sub_ps[LUMA_ ## W ## x ## H] = x265_pixel_sub_ps_ ## W ## x ## H ## cpu; \
p.luma_add_ps[LUMA_ ## W ## x ## H] = x265_pixel_add_ps_ ## W ## x ## H ## cpu;
+#define SETUP_LUMA_SUB_FUNC_DEF(W, H, cpu) \
+ p.luma_sub_ps[LUMA_ ## W ## x ## H] = x265_pixel_sub_ps_ ## W ## x ## H ## cpu;
+
#define SETUP_LUMA_SP_FUNC_DEF(W, H, cpu) \
p.luma_vsp[LUMA_ ## W ## x ## H] = x265_interp_8tap_vert_sp_ ## W ## x ## H ## cpu;
@@ -398,6 +400,33 @@ extern "C" {
SETUP_LUMA_FUNC_DEF(64, 16, cpu); \
SETUP_LUMA_FUNC_DEF(16, 64, cpu);
+#define LUMA_PIXELSUB(cpu) \
+ SETUP_LUMA_SUB_FUNC_DEF(4, 4, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(8, 8, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(8, 4, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(4, 8, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(16, 16, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(16, 8, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(8, 16, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(16, 12, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(12, 16, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(16, 4, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(4, 16, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(32, 32, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(32, 16, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(16, 32, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(32, 24, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(24, 32, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(32, 8, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(8, 32, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(64, 64, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(64, 32, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(32, 64, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(64, 48, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(48, 64, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(64, 16, cpu); \
+ SETUP_LUMA_SUB_FUNC_DEF(16, 64, cpu);
+
#define LUMA_SP_FILTERS(cpu) \
SETUP_LUMA_SP_FUNC_DEF(4, 4, cpu); \
SETUP_LUMA_SP_FUNC_DEF(8, 8, cpu); \
@@ -632,20 +661,11 @@ void Setup_Assembly_Primitives(EncoderPr
p.cvt32to16_shr = x265_cvt32to16_shr_sse2;
p.cvt16to32_shl = x265_cvt16to32_shl_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_4x8] = x265_pixel_sub_ps_2x4_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_4x16] = x265_pixel_sub_ps_2x8_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_8x4] = x265_pixel_sub_ps_4x2_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_8x8] = x265_pixel_sub_ps_4x4_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_8x16] = x265_pixel_sub_ps_4x8_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_8x32] = x265_pixel_sub_ps_4x16_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_12x16] = x265_pixel_sub_ps_6x8_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_16x4] = x265_pixel_sub_ps_8x2_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_16x8] = x265_pixel_sub_ps_8x4_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_16x12] = x265_pixel_sub_ps_8x6_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_16x16] = x265_pixel_sub_ps_8x8_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_16x32] = x265_pixel_sub_ps_8x16_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_16x64] = x265_pixel_sub_ps_8x32_sse2;
- p.chroma[X265_CSP_I420].sub_ps[LUMA_24x32] = x265_pixel_sub_ps_12x16_sse2;
+ CHROMA_PIXELSUB_PS(_sse2);
+ LUMA_PIXELSUB(_sse2);
+
+ p.chroma[X265_CSP_I420].add_ps[CHROMA_2x4] = x265_pixel_add_ps_2x4_sse2;
+ p.chroma[X265_CSP_I420].add_ps[CHROMA_2x8] = x265_pixel_add_ps_2x8_sse2;
}
if (cpuMask & X265_CPU_SSSE3)
{
@@ -654,6 +674,10 @@ void Setup_Assembly_Primitives(EncoderPr
}
if (cpuMask & X265_CPU_SSE4)
{
+ p.intra_pred[BLOCK_4x4][1] = x265_intra_pred_dc4_sse4;
+ p.intra_pred[BLOCK_8x8][1] = x265_intra_pred_dc8_sse4;
+ p.intra_pred[BLOCK_16x16][1] = x265_intra_pred_dc16_sse4;
+ p.intra_pred[BLOCK_32x32][1] = x265_intra_pred_dc32_sse4;
}
if (cpuMask & X265_CPU_XOP)
{
@@ -843,6 +867,7 @@ void Setup_Assembly_Primitives(EncoderPr
LUMA_SSE_SP(_sse4);
CHROMA_PIXELSUB_PS(_sse4);
+ LUMA_PIXELSUB(_sse4);
CHROMA_FILTERS(_sse4);
LUMA_FILTERS(_sse4);
@@ -864,10 +889,10 @@ void Setup_Assembly_Primitives(EncoderPr
p.dequant_normal = x265_dequant_normal_sse4;
p.weight_pp = x265_weight_pp_sse4;
p.weight_sp = x265_weight_sp_sse4;
- p.intra_pred_planar[BLOCK_4x4] = x265_intra_pred_planar4_sse4;
- p.intra_pred_planar[BLOCK_8x8] = x265_intra_pred_planar8_sse4;
- p.intra_pred_planar[BLOCK_16x16] = x265_intra_pred_planar16_sse4;
- p.intra_pred_planar[BLOCK_32x32] = x265_intra_pred_planar32_sse4;
+ p.intra_pred[BLOCK_4x4][0] = x265_intra_pred_planar4_sse4;
+ p.intra_pred[BLOCK_8x8][0] = x265_intra_pred_planar8_sse4;
+ p.intra_pred[BLOCK_16x16][0] = x265_intra_pred_planar16_sse4;
+ p.intra_pred[BLOCK_32x32][0] = x265_intra_pred_planar32_sse4;
p.intra_pred_allangs[BLOCK_4x4] = x265_all_angs_pred_4x4_sse4;
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm Thu Dec 05 22:46:25 2013 -0600
+++ b/source/common/x86/const-a.asm Fri Dec 06 12:57:17 2013 -0600
@@ -36,6 +36,7 @@ const pw_16, times 16 dw 16
const pw_32, times 16 dw 32
const pw_512, times 16 dw 512
const pw_1024, times 16 dw 1024
+const pw_4096, times 16 dw 4096
const pw_00ff, times 16 dw 0x00ff
const pw_pixel_max,times 16 dw ((1 << BIT_DEPTH)-1)
const pd_1, times 8 dd 1
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Thu Dec 05 22:46:25 2013 -0600
+++ b/source/common/x86/intrapred.h Fri Dec 06 12:57:17 2013 -0600
@@ -31,10 +31,10 @@ void x265_intra_pred_dc8_sse4(pixel* dst
void x265_intra_pred_dc16_sse4(pixel* dst, intptr_t dstStride, pixel* above, pixel* left, int, int filter);
void x265_intra_pred_dc32_sse4(pixel* dst, intptr_t dstStride, pixel* above, pixel* left, int, int filter);
-void x265_intra_pred_planar4_sse4(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
-void x265_intra_pred_planar8_sse4(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
-void x265_intra_pred_planar16_sse4(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
-void x265_intra_pred_planar32_sse4(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
+void x265_intra_pred_planar4_sse4(pixel* dst, intptr_t dstStride, pixel* above, pixel* left, int, int filter);
+void x265_intra_pred_planar8_sse4(pixel* dst, intptr_t dstStride, pixel* above, pixel* left, int, int filter);
+void x265_intra_pred_planar16_sse4(pixel* dst, intptr_t dstStride, pixel* above, pixel* left, int, int filter);
+void x265_intra_pred_planar32_sse4(pixel* dst, intptr_t dstStride, pixel* above, pixel* left, int, int filter);
#define DECL_ANG(bsize, mode, cpu) \
void x265_intra_pred_ang ## bsize ## _ ## mode ## _ ## cpu(pixel * dst, intptr_t dstStride, pixel * refLeft, pixel * refAbove, int dirMode, int bFilter);
diff -r d5dc48e6cd16 -r a482cf5de173 source/common/x86/intrapred16.asm
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/source/common/x86/intrapred16.asm Fri Dec 06 12:57:17 2013 -0600
@@ -0,0 +1,400 @@
+;*****************************************************************************
+;* Copyright (C) 2013 x265 project
+;*
+;* Authors: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
+;*
+;* This program is free software; you can redistribute it and/or modify
+;* it under the terms of the GNU General Public License as published by
+;* the Free Software Foundation; either version 2 of the License, or
+;* (at your option) any later version.
+;*
+;* This program is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;* GNU General Public License for more details.
+;*
+;* You should have received a copy of the GNU General Public License
+;* along with this program; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+;*
+;* This program is also available under a commercial proprietary license.
+;* For more information, contact us at licensing at multicorewareinc.com.
+;*****************************************************************************/
+
+%include "x86inc.asm"
+%include "x86util.asm"
+
+SECTION_RODATA 32
+
+SECTION .text
+
+cextern pw_1
+cextern pd_32
+cextern pw_4096
+
+
+;-------------------------------------------------------------------------------------------------------
+; void intra_pred_dc(pixel* dst, intptr_t dstStride, pixel* left, pixel* above, int dirMode, int filter)
+;-------------------------------------------------------------------------------------------------------
+INIT_XMM sse4
+cglobal intra_pred_dc4, 4,6,2
+ mov r4d, r5m
+ add r2, 2
+ add r3, 2
+
+ movh m0, [r3] ; sumAbove
+ movh m1, [r2] ; sumLeft
+
+ paddw m0, m1
+ pshufd m1, m0, 1
+ paddw m0, m1
+ phaddw m0, m0 ; m0 = sum
+
+ test r4d, r4d
+
+ pmulhrsw m0, [pw_4096] ; m0 = (sum + 4) / 8
+ movd r4d, m0 ; r4d = dc_val
+ movzx r4d, r4w
+ pshuflw m0, m0, 0 ; m0 = word [dc_val ...]
+
+ ; store DC 4x4
+ movh [r0], m0
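
Editorial note on the rounding step in intra_pred_dc4 above: the diff comments the pmulhrsw against pw_4096 as "(sum + 4) / 8". The sketch below is a scalar C model of one 16-bit lane of that instruction, written only to illustrate why the multiply by 4096 produces a rounded divide by 8. The function names (pmulhrsw_lane, dc4_from_sum, check_dc4_rounding) are hypothetical and are not part of x265; the model assumes non-negative inputs, which holds for a sum of eight pixel values.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one signed 16-bit lane of PMULHRSW:
 * result = (((a * b) >> 14) + 1) >> 1.
 * Assumption: a and b are non-negative here, so the right shifts
 * are well defined in C. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * (int32_t)b) >> 14;
    return (int16_t)((t + 1) >> 1);
}

/* DC value for a 4x4 block from the sum of 4 above + 4 left pixels.
 * With b = 4096 = 2^12 the lane computes (sum * 2^12 + 2^14) >> 15,
 * which equals (sum + 4) >> 3. */
static int dc4_from_sum(int sum)
{
    return pmulhrsw_lane((int16_t)sum, 4096);
}

/* Exhaustively compare against the plain rounded division for every
 * possible sum of eight pixels whose maximum value is maxPixel. */
static void check_dc4_rounding(int maxPixel)
{
    for (int sum = 0; sum <= 8 * maxPixel; sum++)
        assert(dc4_from_sum(sum) == (sum + 4) / 8);
}
```

Calling check_dc4_rounding(1023) covers every sum reachable with 10-bit pixels. The largest such sum, 8 * 1023 = 8184, fits in a signed 16-bit lane, which is the condition for the single-instruction form to be exact.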