[x265-commits] [x265] vtune: add comma to prevent string concatenation - fixes ...
Steve Borho
steve at borho.org
Mon Jan 12 05:50:27 CET 2015
details: http://hg.videolan.org/x265/rev/1924c460d130
branches:
changeset: 9063:1924c460d130
user: Steve Borho <steve at borho.org>
date: Fri Jan 09 11:35:26 2015 +0530
description:
vtune: add comma to prevent string concatenation - fixes task profiling
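The kind of bug this message names can be shown with a minimal sketch. In C++, adjacent string literals are concatenated at compile time, so a missing comma in an array initializer silently merges two entries and shifts every later index. The table names and contents below are hypothetical and are not taken from vtune.cpp.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical task-name tables for illustration only; these are NOT the
// actual strings used in source/profile/vtune/vtune.cpp.
static const char* const brokenNames[] = {
    "frame"    // missing comma: this literal merges with the next one
    "slice",
    "ctu"
};

static const char* const fixedNames[] = {
    "frame",
    "slice",
    "ctu"
};

// Number of elements in a statically sized array.
template <typename T, std::size_t N>
constexpr std::size_t countOf(const T (&)[N]) { return N; }
```

With the comma missing, the table has two entries instead of three and the first one reads "frameslice", so any lookup by enum index returns the wrong name.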
Subject: [x265] Refactor EncoderPrimitives under common.
details: http://hg.videolan.org/x265/rev/0fb899cd8e1a
branches:
changeset: 9064:0fb899cd8e1a
user: Kevin Wu <kevin at multicorewareinc.com>
date: Thu Jan 08 15:23:38 2015 -0600
description:
Refactor EncoderPrimitives under common.
Subject: [x265] Refactor EncoderPrimitives under encoder.
details: http://hg.videolan.org/x265/rev/7f6f97778548
branches:
changeset: 9065:7f6f97778548
user: Kevin Wu <kevin at multicorewareinc.com>
date: Thu Jan 08 15:30:26 2015 -0600
description:
Refactor EncoderPrimitives under encoder.
Subject: [x265] Fix index to dct primitive when using dst.
details: http://hg.videolan.org/x265/rev/efa3c407bf30
branches:
changeset: 9066:efa3c407bf30
user: Kevin Wu <kevin at multicorewareinc.com>
date: Tue Jan 06 16:46:07 2015 -0600
description:
Fix index to dct primitive when using dst.
Use the dst4x4 or idst4x4 function pointers instead of indexing over the
EncoderPrimitives and calling dct/idct.
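As a rough sketch of the layout this description refers to: the size-indexed transforms sit in a per-block-size table, while the single 4x4 DST is a separate pointer selected explicitly. The types, signatures, stub functions and the `select4x4` helper below are simplified assumptions for illustration and do not match primitives.h.

```cpp
#include <cstdint>

// Simplified, hypothetical stand-ins for the x265 primitive types.
typedef void (*dct_t)(const int16_t* src, int16_t* dst, intptr_t stride);

enum BlockSize { BLOCK_4x4, BLOCK_8x8, BLOCK_16x16, BLOCK_32x32, NUM_BLOCKS };

struct Primitives
{
    struct CU
    {
        dct_t dct;      // one forward transform per square block size
    };

    CU    cu[NUM_BLOCKS];
    dct_t dst4x4;       // the 4x4 DST lives outside the size-indexed table
};

// Stub transforms that only tag the output so the selection is observable.
static void dct4_stub(const int16_t*, int16_t* dst, intptr_t) { dst[0] = 1; }
static void dst4_stub(const int16_t*, int16_t* dst, intptr_t) { dst[0] = 2; }

// Hypothetical helper: choose the 4x4 transform by an explicit flag rather
// than by computing an index into a shared dct[] array.
inline dct_t select4x4(const Primitives& p, bool useDST)
{
    return useDST ? p.dst4x4 : p.cu[BLOCK_4x4].dct;
}

inline Primitives makePrimitives()
{
    Primitives p = Primitives();   // value-initialize all pointers to null
    p.cu[BLOCK_4x4].dct = dct4_stub;
    p.dst4x4 = dst4_stub;
    return p;
}
```

Keeping the DST out of the indexed table means a block-size index can never land on the wrong transform by being off by one.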
Subject: [x265] Refactor EncoderPrimitives under test.
details: http://hg.videolan.org/x265/rev/d3c403664833
branches:
changeset: 9067:d3c403664833
user: Kevin Wu <kevin at multicorewareinc.com>
date: Wed Jan 07 17:41:45 2015 -0600
description:
Refactor EncoderPrimitives under test.
Subject: [x265] test: Move dst/idst tests out of DctConf struct
details: http://hg.videolan.org/x265/rev/4e64bb0efa3a
branches:
changeset: 9068:4e64bb0efa3a
user: Kevin Wu <kevin at multicorewareinc.com>
date: Thu Jan 08 11:45:37 2015 -0600
description:
test: Move dst/idst tests out of DctConf struct
Subject: [x265] change data type in satd_4x4 for psyCost_ss
details: http://hg.videolan.org/x265/rev/5b95e1e639f7
branches:
changeset: 9069:5b95e1e639f7
user: Divya Manivannan <divya at multicorewareinc.com>
date: Fri Jan 09 13:09:39 2015 +0530
description:
change data type in satd_4x4 for psyCost_ss
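For context, here is a minimal sketch of a 4x4 Hadamard SATD over `int16_t` input using 32-bit intermediates. It is not the x265 implementation, the function name is hypothetical, and it returns the raw sum of absolute coefficients without any normalization. One plausible reading of why 32 bits suffice (an assumption, not stated in the commit): each coefficient is a signed sum of 16 inputs of magnitude at most 2^15, so it is bounded by 2^19, and the sum of 16 such magnitudes is bounded by 2^23.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical reference SATD for a 4x4 block of int16_t values.
// Plain two-pass Hadamard transform; returns the unnormalized sum of
// absolute transform coefficients.
inline int32_t satd4x4_sketch(const int16_t* blk, intptr_t stride)
{
    int32_t tmp[4][4];

    // Horizontal pass: 4-point Hadamard butterfly on each row.
    for (int y = 0; y < 4; y++, blk += stride)
    {
        int32_t s01 = blk[0] + blk[1], d01 = blk[0] - blk[1];
        int32_t s23 = blk[2] + blk[3], d23 = blk[2] - blk[3];
        tmp[y][0] = s01 + s23;
        tmp[y][1] = s01 - s23;
        tmp[y][2] = d01 + d23;
        tmp[y][3] = d01 - d23;
    }

    // Vertical pass, accumulating absolute coefficient values.
    int32_t sum = 0;
    for (int x = 0; x < 4; x++)
    {
        int32_t s01 = tmp[0][x] + tmp[1][x], d01 = tmp[0][x] - tmp[1][x];
        int32_t s23 = tmp[2][x] + tmp[3][x], d23 = tmp[2][x] - tmp[3][x];
        sum += std::abs(s01 + s23) + std::abs(s01 - s23)
             + std::abs(d01 + d23) + std::abs(d01 - d23);
    }
    return sum;
}
```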
Subject: [x265] add testbench for psyCost_ss and asm for psyCost_ss_4x4: improve 1989c->515c
details: http://hg.videolan.org/x265/rev/7c8b6c7edd0c
branches:
changeset: 9070:7c8b6c7edd0c
user: Divya Manivannan <divya at multicorewareinc.com>
date: Fri Jan 09 13:26:21 2015 +0530
description:
add testbench for psyCost_ss and asm for psyCost_ss_4x4: improve 1989c->515c
Subject: [x265] fix bug in sa8d_8x8 for psyCost_ss
details: http://hg.videolan.org/x265/rev/79566465a64f
branches:
changeset: 9071:79566465a64f
user: Divya Manivannan <divya at multicorewareinc.com>
date: Fri Jan 09 18:50:13 2015 +0530
description:
fix bug in sa8d_8x8 for psyCost_ss
Subject: [x265] primitives: white-space and comment cleanups
details: http://hg.videolan.org/x265/rev/1ffff9157c0a
branches:
changeset: 9072:1ffff9157c0a
user: Steve Borho <steve at borho.org>
date: Fri Jan 09 19:43:12 2015 +0530
description:
primitives: white-space and comment cleanups
Subject: [x265] primitives: move extendPicBorder funcdef to common.h
details: http://hg.videolan.org/x265/rev/4973575ee22d
branches:
changeset: 9073:4973575ee22d
user: Steve Borho <steve at borho.org>
date: Fri Jan 09 19:43:34 2015 +0530
description:
primitives: move extendPicBorder funcdef to common.h
Subject: [x265] intrapred: clarify angle/mode.
details: http://hg.videolan.org/x265/rev/17ce633add70
branches:
changeset: 9074:17ce633add70
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Sun Jan 11 18:53:37 2015 +0530
description:
intrapred: clarify angle/mode.
Fixes asm/no-asm mismatch introduced in e23f671d64d1
Subject: [x265] analysis: simplify inter analysis structure to share more inter analysis data
details: http://hg.videolan.org/x265/rev/1db4bd2df318
branches:
changeset: 9075:1db4bd2df318
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Wed Dec 24 10:34:59 2014 +0530
description:
analysis: simplify inter analysis structure to share more inter analysis data
Subject: [x265] analysis load/save: dump skip mode info for reuse
details: http://hg.videolan.org/x265/rev/7e4774b2aedd
branches:
changeset: 9076:7e4774b2aedd
user: Gopu Govindaswamy
date: Sun Jan 11 21:15:07 2015 +0530
description:
analysis load/save: dump skip mode info for reuse
Subject: [x265] Merge
details: http://hg.videolan.org/x265/rev/17de7ae8f654
branches:
changeset: 9077:17de7ae8f654
user: Steve Borho <steve at borho.org>
date: Mon Jan 12 10:07:51 2015 +0530
description:
Merge
diffstat:
source/common/common.h | 10 +-
source/common/dct.cpp | 28 +-
source/common/ipfilter.cpp | 50 +-
source/common/lowres.h | 4 +-
source/common/pixel.cpp | 541 ++++++++--------
source/common/predict.cpp | 52 +-
source/common/primitives.cpp | 106 +-
source/common/primitives.h | 176 ++--
source/common/quant.cpp | 34 +-
source/common/shortyuv.cpp | 18 +-
source/common/vec/dct-sse3.cpp | 6 +-
source/common/vec/dct-ssse3.cpp | 4 +-
source/common/x86/asm-primitives.cpp | 1092 +++++++++++++++++----------------
source/common/x86/pixel-a.asm | 154 ++++
source/common/x86/pixel.h | 1 +
source/common/yuv.cpp | 54 +-
source/encoder/analysis.cpp | 131 ++-
source/encoder/analysis.h | 3 +-
source/encoder/encoder.cpp | 38 +-
source/encoder/framefilter.cpp | 28 +-
source/encoder/motion.cpp | 42 +-
source/encoder/ratecontrol.cpp | 6 +-
source/encoder/rdcost.h | 4 +-
source/encoder/search.cpp | 134 ++--
source/encoder/slicetype.cpp | 14 +-
source/encoder/weightPrediction.cpp | 20 +-
source/profile/vtune/vtune.cpp | 2 +-
source/test/ipfilterharness.cpp | 104 +-
source/test/mbdstharness.cpp | 44 +-
source/test/pixelharness.cpp | 317 +++++----
source/test/pixelharness.h | 1 +
31 files changed, 1766 insertions(+), 1452 deletions(-)
diffs (truncated from 5630 to 300 lines):
diff -r 77938d3e3f09 -r 17de7ae8f654 source/common/common.h
--- a/source/common/common.h Fri Jan 09 11:02:16 2015 +0530
+++ b/source/common/common.h Mon Jan 12 10:07:51 2015 +0530
@@ -366,10 +366,12 @@ struct SAOParam
}
};
-/* Stores inter (motion estimation) analysis data for a single frame */
+/* Stores inter analysis data for a single frame */
struct analysis_inter_data
{
- int ref;
+ int32_t* ref;
+ uint8_t* depth;
+ uint8_t* modes;
};
/* Stores intra analysis data for a single frame. This struct needs better packing */
@@ -404,6 +406,10 @@ enum SignificanceMapContextType
CONTEXT_TYPE_NxN = 2,
CONTEXT_NUMBER_OF_TYPES = 3
};
+
+/* located in pixel.cpp */
+void extendPicBorder(pixel* recon, intptr_t stride, int width, int height, int marginX, int marginY);
+
}
/* outside x265 namespace, but prefixed. defined in common.cpp */
diff -r 77938d3e3f09 -r 17de7ae8f654 source/common/dct.cpp
--- a/source/common/dct.cpp Fri Jan 09 11:02:16 2015 +0530
+++ b/source/common/dct.cpp Mon Jan 12 10:07:51 2015 +0530
@@ -765,22 +765,22 @@ void Setup_C_DCTPrimitives(EncoderPrimit
p.dequant_normal = dequant_normal_c;
p.quant = quant_c;
p.nquant = nquant_c;
- p.dct[DST_4x4] = dst4_c;
- p.dct[DCT_4x4] = dct4_c;
- p.dct[DCT_8x8] = dct8_c;
- p.dct[DCT_16x16] = dct16_c;
- p.dct[DCT_32x32] = dct32_c;
- p.idct[IDST_4x4] = idst4_c;
- p.idct[IDCT_4x4] = idct4_c;
- p.idct[IDCT_8x8] = idct8_c;
- p.idct[IDCT_16x16] = idct16_c;
- p.idct[IDCT_32x32] = idct32_c;
+ p.dst4x4 = dst4_c;
+ p.cu[BLOCK_4x4].dct = dct4_c;
+ p.cu[BLOCK_8x8].dct = dct8_c;
+ p.cu[BLOCK_16x16].dct = dct16_c;
+ p.cu[BLOCK_32x32].dct = dct32_c;
+ p.idst4x4 = idst4_c;
+ p.cu[BLOCK_4x4].idct = idct4_c;
+ p.cu[BLOCK_8x8].idct = idct8_c;
+ p.cu[BLOCK_16x16].idct = idct16_c;
+ p.cu[BLOCK_32x32].idct = idct32_c;
p.count_nonzero = count_nonzero_c;
p.denoiseDct = denoiseDct_c;
- p.copy_cnt[BLOCK_4x4] = copy_count<4>;
- p.copy_cnt[BLOCK_8x8] = copy_count<8>;
- p.copy_cnt[BLOCK_16x16] = copy_count<16>;
- p.copy_cnt[BLOCK_32x32] = copy_count<32>;
+ p.cu[BLOCK_4x4].copy_cnt = copy_count<4>;
+ p.cu[BLOCK_8x8].copy_cnt = copy_count<8>;
+ p.cu[BLOCK_16x16].copy_cnt = copy_count<16>;
+ p.cu[BLOCK_32x32].copy_cnt = copy_count<32>;
}
}
diff -r 77938d3e3f09 -r 17de7ae8f654 source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp Fri Jan 09 11:02:16 2015 +0530
+++ b/source/common/ipfilter.cpp Mon Jan 12 10:07:51 2015 +0530
@@ -373,37 +373,37 @@ namespace x265 {
// x265 private namespace
#define CHROMA_420(W, H) \
- p.chroma[X265_CSP_I420].filter_hpp[CHROMA_ ## W ## x ## H] = interp_horiz_pp_c<4, W, H>; \
- p.chroma[X265_CSP_I420].filter_hps[CHROMA_ ## W ## x ## H] = interp_horiz_ps_c<4, W, H>; \
- p.chroma[X265_CSP_I420].filter_vpp[CHROMA_ ## W ## x ## H] = interp_vert_pp_c<4, W, H>; \
- p.chroma[X265_CSP_I420].filter_vps[CHROMA_ ## W ## x ## H] = interp_vert_ps_c<4, W, H>; \
- p.chroma[X265_CSP_I420].filter_vsp[CHROMA_ ## W ## x ## H] = interp_vert_sp_c<4, W, H>; \
- p.chroma[X265_CSP_I420].filter_vss[CHROMA_ ## W ## x ## H] = interp_vert_ss_c<4, W, H>;
+ p.chroma[X265_CSP_I420].pu[CHROMA_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I420].pu[CHROMA_ ## W ## x ## H].filter_hps = interp_horiz_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I420].pu[CHROMA_ ## W ## x ## H].filter_vpp = interp_vert_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I420].pu[CHROMA_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I420].pu[CHROMA_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
+ p.chroma[X265_CSP_I420].pu[CHROMA_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>;
#define CHROMA_422(W, H) \
- p.chroma[X265_CSP_I422].filter_hpp[CHROMA422_ ## W ## x ## H] = interp_horiz_pp_c<4, W, H>; \
- p.chroma[X265_CSP_I422].filter_hps[CHROMA422_ ## W ## x ## H] = interp_horiz_ps_c<4, W, H>; \
- p.chroma[X265_CSP_I422].filter_vpp[CHROMA422_ ## W ## x ## H] = interp_vert_pp_c<4, W, H>; \
- p.chroma[X265_CSP_I422].filter_vps[CHROMA422_ ## W ## x ## H] = interp_vert_ps_c<4, W, H>; \
- p.chroma[X265_CSP_I422].filter_vsp[CHROMA422_ ## W ## x ## H] = interp_vert_sp_c<4, W, H>; \
- p.chroma[X265_CSP_I422].filter_vss[CHROMA422_ ## W ## x ## H] = interp_vert_ss_c<4, W, H>;
+ p.chroma[X265_CSP_I422].pu[CHROMA422_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].pu[CHROMA422_ ## W ## x ## H].filter_hps = interp_horiz_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].pu[CHROMA422_ ## W ## x ## H].filter_vpp = interp_vert_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].pu[CHROMA422_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].pu[CHROMA422_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].pu[CHROMA422_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>;
#define CHROMA_444(W, H) \
- p.chroma[X265_CSP_I444].filter_hpp[LUMA_ ## W ## x ## H] = interp_horiz_pp_c<4, W, H>; \
- p.chroma[X265_CSP_I444].filter_hps[LUMA_ ## W ## x ## H] = interp_horiz_ps_c<4, W, H>; \
- p.chroma[X265_CSP_I444].filter_vpp[LUMA_ ## W ## x ## H] = interp_vert_pp_c<4, W, H>; \
- p.chroma[X265_CSP_I444].filter_vps[LUMA_ ## W ## x ## H] = interp_vert_ps_c<4, W, H>; \
- p.chroma[X265_CSP_I444].filter_vsp[LUMA_ ## W ## x ## H] = interp_vert_sp_c<4, W, H>; \
- p.chroma[X265_CSP_I444].filter_vss[LUMA_ ## W ## x ## H] = interp_vert_ss_c<4, W, H>;
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_hps = interp_horiz_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vpp = interp_vert_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>;
#define LUMA(W, H) \
- p.luma_hpp[LUMA_ ## W ## x ## H] = interp_horiz_pp_c<8, W, H>; \
- p.luma_hps[LUMA_ ## W ## x ## H] = interp_horiz_ps_c<8, W, H>; \
- p.luma_vpp[LUMA_ ## W ## x ## H] = interp_vert_pp_c<8, W, H>; \
- p.luma_vps[LUMA_ ## W ## x ## H] = interp_vert_ps_c<8, W, H>; \
- p.luma_vsp[LUMA_ ## W ## x ## H] = interp_vert_sp_c<8, W, H>; \
- p.luma_vss[LUMA_ ## W ## x ## H] = interp_vert_ss_c<8, W, H>; \
- p.luma_hvpp[LUMA_ ## W ## x ## H] = interp_hv_pp_c<8, W, H>;
+ p.pu[LUMA_ ## W ## x ## H].luma_hpp = interp_horiz_pp_c<8, W, H>; \
+ p.pu[LUMA_ ## W ## x ## H].luma_hps = interp_horiz_ps_c<8, W, H>; \
+ p.pu[LUMA_ ## W ## x ## H].luma_vpp = interp_vert_pp_c<8, W, H>; \
+ p.pu[LUMA_ ## W ## x ## H].luma_vps = interp_vert_ps_c<8, W, H>; \
+ p.pu[LUMA_ ## W ## x ## H].luma_vsp = interp_vert_sp_c<8, W, H>; \
+ p.pu[LUMA_ ## W ## x ## H].luma_vss = interp_vert_ss_c<8, W, H>; \
+ p.pu[LUMA_ ## W ## x ## H].luma_hvpp = interp_hv_pp_c<8, W, H>;
void Setup_C_IPFilterPrimitives(EncoderPrimitives& p)
{
diff -r 77938d3e3f09 -r 17de7ae8f654 source/common/lowres.h
--- a/source/common/lowres.h Fri Jan 09 11:02:16 2015 +0530
+++ b/source/common/lowres.h Mon Jan 12 10:07:51 2015 +0530
@@ -69,7 +69,7 @@ struct ReferencePlanes
int qmvy = qmv.y + (qmv.y & 1);
int hpelB = (qmvy & 2) | ((qmvx & 2) >> 1);
pixel *frefB = lowresPlane[hpelB] + blockOffset + (qmvx >> 2) + (qmvy >> 2) * lumaStride;
- primitives.pixelavg_pp[LUMA_8x8](buf, outstride, frefA, lumaStride, frefB, lumaStride, 32);
+ primitives.pu[LUMA_8x8].pixelavg_pp(buf, outstride, frefA, lumaStride, frefB, lumaStride, 32);
return buf;
}
else
@@ -91,7 +91,7 @@ struct ReferencePlanes
int qmvy = qmv.y + (qmv.y & 1);
int hpelB = (qmvy & 2) | ((qmvx & 2) >> 1);
pixel *frefB = lowresPlane[hpelB] + blockOffset + (qmvx >> 2) + (qmvy >> 2) * lumaStride;
- primitives.pixelavg_pp[LUMA_8x8](subpelbuf, 8, frefA, lumaStride, frefB, lumaStride, 32);
+ primitives.pu[LUMA_8x8].pixelavg_pp(subpelbuf, 8, frefA, lumaStride, frefB, lumaStride, 32);
return comp(fenc, FENC_STRIDE, subpelbuf, 8);
}
else
diff -r 77938d3e3f09 -r 17de7ae8f654 source/common/pixel.cpp
--- a/source/common/pixel.cpp Fri Jan 09 11:02:16 2015 +0530
+++ b/source/common/pixel.cpp Mon Jan 12 10:07:51 2015 +0530
@@ -33,58 +33,58 @@
using namespace x265;
#define SET_FUNC_PRIMITIVE_TABLE_C(FUNC_PREFIX, FUNC_PREFIX_DEF, DATA_TYPE1, DATA_TYPE2) \
- p.FUNC_PREFIX[LUMA_4x4] = FUNC_PREFIX_DEF<4, 4, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_8x8] = FUNC_PREFIX_DEF<8, 8, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_8x4] = FUNC_PREFIX_DEF<8, 4, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_4x8] = FUNC_PREFIX_DEF<4, 8, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_16x16] = FUNC_PREFIX_DEF<16, 16, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_16x8] = FUNC_PREFIX_DEF<16, 8, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_8x16] = FUNC_PREFIX_DEF<8, 16, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_16x12] = FUNC_PREFIX_DEF<16, 12, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_12x16] = FUNC_PREFIX_DEF<12, 16, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_16x4] = FUNC_PREFIX_DEF<16, 4, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_4x16] = FUNC_PREFIX_DEF<4, 16, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_32x32] = FUNC_PREFIX_DEF<32, 32, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_32x16] = FUNC_PREFIX_DEF<32, 16, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_16x32] = FUNC_PREFIX_DEF<16, 32, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_32x24] = FUNC_PREFIX_DEF<32, 24, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_24x32] = FUNC_PREFIX_DEF<24, 32, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_32x8] = FUNC_PREFIX_DEF<32, 8, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_8x32] = FUNC_PREFIX_DEF<8, 32, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_64x64] = FUNC_PREFIX_DEF<64, 64, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_64x32] = FUNC_PREFIX_DEF<64, 32, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_32x64] = FUNC_PREFIX_DEF<32, 64, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_64x48] = FUNC_PREFIX_DEF<64, 48, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_48x64] = FUNC_PREFIX_DEF<48, 64, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_64x16] = FUNC_PREFIX_DEF<64, 16, DATA_TYPE1, DATA_TYPE2>; \
- p.FUNC_PREFIX[LUMA_16x64] = FUNC_PREFIX_DEF<16, 64, DATA_TYPE1, DATA_TYPE2>;
+ p.pu[LUMA_4x4].FUNC_PREFIX = FUNC_PREFIX_DEF<4, 4, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_8x8].FUNC_PREFIX = FUNC_PREFIX_DEF<8, 8, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_8x4].FUNC_PREFIX = FUNC_PREFIX_DEF<8, 4, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_4x8].FUNC_PREFIX = FUNC_PREFIX_DEF<4, 8, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_16x16].FUNC_PREFIX = FUNC_PREFIX_DEF<16, 16, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_16x8].FUNC_PREFIX = FUNC_PREFIX_DEF<16, 8, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_8x16].FUNC_PREFIX = FUNC_PREFIX_DEF<8, 16, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_16x12].FUNC_PREFIX = FUNC_PREFIX_DEF<16, 12, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_12x16].FUNC_PREFIX = FUNC_PREFIX_DEF<12, 16, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_16x4].FUNC_PREFIX = FUNC_PREFIX_DEF<16, 4, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_4x16].FUNC_PREFIX = FUNC_PREFIX_DEF<4, 16, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_32x32].FUNC_PREFIX = FUNC_PREFIX_DEF<32, 32, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_32x16].FUNC_PREFIX = FUNC_PREFIX_DEF<32, 16, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_16x32].FUNC_PREFIX = FUNC_PREFIX_DEF<16, 32, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_32x24].FUNC_PREFIX = FUNC_PREFIX_DEF<32, 24, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_24x32].FUNC_PREFIX = FUNC_PREFIX_DEF<24, 32, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_32x8].FUNC_PREFIX = FUNC_PREFIX_DEF<32, 8, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_8x32].FUNC_PREFIX = FUNC_PREFIX_DEF<8, 32, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_64x64].FUNC_PREFIX = FUNC_PREFIX_DEF<64, 64, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_64x32].FUNC_PREFIX = FUNC_PREFIX_DEF<64, 32, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_32x64].FUNC_PREFIX = FUNC_PREFIX_DEF<32, 64, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_64x48].FUNC_PREFIX = FUNC_PREFIX_DEF<64, 48, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_48x64].FUNC_PREFIX = FUNC_PREFIX_DEF<48, 64, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_64x16].FUNC_PREFIX = FUNC_PREFIX_DEF<64, 16, DATA_TYPE1, DATA_TYPE2>; \
+ p.pu[LUMA_16x64].FUNC_PREFIX = FUNC_PREFIX_DEF<16, 64, DATA_TYPE1, DATA_TYPE2>;
#define SET_FUNC_PRIMITIVE_TABLE_C2(FUNC_PREFIX) \
- p.FUNC_PREFIX[LUMA_4x4] = FUNC_PREFIX<4, 4>; \
- p.FUNC_PREFIX[LUMA_8x8] = FUNC_PREFIX<8, 8>; \
- p.FUNC_PREFIX[LUMA_8x4] = FUNC_PREFIX<8, 4>; \
- p.FUNC_PREFIX[LUMA_4x8] = FUNC_PREFIX<4, 8>; \
- p.FUNC_PREFIX[LUMA_16x16] = FUNC_PREFIX<16, 16>; \
- p.FUNC_PREFIX[LUMA_16x8] = FUNC_PREFIX<16, 8>; \
- p.FUNC_PREFIX[LUMA_8x16] = FUNC_PREFIX<8, 16>; \
- p.FUNC_PREFIX[LUMA_16x12] = FUNC_PREFIX<16, 12>; \
- p.FUNC_PREFIX[LUMA_12x16] = FUNC_PREFIX<12, 16>; \
- p.FUNC_PREFIX[LUMA_16x4] = FUNC_PREFIX<16, 4>; \
- p.FUNC_PREFIX[LUMA_4x16] = FUNC_PREFIX<4, 16>; \
- p.FUNC_PREFIX[LUMA_32x32] = FUNC_PREFIX<32, 32>; \
- p.FUNC_PREFIX[LUMA_32x16] = FUNC_PREFIX<32, 16>; \
- p.FUNC_PREFIX[LUMA_16x32] = FUNC_PREFIX<16, 32>; \
- p.FUNC_PREFIX[LUMA_32x24] = FUNC_PREFIX<32, 24>; \
- p.FUNC_PREFIX[LUMA_24x32] = FUNC_PREFIX<24, 32>; \
- p.FUNC_PREFIX[LUMA_32x8] = FUNC_PREFIX<32, 8>; \
- p.FUNC_PREFIX[LUMA_8x32] = FUNC_PREFIX<8, 32>; \
- p.FUNC_PREFIX[LUMA_64x64] = FUNC_PREFIX<64, 64>; \
- p.FUNC_PREFIX[LUMA_64x32] = FUNC_PREFIX<64, 32>; \
- p.FUNC_PREFIX[LUMA_32x64] = FUNC_PREFIX<32, 64>; \
- p.FUNC_PREFIX[LUMA_64x48] = FUNC_PREFIX<64, 48>; \
- p.FUNC_PREFIX[LUMA_48x64] = FUNC_PREFIX<48, 64>; \
- p.FUNC_PREFIX[LUMA_64x16] = FUNC_PREFIX<64, 16>; \
- p.FUNC_PREFIX[LUMA_16x64] = FUNC_PREFIX<16, 64>;
+ p.pu[LUMA_4x4].FUNC_PREFIX = FUNC_PREFIX<4, 4>; \
+ p.pu[LUMA_8x8].FUNC_PREFIX = FUNC_PREFIX<8, 8>; \
+ p.pu[LUMA_8x4].FUNC_PREFIX = FUNC_PREFIX<8, 4>; \
+ p.pu[LUMA_4x8].FUNC_PREFIX = FUNC_PREFIX<4, 8>; \
+ p.pu[LUMA_16x16].FUNC_PREFIX = FUNC_PREFIX<16, 16>; \
+ p.pu[LUMA_16x8].FUNC_PREFIX = FUNC_PREFIX<16, 8>; \
+ p.pu[LUMA_8x16].FUNC_PREFIX = FUNC_PREFIX<8, 16>; \
+ p.pu[LUMA_16x12].FUNC_PREFIX = FUNC_PREFIX<16, 12>; \
+ p.pu[LUMA_12x16].FUNC_PREFIX = FUNC_PREFIX<12, 16>; \
+ p.pu[LUMA_16x4].FUNC_PREFIX = FUNC_PREFIX<16, 4>; \
+ p.pu[LUMA_4x16].FUNC_PREFIX = FUNC_PREFIX<4, 16>; \
+ p.pu[LUMA_32x32].FUNC_PREFIX = FUNC_PREFIX<32, 32>; \
+ p.pu[LUMA_32x16].FUNC_PREFIX = FUNC_PREFIX<32, 16>; \
+ p.pu[LUMA_16x32].FUNC_PREFIX = FUNC_PREFIX<16, 32>; \
+ p.pu[LUMA_32x24].FUNC_PREFIX = FUNC_PREFIX<32, 24>; \
+ p.pu[LUMA_24x32].FUNC_PREFIX = FUNC_PREFIX<24, 32>; \
+ p.pu[LUMA_32x8].FUNC_PREFIX = FUNC_PREFIX<32, 8>; \
+ p.pu[LUMA_8x32].FUNC_PREFIX = FUNC_PREFIX<8, 32>; \
+ p.pu[LUMA_64x64].FUNC_PREFIX = FUNC_PREFIX<64, 64>; \
+ p.pu[LUMA_64x32].FUNC_PREFIX = FUNC_PREFIX<64, 32>; \
+ p.pu[LUMA_32x64].FUNC_PREFIX = FUNC_PREFIX<32, 64>; \
+ p.pu[LUMA_64x48].FUNC_PREFIX = FUNC_PREFIX<64, 48>; \
+ p.pu[LUMA_48x64].FUNC_PREFIX = FUNC_PREFIX<48, 64>; \
+ p.pu[LUMA_64x16].FUNC_PREFIX = FUNC_PREFIX<64, 16>; \
+ p.pu[LUMA_16x64].FUNC_PREFIX = FUNC_PREFIX<16, 64>;
namespace {
// place functions in anonymous namespace (file static)
@@ -243,9 +243,9 @@ int satd_4x4(const pixel* pix1, intptr_t
static int satd_4x4(const int16_t* pix1, intptr_t stride_pix1)
{
- int64_t tmp[4][4];
- int64_t s01, s23, d01, d23;
- int64_t satd = 0;
+ int32_t tmp[4][4];
+ int32_t s01, s23, d01, d23;
+ int32_t satd = 0;
int d;
for (d = 0; d < 4; d++, pix1 += stride_pix1)
@@ -367,46 +367,55 @@ int sa8d_8x8(const pixel* pix1, intptr_t
return (int)((_sa8d_8x8(pix1, i_pix1, pix2, i_pix2) + 2) >> 2);
}
-inline int _sa8d_8x8(const int16_t* pix1, intptr_t i_pix1, const int16_t* pix2, intptr_t i_pix2)
+inline int _sa8d_8x8(const int16_t* pix1, intptr_t i_pix1)
{
- ssum2_t tmp[8][4];
- ssum2_t a0, a1, a2, a3, a4, a5, a6, a7, b0, b1, b2, b3;
- ssum2_t sum = 0;
+ int32_t tmp[8][8];
+ int32_t a0, a1, a2, a3, a4, a5, a6, a7;
+ int32_t sum = 0;
- for (int i = 0; i < 8; i++, pix1 += i_pix1, pix2 += i_pix2)
+ for (int i = 0; i < 8; i++, pix1 += i_pix1)
{
- a0 = pix1[0] - pix2[0];
- a1 = pix1[1] - pix2[1];