[x265-commits] [x265] primitives: better document the data structures and their...
Steve Borho
steve at borho.org
Tue Jan 20 17:15:34 CET 2015
details: http://hg.videolan.org/x265/rev/17ac389a6400
branches:
changeset: 9177:17ac389a6400
user: Steve Borho <steve at borho.org>
date: Sun Jan 18 15:43:42 2015 +0530
description:
primitives: better document the data structures and their use
Subject: [x265] predict: disable conditional-expression-constant warnings
details: http://hg.videolan.org/x265/rev/bbc333bd4a62
branches:
changeset: 9178:bbc333bd4a62
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Jan 19 09:59:33 2015 +0530
description:
predict: disable conditional-expression-constant warnings
Subject: [x265] x265: update copyright header
details: http://hg.videolan.org/x265/rev/1ec53efeb07e
branches:
changeset: 9179:1ec53efeb07e
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Jan 19 15:26:35 2015 +0530
description:
x265: update copyright header
Subject: [x265] asm: psyCost_ss_16x16 in sse4: improve 31052c->9946c
details: http://hg.videolan.org/x265/rev/2b2c656111ea
branches:
changeset: 9180:2b2c656111ea
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Jan 19 10:56:24 2015 +0530
description:
asm: psyCost_ss_16x16 in sse4: improve 31052c->9946c
Subject: [x265] asm: psyCost_ss_32x32 in sse4: improve 136848c->39754c
details: http://hg.videolan.org/x265/rev/5b38663a792a
branches:
changeset: 9181:5b38663a792a
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Jan 19 11:05:33 2015 +0530
description:
asm: psyCost_ss_32x32 in sse4: improve 136848c->39754c
Subject: [x265] asm: psyCost_ss_64x64 in sse4: improve 501123c->159906c
details: http://hg.videolan.org/x265/rev/c2048e0d9783
branches:
changeset: 9182:c2048e0d9783
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Jan 19 11:16:31 2015 +0530
description:
asm: psyCost_ss_64x64 in sse4: improve 501123c->159906c
Subject: [x265] asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode
details: http://hg.videolan.org/x265/rev/20381760757b
branches:
changeset: 9183:20381760757b
user: Min Chen <chenm003 at 163.com>
date: Mon Jan 19 18:21:45 2015 +0800
description:
asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode
Subject: [x265] asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode
details: http://hg.videolan.org/x265/rev/4f8b7cc9d51e
branches:
changeset: 9184:4f8b7cc9d51e
user: Min Chen <chenm003 at 163.com>
date: Mon Jan 19 18:21:50 2015 +0800
description:
asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode
Subject: [x265] avoid warning on variant correction in weight_sp_c()
details: http://hg.videolan.org/x265/rev/b49cb2d2c82f
branches:
changeset: 9185:b49cb2d2c82f
user: Min Chen <chenm003 at 163.com>
date: Tue Jan 20 01:19:23 2015 +0800
description:
avoid warning on variant correction in weight_sp_c()
Subject: [x265] asm: fix broken on weight_sp and weight_pp on 8bpp mode
details: http://hg.videolan.org/x265/rev/e331bf2b402d
branches:
changeset: 9186:e331bf2b402d
user: Min Chen <chenm003 at 163.com>
date: Tue Jan 20 01:33:51 2015 +0800
description:
asm: fix broken on weight_sp and weight_pp on 8bpp mode
Subject: [x265] asm: idct16 intrinsic 28900->25000 improvement over previous intrinsic
details: http://hg.videolan.org/x265/rev/6b72bb520a91
branches:
changeset: 9187:6b72bb520a91
user: David T Yuen <dtyx265 at gmail.com>
date: Mon Jan 19 09:43:36 2015 -0800
description:
asm: idct16 intrinsic 28900->25000 improvement over previous intrinsic
Subject: [x265] asm: remove obsolete comment
details: http://hg.videolan.org/x265/rev/3bc00d8dfce6
branches:
changeset: 9188:3bc00d8dfce6
user: Steve Borho <steve at borho.org>
date: Tue Jan 20 09:28:56 2015 -0600
description:
asm: remove obsolete comment
Subject: [x265] pixelharness: cleanup
details: http://hg.videolan.org/x265/rev/589eba98c46a
branches:
changeset: 9189:589eba98c46a
user: Steve Borho <steve at borho.org>
date: Tue Jan 20 09:35:06 2015 -0600
description:
pixelharness: cleanup
Subject: [x265] asm: cleanups
details: http://hg.videolan.org/x265/rev/8d470bbcfc9f
branches:
changeset: 9190:8d470bbcfc9f
user: Steve Borho <steve at borho.org>
date: Tue Jan 20 09:54:30 2015 -0600
description:
asm: cleanups
diffstat:
source/common/constants.cpp | 2 +-
source/common/constants.h | 2 +-
source/common/contexts.h | 2 +-
source/common/cudata.cpp | 2 +-
source/common/cudata.h | 2 +-
source/common/picyuv.cpp | 2 +-
source/common/picyuv.h | 2 +-
source/common/pixel.cpp | 9 +
source/common/predict.cpp | 4 +
source/common/primitives.h | 56 ++-
source/common/quant.cpp | 2 +-
source/common/quant.h | 2 +-
source/common/scalinglist.cpp | 2 +-
source/common/scalinglist.h | 2 +-
source/common/slice.cpp | 2 +-
source/common/slice.h | 2 +-
source/common/vec/dct-sse3.cpp | 612 ++++++++++++++++---------------
source/common/x86/asm-primitives.cpp | 27 +-
source/common/x86/const-a.asm | 1 +
source/common/x86/pixel-a.asm | 654 +++++++++++++++++++++++++++++++++++
source/common/x86/pixel-util8.asm | 171 ++++++++-
source/common/x86/pixel.h | 3 +
source/common/yuv.cpp | 2 +-
source/common/yuv.h | 2 +-
source/encoder/encoder.cpp | 2 +-
source/test/pixelharness.cpp | 183 +++++----
source/test/pixelharness.h | 2 +-
27 files changed, 1324 insertions(+), 430 deletions(-)
diffs (truncated from 2284 to 300 lines):
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/constants.cpp
--- a/source/common/constants.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/constants.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
-* Copyright (C) 2014 x265 project
+* Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/constants.h
--- a/source/common/constants.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/constants.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/contexts.h
--- a/source/common/contexts.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/contexts.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
-* Copyright (C) 2014 x265 project
+* Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/cudata.cpp
--- a/source/common/cudata.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/cudata.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/cudata.h
--- a/source/common/cudata.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/cudata.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/picyuv.cpp
--- a/source/common/picyuv.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/picyuv.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/picyuv.h
--- a/source/common/picyuv.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/picyuv.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/pixel.cpp
--- a/source/common/pixel.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/pixel.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -520,6 +520,15 @@ void weight_sp_c(const int16_t* src, pix
{
int x, y;
+#if CHECKED_BUILD || _DEBUG
+ const int correction = (IF_INTERNAL_PREC - X265_DEPTH);
+#endif
+
+ X265_CHECK(!((w0 << 6) > 32767), "w0 using more than 16 bits, asm output will mismatch\n");
+ X265_CHECK(!(round > 32767), "round using more than 16 bits, asm output will mismatch\n");
+ X265_CHECK((shift >= correction), "shift must be include factor correction, please update ASM ABI\n");
+ X265_CHECK(!(round & ((1 << correction) - 1)), "round must be include factor correction, please update ASM ABI\n");
+
for (y = 0; y <= height - 1; y++)
{
for (x = 0; x <= width - 1; )
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/predict.cpp
--- a/source/common/predict.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/predict.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -30,6 +30,10 @@
using namespace x265;
+#if _MSC_VER
+#pragma warning(disable: 4127) // conditional expression is constant
+#endif
+
namespace
{
inline pixel weightBidir(int w0, int16_t P0, int w1, int16_t P1, int round, int shift, int offset)
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/primitives.h
--- a/source/common/primitives.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/primitives.h Tue Jan 20 09:54:30 2015 -0600
@@ -38,7 +38,7 @@ namespace x265 {
enum LumaPU
{
- // Square (the first 5 PUs match the CU sizes)
+ // Square (the first 5 PUs match the block sizes)
LUMA_4x4, LUMA_8x8, LUMA_16x16, LUMA_32x32, LUMA_64x64,
// Rectangular
LUMA_8x4, LUMA_4x8,
@@ -65,9 +65,9 @@ enum LumaCU // can be indexed using log2
enum { NUM_TR_SIZE = 4 }; // TU are 4x4, 8x8, 16x16, and 32x32
-// Chroma partition sizes. These enums are just a convenience for indexing into the
-// chroma primitive arrays when instantiating templates. The chroma function tables should
-// always be indexed by the luma PU enum
+/* Chroma partition sizes. These enums are only a convenience for indexing into
+ * the chroma primitive arrays when instantiating macros or templates. The
+ * chroma function tables should always be indexed by a LumaPU enum when used. */
enum ChromaPU420
{
CHROMA_420_2x2, CHROMA_420_4x4, CHROMA_420_8x8, CHROMA_420_16x16, CHROMA_420_32x32,
@@ -182,20 +182,26 @@ typedef void (*cutree_propagate_cost) (i
* either an assembly routine, a SIMD intrinsic primitive, or a C function */
struct EncoderPrimitives
{
+ /* These primitives can be used for any sized prediction unit (from 4x4 to
+ * 64x64, square, rectangular - 50/50 or asymmetrical - 25/75) and are
+ * generally restricted to motion estimation and motion compensation (inter
+ * prediction). Note that the 4x4 PU can only be used for intra, which is
+ * really a 4x4 TU, so at most copy_pp and satd will use 4x4. This array is
+ * indexed by LumaPU values, which can be retrieved by partitionFromSizes() */
struct PU
{
- pixelcmp_t sad; // Sum of Absolute Differences
- pixelcmp_x3_t sad_x3; // Sum of Absolute Differences, 3 mv offsets at once
- pixelcmp_x4_t sad_x4; // Sum of Absolute Differences, 4 mv offsets at once
- pixelcmp_t satd; // Sum of Absolute Transformed Differences (4x4 Hadamaard)
+ pixelcmp_t sad; // Sum of Absolute Differences
+ pixelcmp_x3_t sad_x3; // Sum of Absolute Differences, 3 mv offsets at once
+ pixelcmp_x4_t sad_x4; // Sum of Absolute Differences, 4 mv offsets at once
+ pixelcmp_t satd; // Sum of Absolute Transformed Differences (4x4 Hadamard)
- filter_pp_t luma_hpp;
+ filter_pp_t luma_hpp; // 8-tap luma motion compensation interpolation filters
filter_hps_t luma_hps;
filter_pp_t luma_vpp;
filter_ps_t luma_vps;
filter_sp_t luma_vsp;
filter_ss_t luma_vss;
- filter_hv_pp_t luma_hvpp;
+ filter_hv_pp_t luma_hvpp; // combines hps + vsp
pixelavg_pp_t pixelavg_pp; // quick bidir using pixels (borrowed from x264)
addAvg_t addAvg; // bidir motion compensation, uses 16bit values
@@ -204,6 +210,12 @@ struct EncoderPrimitives
}
pu[NUM_PU_SIZES];
+ /* These primitives can be used for square TU blocks (4x4 to 32x32) or
+ * possibly square CU blocks (8x8 to 64x64). Some primitives are used for
+ * both CU and TU so we merge them into one array that is indexed uniformly.
+ * This keeps the index logic uniform and simple and improves cache
+ * coherency. CU only primitives will leave 4x4 pointers NULL while TU only
+ * primitives will leave 64x64 pointers NULL. Indexed by LumaCU */
struct CU
{
dct_t dct;
@@ -230,7 +242,7 @@ struct EncoderPrimitives
pixelcmp_t psy_cost_pp; // difference in AC energy between two pixel blocks
pixelcmp_ss_t psy_cost_ss; // difference in AC energy between two signed residual blocks
pixel_ssd_s_t ssd_s; // Sum of Square Error (residual coeff to self)
- pixelcmp_t sa8d; // Sum of 8x8 Hadamaard transformed differences
+ pixelcmp_t sa8d; // Sum of Transformed Differences (8x8 Hadamard), uses satd for 4x4 intra TU
transpose_t transpose; // transpose pixel block; for use with intra all-angs
intra_allangs_t intra_pred_allangs;
@@ -238,6 +250,9 @@ struct EncoderPrimitives
}
cu[NUM_CU_SIZES];
+ /* These remaining primitives work on either fixed block sizes or take
+ * block dimensions as arguments and thus do not belong in either the PU or
+ * the CU arrays */
dct_t dst4x4;
idct_t idst4x4;
@@ -273,11 +288,21 @@ struct EncoderPrimitives
filter_p2s_t luma_p2s;
+ /* There is one set of chroma primitives per color space. An encoder will
+ * have just a single color space and thus it will only ever use one entry
+ * in this array. However we always fill all entries in the array in case
+ * multiple encoders with different color spaces share the primitive table
+ * in a single process. Note that 4:2:0 PU and CU are 1/2 width and 1/2
+ * height of their luma counterparts. 4:2:2 PU and CU are 1/2 width and full
+ * height, while 4:4:4 directly uses the luma block sizes and shares luma
+ * primitives for all cases except for the interpolation filters. 4:4:4
+ * interpolation filters have luma partition sizes but are only 4-tap. */
struct Chroma
{
+ /* Chroma prediction unit primitives. Indexed by LumaPU */
struct PUChroma
{
- pixelcmp_t satd;
+ pixelcmp_t satd; // if chroma PU is not multiple of 4x4, will be NULL
filter_pp_t filter_vpp;
filter_ps_t filter_vps;
filter_sp_t filter_vsp;
@@ -289,9 +314,10 @@ struct EncoderPrimitives
}
pu[NUM_PU_SIZES];
+ /* Chroma transform and coding unit primitives. Indexed by LumaCU */
struct CUChroma
{
- pixelcmp_t sa8d;
+ pixelcmp_t sa8d; // if chroma CU is not multiple of 8x8, will use satd
pixelcmp_t sse_pp;
pixel_sub_ps_t sub_ps;
pixel_add_ps_t add_ps;
@@ -303,7 +329,7 @@ struct EncoderPrimitives
}
cu[NUM_CU_SIZES];
- filter_p2s_t p2s;
+ filter_p2s_t p2s; // takes width/height as arguments
}
chroma[X265_CSP_COUNT];
};
@@ -311,7 +337,7 @@ struct EncoderPrimitives
/* This copy of the table is what gets used by the encoder */
extern EncoderPrimitives primitives;
-/* Returns a LumaPartitions enum for the given size, always expected to return a valid enum */
+/* Returns a LumaPU enum for the given size, always expected to return a valid enum */
inline int partitionFromSizes(int width, int height)
{
X265_CHECK(((width | height) & ~(4 | 8 | 16 | 32 | 64)) == 0, "Invalid block width/height\n");
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/quant.cpp
--- a/source/common/quant.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/quant.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/quant.h
--- a/source/common/quant.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/quant.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/scalinglist.cpp
--- a/source/common/scalinglist.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/scalinglist.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/scalinglist.h
--- a/source/common/scalinglist.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/scalinglist.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/slice.cpp
--- a/source/common/slice.cpp Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/slice.cpp Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
/*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
*
* Authors: Steve Borho <steve at borho.org>
*
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/slice.h
--- a/source/common/slice.h Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/slice.h Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
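Editorial note on the primitives.h comments in the diff above: they describe a table of square-block primitives that is indexed uniformly from 4x4 up to 64x64, and chroma blocks whose dimensions depend on the color space (4:2:0 is half width and half height, 4:2:2 is half width and full height, 4:4:4 uses the luma sizes). The following is a minimal sketch of those two relationships only. Every name in it (SketchColorSpace, BlockDim, squareBlockIndex, chromaDim) is hypothetical and does not come from x265, and the mapping index = log2(size) - 2 is an assumption based on the "can be indexed using log2" remark, not a copy of the real code.

```cpp
#include <cassert>

// Hypothetical color space tags for this sketch; not the x265 constants.
enum SketchColorSpace { SKETCH_CSP_420, SKETCH_CSP_422, SKETCH_CSP_444 };

// Hypothetical helper type holding a block's dimensions in samples.
struct BlockDim
{
    int width;
    int height;
};

// Map a square block size (4, 8, 16, 32 or 64) to a table index 0..4.
// Assumption: the index is log2(size) - 2, so that 4x4 lands on entry 0.
inline int squareBlockIndex(int size)
{
    // Size must be a power of two in the range [4, 64].
    assert(size >= 4 && size <= 64 && (size & (size - 1)) == 0);
    int index = 0;
    while ((4 << index) < size)
        index++;
    return index;
}

// Chroma block dimensions for a given luma block, following the comment in
// the diff: 4:2:0 halves both dimensions, 4:2:2 halves the width only, and
// 4:4:4 keeps the luma dimensions.
inline BlockDim chromaDim(SketchColorSpace csp, int lumaWidth, int lumaHeight)
{
    BlockDim d = { lumaWidth, lumaHeight };
    if (csp == SKETCH_CSP_420)
    {
        d.width /= 2;
        d.height /= 2;
    }
    else if (csp == SKETCH_CSP_422)
    {
        d.width /= 2;
    }
    return d;
}
```

With this sketch, a 16x16 luma block maps to index 2 and to an 8x8 chroma block in 4:2:0 or an 8x16 chroma block in 4:2:2. A single index for both luma and chroma tables matches the comment's rule that the chroma function tables are indexed by the luma enum rather than by the chroma block size.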