[x265-commits] [x265] primitives: better document the data structures and their...

Steve Borho steve at borho.org
Tue Jan 20 17:15:34 CET 2015


details:   http://hg.videolan.org/x265/rev/17ac389a6400
branches:  
changeset: 9177:17ac389a6400
user:      Steve Borho <steve at borho.org>
date:      Sun Jan 18 15:43:42 2015 +0530
description:
primitives: better document the data structures and their use
Subject: [x265] predict: disable conditional-expression-constant warnings

details:   http://hg.videolan.org/x265/rev/bbc333bd4a62
branches:  
changeset: 9178:bbc333bd4a62
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Mon Jan 19 09:59:33 2015 +0530
description:
predict: disable conditional-expression-constant warnings
Subject: [x265] x265: update copyright header

details:   http://hg.videolan.org/x265/rev/1ec53efeb07e
branches:  
changeset: 9179:1ec53efeb07e
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Mon Jan 19 15:26:35 2015 +0530
description:
x265: update copyright header
Subject: [x265] asm: psyCost_ss_16x16 in sse4: improve 31052c->9946c

details:   http://hg.videolan.org/x265/rev/2b2c656111ea
branches:  
changeset: 9180:2b2c656111ea
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Mon Jan 19 10:56:24 2015 +0530
description:
asm: psyCost_ss_16x16 in sse4: improve 31052c->9946c
Subject: [x265] asm: psyCost_ss_32x32 in sse4: improve 136848c->39754c

details:   http://hg.videolan.org/x265/rev/5b38663a792a
branches:  
changeset: 9181:5b38663a792a
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Mon Jan 19 11:05:33 2015 +0530
description:
asm: psyCost_ss_32x32 in sse4: improve 136848c->39754c
Subject: [x265] asm: psyCost_ss_64x64 in sse4: improve 501123c->159906c

details:   http://hg.videolan.org/x265/rev/c2048e0d9783
branches:  
changeset: 9182:c2048e0d9783
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Mon Jan 19 11:16:31 2015 +0530
description:
asm: psyCost_ss_64x64 in sse4: improve 501123c->159906c
Subject: [x265] asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode

details:   http://hg.videolan.org/x265/rev/20381760757b
branches:  
changeset: 9183:20381760757b
user:      Min Chen <chenm003 at 163.com>
date:      Mon Jan 19 18:21:45 2015 +0800
description:
asm: rewrite and fix bug in weight_pp_sse4 on HIGH_BIT_DEPTH mode
Subject: [x265] asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode

details:   http://hg.videolan.org/x265/rev/4f8b7cc9d51e
branches:  
changeset: 9184:4f8b7cc9d51e
user:      Min Chen <chenm003 at 163.com>
date:      Mon Jan 19 18:21:50 2015 +0800
description:
asm: rewrite and fix bug in weight_sp_sse4 on HIGH_BIT_DEPTH mode
Subject: [x265] avoid warning on variant correction in weight_sp_c()

details:   http://hg.videolan.org/x265/rev/b49cb2d2c82f
branches:  
changeset: 9185:b49cb2d2c82f
user:      Min Chen <chenm003 at 163.com>
date:      Tue Jan 20 01:19:23 2015 +0800
description:
avoid warning on variant correction in weight_sp_c()
Subject: [x265] asm: fix broken on weight_sp and weight_pp on 8bpp mode

details:   http://hg.videolan.org/x265/rev/e331bf2b402d
branches:  
changeset: 9186:e331bf2b402d
user:      Min Chen <chenm003 at 163.com>
date:      Tue Jan 20 01:33:51 2015 +0800
description:
asm: fix broken on weight_sp and weight_pp on 8bpp mode
Subject: [x265] asm: idct16 intrinsic 28900->25000 improvement over previous intrinsic

details:   http://hg.videolan.org/x265/rev/6b72bb520a91
branches:  
changeset: 9187:6b72bb520a91
user:      David T Yuen <dtyx265 at gmail.com>
date:      Mon Jan 19 09:43:36 2015 -0800
description:
asm: idct16 intrinsic 28900->25000 improvement over previous intrinsic
Subject: [x265] asm: remove obsolete comment

details:   http://hg.videolan.org/x265/rev/3bc00d8dfce6
branches:  
changeset: 9188:3bc00d8dfce6
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 20 09:28:56 2015 -0600
description:
asm: remove obsolete comment
Subject: [x265] pixelharness: cleanup

details:   http://hg.videolan.org/x265/rev/589eba98c46a
branches:  
changeset: 9189:589eba98c46a
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 20 09:35:06 2015 -0600
description:
pixelharness: cleanup
Subject: [x265] asm: cleanups

details:   http://hg.videolan.org/x265/rev/8d470bbcfc9f
branches:  
changeset: 9190:8d470bbcfc9f
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 20 09:54:30 2015 -0600
description:
asm: cleanups

diffstat:

 source/common/constants.cpp          |    2 +-
 source/common/constants.h            |    2 +-
 source/common/contexts.h             |    2 +-
 source/common/cudata.cpp             |    2 +-
 source/common/cudata.h               |    2 +-
 source/common/picyuv.cpp             |    2 +-
 source/common/picyuv.h               |    2 +-
 source/common/pixel.cpp              |    9 +
 source/common/predict.cpp            |    4 +
 source/common/primitives.h           |   56 ++-
 source/common/quant.cpp              |    2 +-
 source/common/quant.h                |    2 +-
 source/common/scalinglist.cpp        |    2 +-
 source/common/scalinglist.h          |    2 +-
 source/common/slice.cpp              |    2 +-
 source/common/slice.h                |    2 +-
 source/common/vec/dct-sse3.cpp       |  612 ++++++++++++++++---------------
 source/common/x86/asm-primitives.cpp |   27 +-
 source/common/x86/const-a.asm        |    1 +
 source/common/x86/pixel-a.asm        |  654 +++++++++++++++++++++++++++++++++++
 source/common/x86/pixel-util8.asm    |  171 ++++++++-
 source/common/x86/pixel.h            |    3 +
 source/common/yuv.cpp                |    2 +-
 source/common/yuv.h                  |    2 +-
 source/encoder/encoder.cpp           |    2 +-
 source/test/pixelharness.cpp         |  183 +++++----
 source/test/pixelharness.h           |    2 +-
 27 files changed, 1324 insertions(+), 430 deletions(-)

diffs (truncated from 2284 to 300 lines):

diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/constants.cpp
--- a/source/common/constants.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/constants.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
-* Copyright (C) 2014 x265 project
+* Copyright (C) 2015 x265 project
 *
 * Authors: Steve Borho <steve at borho.org>
 *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/constants.h
--- a/source/common/constants.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/constants.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/contexts.h
--- a/source/common/contexts.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/contexts.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
-* Copyright (C) 2014 x265 project
+* Copyright (C) 2015 x265 project
 *
 * Authors: Steve Borho <steve at borho.org>
 *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/cudata.cpp
--- a/source/common/cudata.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/cudata.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/cudata.h
--- a/source/common/cudata.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/cudata.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/picyuv.cpp
--- a/source/common/picyuv.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/picyuv.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/picyuv.h
--- a/source/common/picyuv.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/picyuv.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/pixel.cpp
--- a/source/common/pixel.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/pixel.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -520,6 +520,15 @@ void weight_sp_c(const int16_t* src, pix
 {
     int x, y;
 
+#if CHECKED_BUILD || _DEBUG
+    const int correction = (IF_INTERNAL_PREC - X265_DEPTH);
+#endif
+
+    X265_CHECK(!((w0 << 6) > 32767), "w0 using more than 16 bits, asm output will mismatch\n");
+    X265_CHECK(!(round > 32767), "round using more than 16 bits, asm output will mismatch\n");
+    X265_CHECK((shift >= correction), "shift must be include factor correction, please update ASM ABI\n");
+    X265_CHECK(!(round & ((1 << correction) - 1)), "round must be include factor correction, please update ASM ABI\n");
+
     for (y = 0; y <= height - 1; y++)
     {
         for (x = 0; x <= width - 1; )
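
Note on the hunk above: the four X265_CHECK lines added to weight_sp_c()
state the conditions under which the C reference and the assembly versions
are expected to agree (a weight and a rounding term that fit in 16 bits, and
a shift and rounding term that already include the precision correction).
The following is a minimal standalone sketch of the same predicates. The
helper name weightParamsFitAsm, the bitDepth parameter, and the value 14
used for the internal precision are assumptions made for illustration; they
are not taken from this changeset.

```cpp
#include <cassert>

// Assumed value for illustration only; the real code uses IF_INTERNAL_PREC.
constexpr int kInternalPrec = 14;

// Hypothetical helper mirroring the four X265_CHECK conditions in the hunk.
// Returns true when the parameters satisfy every condition.
bool weightParamsFitAsm(int w0, int round, int shift, int bitDepth)
{
    // Corresponds to (IF_INTERNAL_PREC - X265_DEPTH) in the hunk
    const int correction = kInternalPrec - bitDepth;

    if ((w0 << 6) > 32767)                 // scaled weight must fit in 16 bits
        return false;
    if (round > 32767)                     // rounding term must fit in 16 bits
        return false;
    if (shift < correction)                // shift must include the correction
        return false;
    if (round & ((1 << correction) - 1))   // low 'correction' bits of round must be 0
        return false;
    return true;
}
```

With bitDepth 8 the correction is 6 under the assumption above, so round
must be a multiple of 64 and shift must be at least 6.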
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/predict.cpp
--- a/source/common/predict.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/predict.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -30,6 +30,10 @@
 
 using namespace x265;
 
+#if _MSC_VER
+#pragma warning(disable: 4127) // conditional expression is constant
+#endif
+
 namespace
 {
 inline pixel weightBidir(int w0, int16_t P0, int w1, int16_t P1, int round, int shift, int offset)
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/primitives.h
--- a/source/common/primitives.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/primitives.h	Tue Jan 20 09:54:30 2015 -0600
@@ -38,7 +38,7 @@ namespace x265 {
 
 enum LumaPU
 {
-    // Square (the first 5 PUs match the CU sizes)
+    // Square (the first 5 PUs match the block sizes)
     LUMA_4x4,   LUMA_8x8,   LUMA_16x16, LUMA_32x32, LUMA_64x64,
     // Rectangular
     LUMA_8x4,   LUMA_4x8,
@@ -65,9 +65,9 @@ enum LumaCU // can be indexed using log2
 enum { NUM_TR_SIZE = 4 }; // TU are 4x4, 8x8, 16x16, and 32x32
 
 
-// Chroma partition sizes. These enums are just a convenience for indexing into the
-// chroma primitive arrays when instantiating templates. The chroma function tables should
-// always be indexed by the luma PU enum
+/* Chroma partition sizes. These enums are only a convenience for indexing into
+ * the chroma primitive arrays when instantiating macros or templates. The
+ * chroma function tables should always be indexed by a LumaPU enum when used. */
 enum ChromaPU420
 {
     CHROMA_420_2x2,   CHROMA_420_4x4,   CHROMA_420_8x8,  CHROMA_420_16x16, CHROMA_420_32x32,
@@ -182,20 +182,26 @@ typedef void (*cutree_propagate_cost) (i
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives
 {
+    /* These primitives can be used for any sized prediction unit (from 4x4 to
+     * 64x64, square, rectangular - 50/50 or asymmetrical - 25/75) and are
+     * generally restricted to motion estimation and motion compensation (inter
+     * prediction. Note that the 4x4 PU can only be used for intra, which is
+     * really a 4x4 TU, so at most copy_pp and satd will use 4x4. This array is
+     * indexed by LumaPU values, which can be retrieved by partitionFromSizes() */
     struct PU
     {
-        pixelcmp_t     sad;        // Sum of Absolute Differences
-        pixelcmp_x3_t  sad_x3;     // Sum of Absolute Differences, 3 mv offsets at once
-        pixelcmp_x4_t  sad_x4;     // Sum of Absolute Differences, 4 mv offsets at once
-        pixelcmp_t     satd;       // Sum of Absolute Transformed Differences (4x4 Hadamaard)
+        pixelcmp_t     sad;         // Sum of Absolute Differences
+        pixelcmp_x3_t  sad_x3;      // Sum of Absolute Differences, 3 mv offsets at once
+        pixelcmp_x4_t  sad_x4;      // Sum of Absolute Differences, 4 mv offsets at once
+        pixelcmp_t     satd;        // Sum of Absolute Transformed Differences (4x4 Hadamard)
 
-        filter_pp_t    luma_hpp;
+        filter_pp_t    luma_hpp;    // 8-tap luma motion compensation interpolation filters
         filter_hps_t   luma_hps;
         filter_pp_t    luma_vpp;
         filter_ps_t    luma_vps;
         filter_sp_t    luma_vsp;
         filter_ss_t    luma_vss;
-        filter_hv_pp_t luma_hvpp;
+        filter_hv_pp_t luma_hvpp;   // combines hps + vsp
 
         pixelavg_pp_t  pixelavg_pp; // quick bidir using pixels (borrowed from x264)
         addAvg_t       addAvg;      // bidir motion compensation, uses 16bit values
@@ -204,6 +210,12 @@ struct EncoderPrimitives
     }
     pu[NUM_PU_SIZES];
 
+    /* These primitives can be used for square TU blocks (4x4 to 32x32) or
+     * possibly square CU blocks (8x8 to 64x64). Some primitives are used for
+     * both CU and TU so we merge them into one array that is indexed uniformly.
+     * This keeps the index logic uniform and simple and improves cache
+     * coherency. CU only primitives will leave 4x4 pointers NULL while TU only
+     * primitives will leave 64x64 pointers NULL.  Indexed by LumaCU */
     struct CU
     {
         dct_t           dct;
@@ -230,7 +242,7 @@ struct EncoderPrimitives
         pixelcmp_t      psy_cost_pp;   // difference in AC energy between two pixel blocks
         pixelcmp_ss_t   psy_cost_ss;   // difference in AC energy between two signed residual blocks
         pixel_ssd_s_t   ssd_s;         // Sum of Square Error (residual coeff to self)
-        pixelcmp_t      sa8d;          // Sum of 8x8 Hadamaard transformed differences
+        pixelcmp_t      sa8d;          // Sum of Transformed Differences (8x8 Hadamard), uses satd for 4x4 intra TU
 
         transpose_t     transpose;     // transpose pixel block; for use with intra all-angs
         intra_allangs_t intra_pred_allangs;
@@ -238,6 +250,9 @@ struct EncoderPrimitives
     }
     cu[NUM_CU_SIZES];
 
+    /* These remaining primitives work on either fixed block sizes or take
+     * block dimensions as arguments and thus do not belong in either the PU or
+     * the CU arrays */
     dct_t                 dst4x4;
     idct_t                idst4x4;
 
@@ -273,11 +288,21 @@ struct EncoderPrimitives
 
     filter_p2s_t          luma_p2s;
 
+    /* There is one set of chroma primitives per color space. An encoder will
+     * have just a single color space and thus it will only ever use one entry
+     * in this array. However we always fill all entries in the array in case
+     * multiple encoders with different color spaces share the primitive table
+     * in a single process. Note that 4:2:0 PU and CU are 1/2 width and 1/2
+     * height of their luma counterparts. 4:2:2 PU and CU are 1/2 width and full
+     * height, while 4:4:4 directly uses the luma block sizes and shares luma
+     * primitives for all cases except for the interpolation filters. 4:4:4
+     * interpolation filters have luma partition sizes but are only 4-tap. */
     struct Chroma
     {
+        /* Chroma prediction unit primitives. Indexed by LumaPU */
         struct PUChroma
         {
-            pixelcmp_t   satd;
+            pixelcmp_t   satd;      // if chroma PU is not multiple of 4x4, will be NULL
             filter_pp_t  filter_vpp;
             filter_ps_t  filter_vps;
             filter_sp_t  filter_vsp;
@@ -289,9 +314,10 @@ struct EncoderPrimitives
         }
         pu[NUM_PU_SIZES];
 
+        /* Chroma transform and coding unit primitives. Indexed by LumaCU */
         struct CUChroma
         {
-            pixelcmp_t     sa8d;
+            pixelcmp_t     sa8d;    // if chroma CU is not multiple of 8x8, will use satd
             pixelcmp_t     sse_pp;
             pixel_sub_ps_t sub_ps;
             pixel_add_ps_t add_ps;
@@ -303,7 +329,7 @@ struct EncoderPrimitives
         }
         cu[NUM_CU_SIZES];
 
-        filter_p2s_t p2s;
+        filter_p2s_t p2s; // takes width/height as arguments
     }
     chroma[X265_CSP_COUNT];
 };
@@ -311,7 +337,7 @@ struct EncoderPrimitives
 /* This copy of the table is what gets used by the encoder */
 extern EncoderPrimitives primitives;
 
-/* Returns a LumaPartitions enum for the given size, always expected to return a valid enum */
+/* Returns a LumaPU enum for the given size, always expected to return a valid enum */
 inline int partitionFromSizes(int width, int height)
 {
     X265_CHECK(((width | height) & ~(4 | 8 | 16 | 32 | 64)) == 0, "Invalid block width/height\n");
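
Note on the primitives.h hunks above: the new comments describe one merged
array for square blocks from 4x4 to 64x64, indexed by LumaCU, in which
CU-only primitives leave the 4x4 entry NULL and TU-only primitives leave the
64x64 entry NULL. A rough sketch of that layout, together with the bit test
used by the X265_CHECK in partitionFromSizes(), might look like the code
below. Every name here (BlockPrims, blockIndexFromSize, fillTable,
dimsUseOnlyValidBits, cost_fn) is hypothetical and chosen for illustration;
none of them is an x265 identifier.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function-pointer type standing in for a primitive.
typedef int (*cost_fn)(const int* a, const int* b);

enum { kNumBlockSizes = 5 };   // 4x4, 8x8, 16x16, 32x32, 64x64

struct BlockPrims
{
    cost_fn tuOnly;   // TU-only primitive: NULL in the 64x64 slot
    cost_fn cuOnly;   // CU-only primitive: NULL in the 4x4 slot
};

// Square block size -> table index, i.e. log2(size) - 2.
inline int blockIndexFromSize(int size)
{
    int idx = 0;
    while ((4 << idx) < size)
        idx++;
    return idx;
}

// Fill every slot with f, then clear the slots the comments say stay NULL.
inline void fillTable(BlockPrims table[kNumBlockSizes], cost_fn f)
{
    for (int i = 0; i < kNumBlockSizes; i++)
    {
        table[i].tuOnly = f;
        table[i].cuOnly = f;
    }
    table[blockIndexFromSize(64)].tuOnly = NULL;
    table[blockIndexFromSize(4)].cuOnly = NULL;
}

// Same bit test as the X265_CHECK: no bit outside 4..64 may be set.
inline bool dimsUseOnlyValidBits(int width, int height)
{
    return ((width | height) & ~(4 | 8 | 16 | 32 | 64)) == 0;
}

// Dummy primitive used only to populate the table in this sketch.
inline int dummyCost(const int* a, const int* b)
{
    return *a - *b;
}
```

The bit test accepts asymmetric dimensions such as 16x12 because 12 is
4 | 8, and rejects a width of 6 or 128. Callers of such a table would need
to respect the NULL convention before dereferencing a slot.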
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/quant.cpp
--- a/source/common/quant.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/quant.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/quant.h
--- a/source/common/quant.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/quant.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/scalinglist.cpp
--- a/source/common/scalinglist.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/scalinglist.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/scalinglist.h
--- a/source/common/scalinglist.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/scalinglist.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/slice.cpp
--- a/source/common/slice.cpp	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/slice.cpp	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@
 /*****************************************************************************
- * Copyright (C) 2014 x265 project
+ * Copyright (C) 2015 x265 project
  *
  * Authors: Steve Borho <steve at borho.org>
  *
diff -r d8d13f2e2095 -r 8d470bbcfc9f source/common/slice.h
--- a/source/common/slice.h	Sat Jan 17 18:32:52 2015 +0900
+++ b/source/common/slice.h	Tue Jan 20 09:54:30 2015 -0600
@@ -1,5 +1,5 @@

