[x265-commits] [x265] primitives: use NULL chroma satd func pointers for blocks...

Steve Borho steve at borho.org
Wed Dec 10 21:39:18 CET 2014


details:   http://hg.videolan.org/x265/rev/4c97d85c8488
branches:  
changeset: 8968:4c97d85c8488
user:      Steve Borho <steve at borho.org>
date:      Tue Dec 09 15:31:50 2014 -0600
description:
primitives: use NULL chroma satd func pointers for blocks not capable of satd

If the chroma block dimensions are not multiples of 4, chroma satd cannot be
measured for that block, so chroma residual measurement is disabled for these
block sizes (and only the luma residual is measured).
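
The following is a minimal sketch of the pattern described above, not the x265
code itself. It assumes a tiny primitive table in which a null satd pointer
means "this chroma block size cannot be measured", and a caller that falls back
to the luma cost alone. DemoPrimitives, residualCost, the *_DEMO enum values
and the textbook satd4x4 are all hypothetical names introduced for
illustration.

```cpp
#include <cstdint>
#include <cstdlib>

typedef uint8_t pixel;
typedef int (*pixelcmp_t)(const pixel* fenc, intptr_t fencStride,
                          const pixel* pred, intptr_t predStride);

// Textbook 4x4 Hadamard SATD written for clarity (illustration only).
static int satd4x4(const pixel* fenc, intptr_t fencStride,
                   const pixel* pred, intptr_t predStride)
{
    int d[4][4], t[4][4];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = fenc[y * fencStride + x] - pred[y * predStride + x];

    // horizontal butterflies on each row of the difference block
    for (int y = 0; y < 4; y++)
    {
        int a = d[y][0] + d[y][1], b = d[y][0] - d[y][1];
        int c = d[y][2] + d[y][3], e = d[y][2] - d[y][3];
        t[y][0] = a + c; t[y][1] = b + e; t[y][2] = a - c; t[y][3] = b - e;
    }

    // vertical butterflies on each column, accumulating absolute values
    int sum = 0;
    for (int x = 0; x < 4; x++)
    {
        int a = t[0][x] + t[1][x], b = t[0][x] - t[1][x];
        int c = t[2][x] + t[3][x], e = t[2][x] - t[3][x];
        sum += std::abs(a + c) + std::abs(b + e) + std::abs(a - c) + std::abs(b - e);
    }
    return sum >> 1;
}

// Hypothetical two-entry table standing in for a per-size primitive array.
enum DemoChromaPart { CHROMA_2x2_DEMO, CHROMA_4x4_DEMO, NUM_DEMO_PARTS };

struct DemoPrimitives
{
    pixelcmp_t satd[NUM_DEMO_PARTS];
};

static void setupDemoPrimitives(DemoPrimitives& p)
{
    p.satd[CHROMA_2x2_DEMO] = nullptr;  // 2x2 is not a multiple of 4x4
    p.satd[CHROMA_4x4_DEMO] = satd4x4;
}

// Adds chroma cost to the luma cost only when a chroma satd primitive exists.
static int residualCost(const DemoPrimitives& p, DemoChromaPart part, int lumaCost,
                        const pixel* fencC, intptr_t fencStride,
                        const pixel* predC, intptr_t predStride)
{
    pixelcmp_t chromaSatd = p.satd[part];
    if (!chromaSatd)
        return lumaCost;  // luma-only measurement for this block size
    return lumaCost + chromaSatd(fencC, fencStride, predC, predStride);
}
```

A caller would run setupDemoPrimitives() once and then call residualCost() per
block; the null check keeps the decision in one place instead of silently
mixing sad and satd costs.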
Subject: [x265] motion: chroma ME [CHANGES OUTPUTS]

details:   http://hg.videolan.org/x265/rev/afd5620c77a4
branches:  
changeset: 8969:afd5620c77a4
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 18:53:28 2014 -0600
description:
motion: chroma ME [CHANGES OUTPUTS]

include chroma distortion in satd decisions when --subme > 2 and chroma blocks
are multiples of 4x4

This required making the MotionEstimate class more aware of PicYuv and its
indexing scheme so that it could find the correct chroma pixels to interpolate.
This allowed me to merge the setSourcePlane() method into the lookahead's
version of setSourcePU.

This requires further work. The Reference class needs to generate weighted
chroma planes if subpel refine will use chroma residual cost. Until this is
fixed, the chroma subpel steps will use unweighted reference pixels.
Subject: [x265] reference: weight chroma planes of reference pictures if using chroma satd

details:   http://hg.videolan.org/x265/rev/6c32c8d4e0a1
branches:  
changeset: 8970:6c32c8d4e0a1
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 10 01:20:51 2014 -0600
description:
reference: weight chroma planes of reference pictures if using chroma satd
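
As a rough illustration of what weighting a reference plane involves, here is a
generic explicit weighted prediction sketch for 8-bit pixels. It assumes the
usual form out = clip(((in * weight + round) >> shift) + offset), applied
independently to each plane with its own parameters. WeightParams and
weightPlane are hypothetical names, and the x265 primitive may differ in
details such as intermediate precision.

```cpp
#include <cstdint>
#include <algorithm>

typedef uint8_t pixel;

// Hypothetical per-plane parameters (one set each for Y, Cb and Cr).
struct WeightParams
{
    int weight;
    int offset;
    int shift;
    int round;   // normally shift ? 1 << (shift - 1) : 0
};

// Applies the weight to every pixel of one plane and clips to 8 bits.
static void weightPlane(const pixel* src, pixel* dst, intptr_t stride,
                        int width, int height, const WeightParams& w)
{
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int v = ((src[x] * w.weight + w.round) >> w.shift) + w.offset;
            dst[x] = (pixel)std::min(255, std::max(0, v));
        }
        src += stride;
        dst += stride;
    }
}
```

With weight = 64, shift = 6, round = 32 and offset = 0 the function copies the
plane unchanged, which makes it easy to sanity check before trying real
weights.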
Subject: [x265] asm: chroma_vpp[4x4] for colorspace i422 in avx2: improve 228c->184c

details:   http://hg.videolan.org/x265/rev/5f16dc82652a
branches:  
changeset: 8971:5f16dc82652a
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Dec 09 15:09:01 2014 +0530
description:
asm: chroma_vpp[4x4] for colorspace i422 in avx2: improve 228c->184c
Subject: [x265] api: add some blank lines

details:   http://hg.videolan.org/x265/rev/ab1e1e0ca75c
branches:  
changeset: 8972:ab1e1e0ca75c
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 10 13:40:04 2014 -0600
description:
api: add some blank lines
Subject: [x265] analysis: fix chroma predictions for 2Nx2N bidir at zero mv

details:   http://hg.videolan.org/x265/rev/0dc816f49c01
branches:  
changeset: 8973:0dc816f49c01
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 10 14:29:49 2014 -0600
description:
analysis: fix chroma predictions for 2Nx2N bidir at zero mv

Valgrind discovered that the chroma predictions were never actually generated
before their sa8d cost was measured
Subject: [x265] analysis: avoid redundant MC work

details:   http://hg.videolan.org/x265/rev/9e244ebe21d2
branches:  
changeset: 8974:9e244ebe21d2
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 10 14:38:52 2014 -0600
description:
analysis: avoid redundant MC work

diffstat:

 doc/reST/cli.rst                     |    9 +-
 source/common/lowres.cpp             |    2 +-
 source/common/lowres.h               |   22 ++-
 source/common/pixel.cpp              |   22 +-
 source/common/primitives.cpp         |    9 -
 source/common/x86/asm-primitives.cpp |    5 +
 source/common/x86/ipfilter8.h        |    1 +
 source/encoder/analysis.cpp          |   52 ++++++--
 source/encoder/frameencoder.cpp      |    5 +-
 source/encoder/motion.cpp            |  211 +++++++++++++++++++++++++---------
 source/encoder/motion.h              |   41 +++---
 source/encoder/reference.cpp         |  159 +++++++++++++++++---------
 source/encoder/reference.h           |   10 +-
 source/encoder/search.cpp            |   90 ++++++++++----
 source/encoder/slicetype.cpp         |   11 +-
 source/x265.h                        |    2 +
 16 files changed, 429 insertions(+), 222 deletions(-)

diffs (truncated from 1162 to 300 lines):

diff -r 29489f2fc2c7 -r 9e244ebe21d2 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Tue Dec 09 12:54:40 2014 -0600
+++ b/doc/reST/cli.rst	Wed Dec 10 14:38:52 2014 -0600
@@ -392,7 +392,7 @@ Mode decision / Analysis
 	+-------+---------------------------------------------------------------+
 	| 2     | RDO splits and merge/skip selection                           |
 	+-------+---------------------------------------------------------------+
-	| 3     | RDO mode and split decisions                                  |
+	| 3     | RDO mode and split decisions, chroma residual used for sa8d   |
 	+-------+---------------------------------------------------------------+
 	| 4     | Adds RDO Quant                                                |
 	+-------+---------------------------------------------------------------+
@@ -589,6 +589,13 @@ Temporal / motion search options
 	|  7 | 2          | 8         | 2          | 8         | true      |
 	+----+------------+-----------+------------+-----------+-----------+
 
+	At --subme values larger than 2, chroma residual cost is included
+	in all subpel refinement steps and chroma residual is included in
+	all motion estimation decisions (selecting the best reference
+	picture in each list, and chosing between merge, uni-directional
+	motion and bi-directional motion). The 'slow' preset is the first
+	preset to enable the use of chroma residual.
+
 .. option:: --merange <integer>
 
 	Motion search range. Default 57
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/common/lowres.cpp
--- a/source/common/lowres.cpp	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/common/lowres.cpp	Wed Dec 10 14:38:52 2014 -0600
@@ -166,5 +166,5 @@ void Lowres::init(PicYuv *origPic, int p
     extendPicBorder(lowresPlane[1], lumaStride, width, lines, origPic->m_lumaMarginX, origPic->m_lumaMarginY);
     extendPicBorder(lowresPlane[2], lumaStride, width, lines, origPic->m_lumaMarginX, origPic->m_lumaMarginY);
     extendPicBorder(lowresPlane[3], lumaStride, width, lines, origPic->m_lumaMarginX, origPic->m_lumaMarginY);
-    fpelPlane = lowresPlane[0];
+    fpelPlane[0] = lowresPlane[0];
 }
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/common/lowres.h
--- a/source/common/lowres.h	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/common/lowres.h	Wed Dec 10 14:38:52 2014 -0600
@@ -26,28 +26,36 @@
 
 #include "primitives.h"
 #include "common.h"
+#include "picyuv.h"
 #include "mv.h"
 
 namespace x265 {
 // private namespace
 
-class PicYuv;
-
 struct ReferencePlanes
 {
     ReferencePlanes() { memset(this, 0, sizeof(ReferencePlanes)); }
 
-    pixel*   fpelPlane;
+    pixel*   fpelPlane[3];
     pixel*   lowresPlane[4];
     PicYuv*  reconPic;
 
     bool     isWeighted;
     bool     isLowres;
+
     intptr_t lumaStride;
-    int      weight;
-    int      offset;
-    int      shift;
-    int      round;
+    intptr_t chromaStride;
+
+    struct {
+        int      weight;
+        int      offset;
+        int      shift;
+        int      round;
+    } w[3];
+
+    pixel* getLumaAddr(uint32_t ctuAddr, uint32_t absPartIdx) { return fpelPlane[0] + reconPic->m_cuOffsetY[ctuAddr] + reconPic->m_buOffsetY[absPartIdx]; }
+    pixel* getCbAddr(uint32_t ctuAddr, uint32_t absPartIdx)   { return fpelPlane[1] + reconPic->m_cuOffsetC[ctuAddr] + reconPic->m_buOffsetC[absPartIdx]; }
+    pixel* getCrAddr(uint32_t ctuAddr, uint32_t absPartIdx)   { return fpelPlane[2] + reconPic->m_cuOffsetC[ctuAddr] + reconPic->m_buOffsetC[absPartIdx]; }
 
     /* lowres motion compensation, you must provide a buffer and stride for QPEL averaged pixels
      * in case QPEL is required.  Else it returns a pointer to the HPEL pixels */
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/common/pixel.cpp
--- a/source/common/pixel.cpp	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/common/pixel.cpp	Wed Dec 10 14:38:52 2014 -0600
@@ -1085,14 +1085,14 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.satd[LUMA_64x16] = satd8<64, 16>;
     p.satd[LUMA_16x64] = satd8<16, 64>;
 
-    p.chroma[X265_CSP_I420].satd[CHROMA_2x2]   = sad<2, 2>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_2x2]   = NULL;
     p.chroma[X265_CSP_I420].satd[CHROMA_4x4]   = satd_4x4;
     p.chroma[X265_CSP_I420].satd[CHROMA_8x8]   = satd8<8, 8>;
     p.chroma[X265_CSP_I420].satd[CHROMA_16x16] = satd8<16, 16>;
     p.chroma[X265_CSP_I420].satd[CHROMA_32x32] = satd8<32, 32>;
 
-    p.chroma[X265_CSP_I420].satd[CHROMA_4x2]   = sad<4, 2>;
-    p.chroma[X265_CSP_I420].satd[CHROMA_2x4]   = sad<2, 4>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x2]   = NULL;
+    p.chroma[X265_CSP_I420].satd[CHROMA_2x4]   = NULL;
     p.chroma[X265_CSP_I420].satd[CHROMA_8x4]   = satd_8x4;
     p.chroma[X265_CSP_I420].satd[CHROMA_4x8]   = satd4<4, 8>;
     p.chroma[X265_CSP_I420].satd[CHROMA_16x8]  = satd8<16, 8>;
@@ -1100,10 +1100,10 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.chroma[X265_CSP_I420].satd[CHROMA_32x16] = satd8<32, 16>;
     p.chroma[X265_CSP_I420].satd[CHROMA_16x32] = satd8<16, 32>;
 
-    p.chroma[X265_CSP_I420].satd[CHROMA_8x6]   = sad<8, 6>;
-    p.chroma[X265_CSP_I420].satd[CHROMA_6x8]   = sad<6, 8>;
-    p.chroma[X265_CSP_I420].satd[CHROMA_8x2]   = sad<8, 2>;
-    p.chroma[X265_CSP_I420].satd[CHROMA_2x8]   = sad<2, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x6]   = NULL;
+    p.chroma[X265_CSP_I420].satd[CHROMA_6x8]   = NULL;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x2]   = NULL;
+    p.chroma[X265_CSP_I420].satd[CHROMA_2x8]   = NULL;
     p.chroma[X265_CSP_I420].satd[CHROMA_16x12] = satd4<16, 12>;
     p.chroma[X265_CSP_I420].satd[CHROMA_12x16] = satd4<12, 16>;
     p.chroma[X265_CSP_I420].satd[CHROMA_16x4]  = satd4<16, 4>;
@@ -1113,14 +1113,14 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.chroma[X265_CSP_I420].satd[CHROMA_32x8]  = satd8<32, 8>;
     p.chroma[X265_CSP_I420].satd[CHROMA_8x32]  = satd8<8, 32>;
 
-    p.chroma[X265_CSP_I422].satd[CHROMA422_2x4]   = sad<2, 4>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_2x4]   = NULL;
     p.chroma[X265_CSP_I422].satd[CHROMA422_4x8]   = satd4<4, 8>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_8x16]  = satd8<8, 16>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x32] = satd8<16, 32>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_32x64] = satd8<32, 64>;
 
     p.chroma[X265_CSP_I422].satd[CHROMA422_4x4]   = satd_4x4;
-    p.chroma[X265_CSP_I422].satd[CHROMA422_2x8]   = sad<2, 8>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_2x8]   = NULL;
     p.chroma[X265_CSP_I422].satd[CHROMA422_8x8]   = satd8<8, 8>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_4x16]  = satd4<4, 16>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x16] = satd8<16, 16>;
@@ -1129,9 +1129,9 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x64] = satd8<16, 64>;
 
     p.chroma[X265_CSP_I422].satd[CHROMA422_8x12]  = satd4<8, 12>;
-    p.chroma[X265_CSP_I422].satd[CHROMA422_6x16]  = sad<6, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_6x16]  = NULL;
     p.chroma[X265_CSP_I422].satd[CHROMA422_8x4]   = satd4<8, 4>;
-    p.chroma[X265_CSP_I422].satd[CHROMA422_2x16]  = sad<2, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_2x16]  = NULL;
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x24] = satd8<16, 24>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_12x32] = satd4<12, 32>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x8]  = satd8<16, 8>;
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/common/primitives.cpp
--- a/source/common/primitives.cpp	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/common/primitives.cpp	Wed Dec 10 14:38:52 2014 -0600
@@ -106,8 +106,6 @@ void Setup_Alias_Primitives(EncoderPrimi
     p.chroma[X265_CSP_I420].satd[CHROMA_16x16] = primitives.satd[LUMA_16x16];
     p.chroma[X265_CSP_I420].satd[CHROMA_32x32] = primitives.satd[LUMA_32x32];
 
-    //p.chroma[X265_CSP_I420].satd[CHROMA_4x2] = sad<4, 2>;
-    //p.chroma[X265_CSP_I420].satd[CHROMA_2x4] = sad<2, 4>;
     p.chroma[X265_CSP_I420].satd[CHROMA_8x4]   = primitives.satd[LUMA_8x4];
     p.chroma[X265_CSP_I420].satd[CHROMA_4x8]   = primitives.satd[LUMA_4x8];
     p.chroma[X265_CSP_I420].satd[CHROMA_16x8]  = primitives.satd[LUMA_16x8];
@@ -115,10 +113,6 @@ void Setup_Alias_Primitives(EncoderPrimi
     p.chroma[X265_CSP_I420].satd[CHROMA_32x16] = primitives.satd[LUMA_32x16];
     p.chroma[X265_CSP_I420].satd[CHROMA_16x32] = primitives.satd[LUMA_16x32];
 
-    //p.chroma[X265_CSP_I420].satd[CHROMA_8x6] = sad<8, 6>;
-    //p.chroma[X265_CSP_I420].satd[CHROMA_6x8] = sad<6, 8>;
-    //p.chroma[X265_CSP_I420].satd[CHROMA_8x2] = sad<8, 2>;
-    //p.chroma[X265_CSP_I420].satd[CHROMA_2x8] = sad<2, 8>;
     p.chroma[X265_CSP_I420].satd[CHROMA_16x12] = primitives.satd[LUMA_16x12];
     p.chroma[X265_CSP_I420].satd[CHROMA_12x16] = primitives.satd[LUMA_12x16];
     p.chroma[X265_CSP_I420].satd[CHROMA_16x4]  = primitives.satd[LUMA_16x4];
@@ -134,7 +128,6 @@ void Setup_Alias_Primitives(EncoderPrimi
     p.chroma[X265_CSP_I422].satd[CHROMA422_32x64] = primitives.satd[LUMA_32x64];
 
     p.chroma[X265_CSP_I422].satd[CHROMA422_4x4]   = primitives.satd[LUMA_4x4];
-    //p.chroma[X265_CSP_I422].satd[CHROMA422_2x8] = sad<2, 8>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_8x8]   = primitives.satd[LUMA_8x8];
     p.chroma[X265_CSP_I422].satd[CHROMA422_4x16]  = primitives.satd[LUMA_4x16];
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x16] = primitives.satd[LUMA_16x16];
@@ -143,9 +136,7 @@ void Setup_Alias_Primitives(EncoderPrimi
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x64] = primitives.satd[LUMA_16x64];
 
     //p.chroma[X265_CSP_I422].satd[CHROMA422_8x12]  = satd4<8, 12>;
-    //p.chroma[X265_CSP_I422].satd[CHROMA422_6x16]  = sad<6, 16>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_8x4]   = primitives.satd[LUMA_8x4];
-    //p.chroma[X265_CSP_I422].satd[CHROMA422_2x16]  = sad<2, 16>;
     //p.chroma[X265_CSP_I422].satd[CHROMA422_16x24] = satd8<16, 24>;
     //p.chroma[X265_CSP_I422].satd[CHROMA422_12x32] = satd4<12, 32>;
     p.chroma[X265_CSP_I422].satd[CHROMA422_16x8]  = primitives.satd[LUMA_16x8];
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/common/x86/asm-primitives.cpp	Wed Dec 10 14:38:52 2014 -0600
@@ -1882,8 +1882,13 @@ void Setup_Assembly_Primitives(EncoderPr
         p.luma_vpp[LUMA_8x16] = x265_interp_8tap_vert_pp_8x16_avx2;
         p.luma_vpp[LUMA_8x32] = x265_interp_8tap_vert_pp_8x32_avx2;
 
+        // color space i420
         p.chroma[X265_CSP_I420].filter_vpp[CHROMA_4x4] = x265_interp_4tap_vert_pp_4x4_avx2;
         p.chroma[X265_CSP_I420].filter_vpp[CHROMA_8x8] = x265_interp_4tap_vert_pp_8x8_avx2;
+
+        // color space i422
+        p.chroma[X265_CSP_I422].filter_vpp[CHROMA422_4x4] = x265_interp_4tap_vert_pp_4x4_avx2;
+
 #if X86_64
         p.chroma[X265_CSP_I420].filter_vpp[CHROMA_16x16] = x265_interp_4tap_vert_pp_16x16_avx2;
 #endif
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/common/x86/ipfilter8.h
--- a/source/common/x86/ipfilter8.h	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/common/x86/ipfilter8.h	Wed Dec 10 14:38:52 2014 -0600
@@ -580,6 +580,7 @@ CHROMA_SS_FILTERS(_sse2);
 CHROMA_SS_FILTERS_SSE4(_sse4);
 
 CHROMA_FILTERS_422(_sse4);
+CHROMA_FILTERS_422(_avx2);
 CHROMA_SP_FILTERS_422(_sse2);
 CHROMA_SP_FILTERS_422_SSE4(_sse4);
 CHROMA_SS_FILTERS_422(_sse2);
diff -r 29489f2fc2c7 -r 9e244ebe21d2 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp	Tue Dec 09 12:54:40 2014 -0600
+++ b/source/encoder/analysis.cpp	Wed Dec 10 14:38:52 2014 -0600
@@ -61,9 +61,12 @@ using namespace x265;
  *
  *   RDO selection between merge and skip
  *   sa8d selection of best inter mode
+ *   sa8d decisions include chroma residual cost
  *   RDO selection between (merge/skip) / best inter mode / intra / split
  *
  * rd-level 4 enables RDOQuant
+ *   chroma residual cost included in satd decisions, including subpel refine
+ *    (as a result of --subme 3 being used by preset slow)
  *
  * rd-level 5,6 does RDO for each inter mode
  */
@@ -358,11 +361,7 @@ void Analysis::parallelME(int threadId, 
         slave->m_slice = m_slice;
         slave->m_frame = m_frame;
 
-        PicYuv* fencPic = m_frame->m_fencPic;
-        pixel* pu = fencPic->getLumaAddr(m_curInterMode->cu.m_cuAddr, m_curGeom->encodeIdx + m_puAbsPartIdx);
-        slave->m_me.setSourcePlane(fencPic->m_picOrg[0], fencPic->m_stride);
-        slave->m_me.setSourcePU(*m_curInterMode->fencYuv, m_puAbsPartIdx, pu - fencPic->m_picOrg[0], m_puWidth, m_puHeight);
-
+        slave->m_me.setSourcePU(*m_curInterMode->fencYuv, m_curInterMode->cu.m_cuAddr, m_curGeom->encodeIdx, m_puAbsPartIdx, m_puWidth, m_puHeight);
         slave->prepMotionCompensation(m_curInterMode->cu, *m_curGeom, m_curPart);
     }
 
@@ -385,8 +384,6 @@ void Analysis::parallelModeAnalysis(int 
         slave->m_frame = m_frame;
         slave->setQP(*m_slice, m_rdCost.m_qp);
         slave->invalidateContexts(0);
-        if (jobId)
-            slave->m_me.setSourcePlane(m_frame->m_fencPic->m_picOrg[0], m_frame->m_fencPic->m_stride);
     }
 
     ModeDepth& md = m_modeDepth[m_curGeom->depth];
@@ -1607,18 +1604,31 @@ void Analysis::checkBidir2Nx2N(Mode& int
     {
         /* Estimate cost of BIDIR using coincident blocks */
         Yuv& tmpPredYuv = m_rqt[cuGeom.depth].tmpPredYuv;
-        pixel *fref0 = m_slice->m_mref[0][ref0].getLumaAddr(cu.m_cuAddr, cuGeom.encodeIdx);
-        pixel *fref1 = m_slice->m_mref[1][ref1].getLumaAddr(cu.m_cuAddr, cuGeom.encodeIdx);
-        intptr_t refStride = m_slice->m_mref[0][0].lumaStride;
 
-        primitives.pixelavg_pp[partEnum](tmpPredYuv.m_buf[0], tmpPredYuv.m_size, fref0, refStride, fref1, refStride, 32);
-        int zsa8d = primitives.sa8d[partEnum](fencYuv.m_buf[0], fencYuv.m_size, tmpPredYuv.m_buf[0], tmpPredYuv.m_size);
+        int zsa8d;
+
         if (m_bChromaSa8d)
         {
-            /* Add in chroma distortion */
+            cu.m_mv[0][0] = mvzero;
+            cu.m_mv[1][0] = mvzero;
+
+            prepMotionCompensation(cu, cuGeom, 0);
+            motionCompensation(tmpPredYuv, true, true);
+
+            zsa8d  = primitives.sa8d[partEnum](fencYuv.m_buf[0], fencYuv.m_size, tmpPredYuv.m_buf[0], tmpPredYuv.m_size);
             zsa8d += primitives.sa8d_inter[cpart](fencYuv.m_buf[1], fencYuv.m_csize, tmpPredYuv.m_buf[1], tmpPredYuv.m_csize);
             zsa8d += primitives.sa8d_inter[cpart](fencYuv.m_buf[2], fencYuv.m_csize, tmpPredYuv.m_buf[2], tmpPredYuv.m_csize);
         }
+        else
+        {
+            pixel *fref0 = m_slice->m_mref[0][ref0].getLumaAddr(cu.m_cuAddr, cuGeom.encodeIdx);
+            pixel *fref1 = m_slice->m_mref[1][ref1].getLumaAddr(cu.m_cuAddr, cuGeom.encodeIdx);
+            intptr_t refStride = m_slice->m_mref[0][0].lumaStride;
+
+            primitives.pixelavg_pp[partEnum](tmpPredYuv.m_buf[0], tmpPredYuv.m_size, fref0, refStride, fref1, refStride, 32);
+            zsa8d = primitives.sa8d[partEnum](fencYuv.m_buf[0], fencYuv.m_size, tmpPredYuv.m_buf[0], tmpPredYuv.m_size);
+        }
+
         uint32_t bits0 = bestME[0].bits - m_me.bitcost(bestME[0].mv, mvp0) + m_me.bitcost(mvzero, mvp0);
         uint32_t bits1 = bestME[1].bits - m_me.bitcost(bestME[1].mv, mvp1) + m_me.bitcost(mvzero, mvp1);
         uint32_t zcost = zsa8d + m_rdCost.getCost(bits0) + m_rdCost.getCost(bits1);
@@ -1643,8 +1653,20 @@ void Analysis::checkBidir2Nx2N(Mode& int
             cu.m_mvd[1][0] = mvzero - mvp1;
             cu.m_mvpIdx[1][0] = (uint8_t)mvpIdx1;
 
-            prepMotionCompensation(cu, cuGeom, 0);
-            motionCompensation(bidir2Nx2N.predYuv, true, true);
+            if (m_bChromaSa8d)

