[x265-commits] [x265] yuv: plumb in support for monochrome YUV buffers

Steve Borho steve at borho.org
Tue Dec 9 22:13:49 CET 2014


details:   http://hg.videolan.org/x265/rev/5a44d694ed9b
branches:  
changeset: 8958:5a44d694ed9b
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 11:47:08 2014 -0600
description:
yuv: plumb in support for monochrome YUV buffers

The need for this will be obvious in the next commit.
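
For context, the Yuv::create() hunk quoted further down shows what this plumbing
does: when the color space is X265_CSP_I400, only the luma plane is allocated.
A minimal usage sketch (the srcPlane/srcStride inputs and the caller itself are
hypothetical, not part of this patch):

    Yuv lumaOnly;
    if (lumaOnly.create(MAX_CU_SIZE, X265_CSP_I400))
    {
        /* only the luma plane was allocated; m_buf[1]/m_buf[2] are NULL and
         * m_csize is a poison value, so chroma code paths must be skipped */
        primitives.luma_copy_pp[lumaOnly.m_part](lumaOnly.m_buf[0], lumaOnly.m_size,
                                                 srcPlane, srcStride);
    }
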
Subject: [x265] slicetype: cleanups - use bufSATD method where applicable

details:   http://hg.videolan.org/x265/rev/b5b05c94ae7c
branches:  
changeset: 8959:b5b05c94ae7c
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 11:48:45 2014 -0600
description:
slicetype: cleanups - use bufSATD method where applicable
Subject: [x265] motion: use Yuv instance to hold fenc PU pixels (preparing for chroma ME)

details:   http://hg.videolan.org/x265/rev/e640c8461495
branches:  
changeset: 8960:e640c8461495
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 11:54:45 2014 -0600
description:
motion: use Yuv instance to hold fenc PU pixels (preparing for chroma ME)

This required adding an init function that accepts the encoder color space. We
use 4:0:0 for the lookahead since it does not keep chroma planes. Note that I
explicitly renamed this Yuv instance fencPUYuv to make sure people understand it
is not a duplicate of the fencYuv kept by the Analysis structure; it will often
be a sub-partition of the CU fenc yuv.
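
A rough sketch of the shape this takes (motion.h is truncated out of the diff
below, so the member and init() signature here are paraphrased from this
description and should be treated as assumptions):

    class MotionEstimate
    {
        Yuv fencPUYuv;  /* fenc pixels of the current PU; not the Analysis fencYuv */

    public:
        bool init(int csp)
        {
            /* the lookahead passes X265_CSP_I400 since it keeps no chroma
             * planes; the main encoder passes its real color space.
             * FENC_STRIDE-sized square buffer; copyPUFromYuv() asserts this */
            return fencPUYuv.create(FENC_STRIDE, csp);
        }
    };
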
Subject: [x265] motion: add a version of setSourcePU which can accept fenc from another Yuv

details:   http://hg.videolan.org/x265/rev/1d1f803a3eec
branches:  
changeset: 8961:1d1f803a3eec
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 12:46:09 2014 -0600
description:
motion: add a version of setSourcePU which can accept fenc from another Yuv

The analysis code has already gone through the trouble of loading the CU's fenc
pixels from the source picture into a much smaller Yuv buffer with small
strides. This allows us to avoid accessing the fenc PicYuv in a
performance-critical portion of the encoder.

We utilize the Yuv class to copy the PU, since it already has logic for
calculating part offsets for luma and chroma.
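
In rough terms the new overload does the following (argument and member names
are inferred from the analysis.cpp hunk below and from this description, so
treat them as assumptions; the real body is in the truncated portion of the
diff):

    void MotionEstimate::setSourcePU(const Yuv& srcFencYuv, uint32_t absPartIdx,
                                     intptr_t offset, int pwidth, int pheight)
    {
        partEnum = partitionFromSizes(pwidth, pheight);
        blockOffset = offset; /* PU offset within the full picture planes */

        /* let the Yuv class compute the luma/chroma part offsets and copy the
         * PU into the ME's small, FENC_STRIDE-strided fenc buffer */
        fencPUYuv.copyPUFromYuv(srcFencYuv, absPartIdx, partEnum, bChromaSATD);
    }
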
Subject: [x265] search: rename index variable to puIdx for consistency

details:   http://hg.videolan.org/x265/rev/1cab6a4c0ab8
branches:  
changeset: 8962:1cab6a4c0ab8
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 18:49:29 2014 -0600
description:
search: rename index variable to puIdx for consistency
Subject: [x265] yuv: fix size check in copyFromYuv

details:   http://hg.videolan.org/x265/rev/15be837edb36
branches:  
changeset: 8963:15be837edb36
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 12:47:37 2014 -0600
description:
yuv: fix size check in copyFromYuv

The target buffer needs to be as large as or larger than the source. The fact
that this check has never failed tells me that every caller of this function
passes equal-sized arguments.
Subject: [x265] motion: sync argument names between the header and the cpp file

details:   http://hg.videolan.org/x265/rev/e2b958539e6a
branches:  
changeset: 8964:e2b958539e6a
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 12:48:00 2014 -0600
description:
motion: sync argument names between the header and the cpp file
Subject: [x265] reference: move reconPic pointer to base class so it is available to ME

details:   http://hg.videolan.org/x265/rev/dd55fd39745c
branches:  
changeset: 8965:dd55fd39745c
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 08 18:48:42 2014 -0600
description:
reference: move reconPic pointer to base class so it is available to ME
Subject: [x265] primitives: add a chroma satd table that is indexed by luma partition

details:   http://hg.videolan.org/x265/rev/47c490836fd8
branches:  
changeset: 8966:47c490836fd8
user:      Steve Borho <steve at borho.org>
date:      Tue Dec 09 11:47:46 2014 -0600
description:
primitives: add a chroma satd table that is indexed by luma partition

A number of chroma partitions have a width or height of 2 or 6; those cannot
use SATD (which is 4x4-based), so we degrade them to SAD, which makes me
unhappy.
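
The reason for the fallback: the C reference SATD templates tile the block with
4x4 (or 8x4) Hadamard transforms, roughly as sketched below, so a width or
height of 2 or 6 cannot be covered (this mirrors satd4<w, h> in pixel.cpp but is
simplified):

    template<int w, int h>
    int satd4_sketch(const pixel* pix1, intptr_t stride1,
                     const pixel* pix2, intptr_t stride2)
    {
        int satd = 0;

        /* sum 4x4 Hadamard SATD over the block; this needs w % 4 == 0 and
         * h % 4 == 0, which 2x2, 4x2, 2x4, 8x6, 6x8, 8x2, 2x8, ... violate */
        for (int row = 0; row < h; row += 4)
            for (int col = 0; col < w; col += 4)
                satd += satd_4x4(pix1 + row * stride1 + col, stride1,
                                 pix2 + row * stride2 + col, stride2);

        return satd;
    }
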
Subject: [x265] primitives: use luma satd functions for chroma, where applicable

details:   http://hg.videolan.org/x265/rev/29489f2fc2c7
branches:  
changeset: 8967:29489f2fc2c7
user:      Steve Borho <steve at borho.org>
date:      Tue Dec 09 12:54:40 2014 -0600
description:
primitives: use luma satd functions for chroma, where applicable

The commented lines should be considered TODO items for the assembly team.
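
Once these aliases are in place, chroma distortion can be measured with the
same luma partition enum used for luma; a hypothetical call site (predYuv and
the cost variables are illustrative, not taken from this patch) looks like:

    pixelcmp_t chromaSatd = primitives.chroma[m_csp].satd[partEnum];

    /* 2- and 6-wide partitions are still backed by SAD in the C setup; the
     * commented assignments in Setup_Alias_Primitives are the asm TODO items */
    int cbCost = chromaSatd(fencPUYuv.m_buf[1], fencPUYuv.m_csize,
                            predYuv.m_buf[1], predYuv.m_csize);
    int crCost = chromaSatd(fencPUYuv.m_buf[2], fencPUYuv.m_csize,
                            predYuv.m_buf[2], predYuv.m_csize);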

diffstat:

 source/common/lowres.h       |   1 +
 source/common/pixel.cpp      |  61 +++++++++++++++++++++++++++++++++++++++++--
 source/common/primitives.cpp |  58 +++++++++++++++++++++++++++++++++++++++++-
 source/common/primitives.h   |   1 +
 source/common/yuv.cpp        |  49 +++++++++++++++++++++++++++-------
 source/common/yuv.h          |   3 ++
 source/encoder/analysis.cpp  |   2 +-
 source/encoder/motion.cpp    |  61 ++++++++++++++++++++++++++++---------------
 source/encoder/motion.h      |  27 ++++++++++--------
 source/encoder/reference.cpp |  14 +++++-----
 source/encoder/reference.h   |   5 +--
 source/encoder/search.cpp    |  11 +++----
 source/encoder/slicetype.cpp |  14 ++++-----
 source/encoder/slicetype.h   |   3 +-
 14 files changed, 235 insertions(+), 75 deletions(-)

diffs (truncated from 679 to 300 lines):

diff -r 88498ec9b10b -r 29489f2fc2c7 source/common/lowres.h
--- a/source/common/lowres.h	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/common/lowres.h	Tue Dec 09 12:54:40 2014 -0600
@@ -39,6 +39,7 @@ struct ReferencePlanes
 
     pixel*   fpelPlane;
     pixel*   lowresPlane[4];
+    PicYuv*  reconPic;
 
     bool     isWeighted;
     bool     isLowres;
diff -r 88498ec9b10b -r 29489f2fc2c7 source/common/pixel.cpp
--- a/source/common/pixel.cpp	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/common/pixel.cpp	Tue Dec 09 12:54:40 2014 -0600
@@ -1085,6 +1085,62 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.satd[LUMA_64x16] = satd8<64, 16>;
     p.satd[LUMA_16x64] = satd8<16, 64>;
 
+    p.chroma[X265_CSP_I420].satd[CHROMA_2x2]   = sad<2, 2>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x4]   = satd_4x4;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x8]   = satd8<8, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x16] = satd8<16, 16>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x32] = satd8<32, 32>;
+
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x2]   = sad<4, 2>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_2x4]   = sad<2, 4>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x4]   = satd_8x4;
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x8]   = satd4<4, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x8]  = satd8<16, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x16]  = satd8<8, 16>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x16] = satd8<32, 16>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x32] = satd8<16, 32>;
+
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x6]   = sad<8, 6>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_6x8]   = sad<6, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x2]   = sad<8, 2>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_2x8]   = sad<2, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x12] = satd4<16, 12>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_12x16] = satd4<12, 16>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x4]  = satd4<16, 4>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x16]  = satd4<4, 16>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x24] = satd8<32, 24>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_24x32] = satd8<24, 32>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x8]  = satd8<32, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x32]  = satd8<8, 32>;
+
+    p.chroma[X265_CSP_I422].satd[CHROMA422_2x4]   = sad<2, 4>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x8]   = satd4<4, 8>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x16]  = satd8<8, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x32] = satd8<16, 32>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x64] = satd8<32, 64>;
+
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x4]   = satd_4x4;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_2x8]   = sad<2, 8>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x8]   = satd8<8, 8>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x16]  = satd4<4, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x16] = satd8<16, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x32]  = satd8<8, 32>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x32] = satd8<32, 32>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x64] = satd8<16, 64>;
+
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x12]  = satd4<8, 12>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_6x16]  = sad<6, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x4]   = satd4<8, 4>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_2x16]  = sad<2, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x24] = satd8<16, 24>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_12x32] = satd4<12, 32>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x8]  = satd8<16, 8>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x32]  = satd4<4, 32>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x48] = satd8<32, 48>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_24x64] = satd8<24, 64>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x16] = satd8<32, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x64]  = satd8<8, 64>;
+
 #define CHROMA_420(W, H) \
     p.chroma[X265_CSP_I420].addAvg[CHROMA_ ## W ## x ## H]  = addAvg<W, H>;         \
     p.chroma[X265_CSP_I420].copy_pp[CHROMA_ ## W ## x ## H] = blockcopy_pp_c<W, H>; \
@@ -1093,13 +1149,14 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.chroma[X265_CSP_I420].copy_ss[CHROMA_ ## W ## x ## H] = blockcopy_ss_c<W, H>;
 
 #define CHROMA_422(W, H) \
-    p.chroma[X265_CSP_I422].addAvg[CHROMA422_ ## W ## x ## H] = addAvg<W, H>;         \
+    p.chroma[X265_CSP_I422].addAvg[CHROMA422_ ## W ## x ## H]  = addAvg<W, H>;         \
     p.chroma[X265_CSP_I422].copy_pp[CHROMA422_ ## W ## x ## H] = blockcopy_pp_c<W, H>; \
     p.chroma[X265_CSP_I422].copy_sp[CHROMA422_ ## W ## x ## H] = blockcopy_sp_c<W, H>; \
     p.chroma[X265_CSP_I422].copy_ps[CHROMA422_ ## W ## x ## H] = blockcopy_ps_c<W, H>; \
     p.chroma[X265_CSP_I422].copy_ss[CHROMA422_ ## W ## x ## H] = blockcopy_ss_c<W, H>;
 
 #define CHROMA_444(W, H) \
+    p.chroma[X265_CSP_I444].satd[LUMA_ ## W ## x ## H]    = p.satd[LUMA_ ## W ## x ## H]; \
     p.chroma[X265_CSP_I444].addAvg[LUMA_ ## W ## x ## H]  = addAvg<W, H>; \
     p.chroma[X265_CSP_I444].copy_pp[LUMA_ ## W ## x ## H] = blockcopy_pp_c<W, H>; \
     p.chroma[X265_CSP_I444].copy_sp[LUMA_ ## W ## x ## H] = blockcopy_sp_c<W, H>; \
@@ -1129,8 +1186,6 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.chroma[X265_CSP_I444].sub_ps[LUMA_ ## W ## x ## H] = pixel_sub_ps_c<W, H>; \
     p.chroma[X265_CSP_I444].add_ps[LUMA_ ## W ## x ## H] = pixel_add_ps_c<W, H>;
 
-
-
     LUMA(4, 4);
     LUMA(8, 8);
     CHROMA_420(4, 4);
diff -r 88498ec9b10b -r 29489f2fc2c7 source/common/primitives.cpp
--- a/source/common/primitives.cpp	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/common/primitives.cpp	Tue Dec 09 12:54:40 2014 -0600
@@ -75,7 +75,8 @@ void Setup_Alias_Primitives(EncoderPrimi
         p.chroma[X265_CSP_I444].copy_ps[i] = p.luma_copy_ps[i];
         p.chroma[X265_CSP_I444].copy_sp[i] = p.luma_copy_sp[i];
         p.chroma[X265_CSP_I444].copy_ss[i] = p.luma_copy_ss[i];
-        p.chroma[X265_CSP_I444].addAvg[i]  = p.luma_addAvg[i];
+        p.chroma[X265_CSP_I444].addAvg[i] = p.luma_addAvg[i];
+        p.chroma[X265_CSP_I444].satd[i] = p.satd[i];
     }
 
     for (int i = 0; i < NUM_SQUARE_BLOCKS; i++)
@@ -98,6 +99,61 @@ void Setup_Alias_Primitives(EncoderPrimi
     primitives.sa8d_inter[LUMA_16x4]  = primitives.satd[LUMA_16x4];
     primitives.sa8d_inter[LUMA_16x12] = primitives.satd[LUMA_16x12];
     primitives.sa8d_inter[LUMA_12x16] = primitives.satd[LUMA_12x16];
+
+    // Chroma SATD can often reuse luma primitives
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x4]   = primitives.satd[LUMA_4x4];
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x8]   = primitives.satd[LUMA_8x8];
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x16] = primitives.satd[LUMA_16x16];
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x32] = primitives.satd[LUMA_32x32];
+
+    //p.chroma[X265_CSP_I420].satd[CHROMA_4x2] = sad<4, 2>;
+    //p.chroma[X265_CSP_I420].satd[CHROMA_2x4] = sad<2, 4>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x4]   = primitives.satd[LUMA_8x4];
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x8]   = primitives.satd[LUMA_4x8];
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x8]  = primitives.satd[LUMA_16x8];
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x16]  = primitives.satd[LUMA_8x16];
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x16] = primitives.satd[LUMA_32x16];
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x32] = primitives.satd[LUMA_16x32];
+
+    //p.chroma[X265_CSP_I420].satd[CHROMA_8x6] = sad<8, 6>;
+    //p.chroma[X265_CSP_I420].satd[CHROMA_6x8] = sad<6, 8>;
+    //p.chroma[X265_CSP_I420].satd[CHROMA_8x2] = sad<8, 2>;
+    //p.chroma[X265_CSP_I420].satd[CHROMA_2x8] = sad<2, 8>;
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x12] = primitives.satd[LUMA_16x12];
+    p.chroma[X265_CSP_I420].satd[CHROMA_12x16] = primitives.satd[LUMA_12x16];
+    p.chroma[X265_CSP_I420].satd[CHROMA_16x4]  = primitives.satd[LUMA_16x4];
+    p.chroma[X265_CSP_I420].satd[CHROMA_4x16]  = primitives.satd[LUMA_4x16];
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x24] = primitives.satd[LUMA_32x24];
+    p.chroma[X265_CSP_I420].satd[CHROMA_24x32] = primitives.satd[LUMA_24x32];
+    p.chroma[X265_CSP_I420].satd[CHROMA_32x8]  = primitives.satd[LUMA_32x8];
+    p.chroma[X265_CSP_I420].satd[CHROMA_8x32]  = primitives.satd[LUMA_8x32];
+
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x8]   = primitives.satd[LUMA_4x8];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x16]  = primitives.satd[LUMA_8x16];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x32] = primitives.satd[LUMA_16x32];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x64] = primitives.satd[LUMA_32x64];
+
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x4]   = primitives.satd[LUMA_4x4];
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_2x8] = sad<2, 8>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x8]   = primitives.satd[LUMA_8x8];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_4x16]  = primitives.satd[LUMA_4x16];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x16] = primitives.satd[LUMA_16x16];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x32]  = primitives.satd[LUMA_8x32];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x32] = primitives.satd[LUMA_32x32];
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x64] = primitives.satd[LUMA_16x64];
+
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_8x12]  = satd4<8, 12>;
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_6x16]  = sad<6, 16>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_8x4]   = primitives.satd[LUMA_8x4];
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_2x16]  = sad<2, 16>;
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_16x24] = satd8<16, 24>;
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_12x32] = satd4<12, 32>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_16x8]  = primitives.satd[LUMA_16x8];
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_4x32]  = satd4<4, 32>;
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_32x48] = satd8<32, 48>;
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_24x64] = satd8<24, 64>;
+    p.chroma[X265_CSP_I422].satd[CHROMA422_32x16] = primitives.satd[LUMA_32x16];
+    //p.chroma[X265_CSP_I422].satd[CHROMA422_8x64]  = satd8<8, 64>;
 }
 }
 using namespace x265;
diff -r 88498ec9b10b -r 29489f2fc2c7 source/common/primitives.h
--- a/source/common/primitives.h	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/common/primitives.h	Tue Dec 09 12:54:40 2014 -0600
@@ -272,6 +272,7 @@ struct EncoderPrimitives
 
     struct
     {
+        pixelcmp_t      satd[NUM_LUMA_PARTITIONS];
         filter_pp_t     filter_vpp[NUM_LUMA_PARTITIONS];
         filter_ps_t     filter_vps[NUM_LUMA_PARTITIONS];
         filter_sp_t     filter_vsp[NUM_LUMA_PARTITIONS];
diff -r 88498ec9b10b -r 29489f2fc2c7 source/common/yuv.cpp
--- a/source/common/yuv.cpp	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/common/yuv.cpp	Tue Dec 09 12:54:40 2014 -0600
@@ -43,21 +43,31 @@ bool Yuv::create(uint32_t size, int csp)
     m_hChromaShift = CHROMA_H_SHIFT(csp);
     m_vChromaShift = CHROMA_V_SHIFT(csp);
 
-    // set width and height
     m_size  = size;
-    m_csize = size >> m_hChromaShift;
     m_part = partitionFromSizes(size, size);
 
-    size_t sizeL = size * size;
-    size_t sizeC = sizeL >> (m_vChromaShift + m_hChromaShift);
+    if (csp == X265_CSP_I400)
+    {
+        CHECKED_MALLOC(m_buf[0], pixel, size * size + 8);
+        m_buf[1] = m_buf[2] = 0;
+        m_csize = MAX_INT;
+        return true;
+    }
+    else
+    {
+        m_csize = size >> m_hChromaShift;
 
-    X265_CHECK((sizeC & 15) == 0, "invalid size");
+        size_t sizeL = size * size;
+        size_t sizeC = sizeL >> (m_vChromaShift + m_hChromaShift);
 
-    // memory allocation (padded for SIMD reads)
-    CHECKED_MALLOC(m_buf[0], pixel, sizeL + sizeC * 2 + 8);
-    m_buf[1] = m_buf[0] + sizeL;
-    m_buf[2] = m_buf[0] + sizeL + sizeC;
-    return true;
+        X265_CHECK((sizeC & 15) == 0, "invalid size");
+
+        // memory allocation (padded for SIMD reads)
+        CHECKED_MALLOC(m_buf[0], pixel, sizeL + sizeC * 2 + 8);
+        m_buf[1] = m_buf[0] + sizeL;
+        m_buf[2] = m_buf[0] + sizeL + sizeC;
+        return true;
+    }
 
 fail:
     return false;
@@ -92,13 +102,30 @@ void Yuv::copyFromPicYuv(const PicYuv& s
 
 void Yuv::copyFromYuv(const Yuv& srcYuv)
 {
-    X265_CHECK(m_size <= srcYuv.m_size, "invalid size\n");
+    X265_CHECK(m_size >= srcYuv.m_size, "invalid size\n");
 
     primitives.luma_copy_pp[m_part](m_buf[0], m_size, srcYuv.m_buf[0], srcYuv.m_size);
     primitives.chroma[m_csp].copy_pp[m_part](m_buf[1], m_csize, srcYuv.m_buf[1], srcYuv.m_csize);
     primitives.chroma[m_csp].copy_pp[m_part](m_buf[2], m_csize, srcYuv.m_buf[2], srcYuv.m_csize);
 }
 
+/* This version is intended for use by ME, which required FENC_STRIDE for luma fenc pixels */
+void Yuv::copyPUFromYuv(const Yuv& srcYuv, uint32_t absPartIdx, int partEnum, bool bChroma)
+{
+    X265_CHECK(m_size == FENC_STRIDE && m_size >= srcYuv.m_size, "PU buffer size mismatch\n");
+
+    const pixel* srcY = srcYuv.m_buf[0] + getAddrOffset(absPartIdx, srcYuv.m_size);
+    primitives.luma_copy_pp[partEnum](m_buf[0], m_size, srcY, srcYuv.m_size);
+
+    if (bChroma)
+    {
+        const pixel* srcU = srcYuv.m_buf[1] + srcYuv.getChromaAddrOffset(absPartIdx);
+        const pixel* srcV = srcYuv.m_buf[2] + srcYuv.getChromaAddrOffset(absPartIdx);
+        primitives.chroma[m_csp].copy_pp[partEnum](m_buf[1], m_csize, srcU, srcYuv.m_csize);
+        primitives.chroma[m_csp].copy_pp[partEnum](m_buf[2], m_csize, srcV, srcYuv.m_csize);
+    }
+}
+
 void Yuv::copyToPartYuv(Yuv& dstYuv, uint32_t absPartIdx) const
 {
     pixel* dstY = dstYuv.getLumaAddr(absPartIdx);
diff -r 88498ec9b10b -r 29489f2fc2c7 source/common/yuv.h
--- a/source/common/yuv.h	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/common/yuv.h	Tue Dec 09 12:54:40 2014 -0600
@@ -63,6 +63,9 @@ public:
     // Copy from same size YUV buffer
     void   copyFromYuv(const Yuv& srcYuv);
 
+    // Copy portion of srcYuv into ME prediction buffer
+    void   copyPUFromYuv(const Yuv& srcYuv, uint32_t absPartIdx, int partEnum, bool bChroma);
+
     // Copy Small YUV buffer to the part of other Big YUV buffer
     void   copyToPartYuv(Yuv& dstYuv, uint32_t absPartIdx) const;
 
diff -r 88498ec9b10b -r 29489f2fc2c7 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/encoder/analysis.cpp	Tue Dec 09 12:54:40 2014 -0600
@@ -361,7 +361,7 @@ void Analysis::parallelME(int threadId, 
         PicYuv* fencPic = m_frame->m_fencPic;
         pixel* pu = fencPic->getLumaAddr(m_curInterMode->cu.m_cuAddr, m_curGeom->encodeIdx + m_puAbsPartIdx);
         slave->m_me.setSourcePlane(fencPic->m_picOrg[0], fencPic->m_stride);
-        slave->m_me.setSourcePU(pu - fencPic->m_picOrg[0], m_puWidth, m_puHeight);
+        slave->m_me.setSourcePU(*m_curInterMode->fencYuv, m_puAbsPartIdx, pu - fencPic->m_picOrg[0], m_puWidth, m_puHeight);
 
         slave->prepMotionCompensation(m_curInterMode->cu, *m_curGeom, m_curPart);
     }
diff -r 88498ec9b10b -r 29489f2fc2c7 source/encoder/motion.cpp
--- a/source/encoder/motion.cpp	Tue Dec 09 10:03:35 2014 +0530
+++ b/source/encoder/motion.cpp	Tue Dec 09 12:54:40 2014 -0600
@@ -34,6 +34,7 @@
 using namespace x265;
 
 namespace {
+
 struct SubpelWorkload
 {

