[x265-commits] [x265] rc: separate frame bits predictor objects for BRef and B ...

Thu Apr 9 18:57:32 CEST 2015

details:   http://hg.videolan.org/x265/rev/1b15d6129041
branches:  stable
changeset: 10131:1b15d6129041
user:      Aarthi Thirumalai
date:      Tue Mar 31 22:16:21 2015 +0530
description:
rc: separate frame bits predictor objects for BRef and B frames

improves frame size prediction for BRef frames in VBV.
Subject: [x265] rc: tune initial predictor values for better frame size predictions in vbv lookahead

details:   http://hg.videolan.org/x265/rev/751f9cf1dfc9
branches:  stable
changeset: 10132:751f9cf1dfc9
user:      Aarthi Thirumalai
date:      Mon Mar 16 12:05:38 2015 +0530
description:
rc: tune initial predictor values for better frame size predictions in vbv lookahead

An overall SSIM improvement of around 0.05-0.1 dB can be seen.
This also improves visual quality at the start of the encode.

SteamLocomotiveTrain_2560x1600_60_10bit_crop.yuv --bitrate 9000 --vbv-bufsize 9000 --strict-cbr

        Bitrate   Y PSNR   U PSNR   V PSNR   Global PSNR   SSIM      SSIM (dB)
before  9168.43   39.689   45.096   45.152   41.048        0.93176   11.66
after   9106.16   39.864   45.169   45.199   41.194        0.93518   11.883
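
The SSIM (dB) figures above are consistent with the usual conversion
-10 * log10(1 - SSIM). The sketch below illustrates that conversion; the
function name ssimToDb and the clamp value of 100 dB for a near-perfect score
are assumptions of this example, not taken from the commit.

```cpp
#include <cassert>
#include <cmath>

// Convert an SSIM score (at most 1.0) to decibels: -10 * log10(1 - ssim).
// ssimToDb is a hypothetical name; the 100 dB clamp is an assumption that
// avoids log10(0) when the score is (almost) exactly 1.0.
double ssimToDb(double ssim)
{
    assert(ssim <= 1.0);
    const double inv = 1.0 - ssim;
    if (inv <= 1e-10)
        return 100.0;
    return -10.0 * std::log10(inv);
}
```

Calling ssimToDb(0.93176) and ssimToDb(0.93518) gives values that round to the
11.66 and 11.883 reported in the table.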
Subject: [x265] cmake: do not allow full path of libnuma to be used in x265.pc

details:   http://hg.videolan.org/x265/rev/664f16817191
branches:  
changeset: 10133:664f16817191
user:      Steve Borho <steve at borho.org>
date:      Thu Apr 09 10:51:29 2015 -0500
description:
cmake: do not allow full path of libnuma to be used in x265.pc

replaces /usr/local/lib/libnuma.so with -lnuma in x265.pc, which
fixes link issues for apps which use pkg-config
Subject: [x265] optimize c1c2 context set update logic in rdoQuant

details:   http://hg.videolan.org/x265/rev/f5070434149e
branches:  
changeset: 10134:f5070434149e
user:      Min Chen <chenm003 at 163.com>
date:      Thu Apr 09 20:26:24 2015 +0800
description:
optimize c1c2 context set update logic in rdoQuant
Subject: [x265] asm: sse4 8bpp code for chroma_p2s[2xN]

details:   http://hg.videolan.org/x265/rev/762ccae69668
branches:  
changeset: 10135:762ccae69668
user:      Rajesh Paulraj <rajesh at multicorewareinc.com>
date:      Thu Apr 09 19:07:19 2015 +0530
description:
asm: sse4 8bpp code for chroma_p2s[2xN]

     chroma_p2s[2x4][i420](1.54x), chroma_p2s[2x8][i420](1.68x),
     chroma_p2s[2x8][i422](1.64x), chroma_p2s[2x16][i422](1.73x)
Subject: [x265] asm: intra_pred_ang16_12 improved by ~20% over SSE4

details:   http://hg.videolan.org/x265/rev/3fab71c31c7a
branches:  
changeset: 10136:3fab71c31c7a
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 10:58:27 2015 +0530
description:
asm: intra_pred_ang16_12 improved by ~20% over SSE4

AVX2:
intra_ang_16x16[12]     15.16x   777.51          11785.44

SSE4:
intra_ang_16x16[12]     11.51x   976.41          11238.16
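
The benchmark lines in this and the following asm commits can be read as
follows, assuming the three columns are the speedup over the C reference, the
cycle count of the optimized primitive, and the cycle count of the C reference.
The numbers fit that reading (11785.44 / 777.51 is about 15.16). The helper
names below are hypothetical and serve only to show the arithmetic behind the
"~20% over SSE4" claim.

```cpp
#include <cassert>

// Percentage of cycles saved by `optimized` relative to `baseline`.
// Here: baseline = SSE4 cycles, optimized = AVX2 cycles.
double cycleReductionPercent(double baseline, double optimized)
{
    assert(baseline > 0.0);
    return (baseline - optimized) / baseline * 100.0;
}

// Speedup factor of an optimized primitive over a reference implementation.
// Here: reference = C cycles, optimized = assembly cycles.
double speedupOver(double reference, double optimized)
{
    assert(optimized > 0.0);
    return reference / optimized;
}
```

With the figures above, cycleReductionPercent(976.41, 777.51) is a little over
20, and speedupOver(11785.44, 777.51) is close to the 15.16x in the AVX2 row.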
Subject: [x265] asm: intra_pred_ang16_13 improved by ~9% over SSE4

details:   http://hg.videolan.org/x265/rev/fc46324bfd33
branches:  
changeset: 10137:fc46324bfd33
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 12:21:49 2015 +0530
description:
asm: intra_pred_ang16_13 improved by ~9% over SSE4

AVX2:
intra_ang_16x16[13]     12.56x   944.13          11862.80

SSE4:
intra_ang_16x16[13]     11.18x   1035.83         11579.52
Subject: [x265] asm: correct register count

details:   http://hg.videolan.org/x265/rev/e9a80d1db21d
branches:  
changeset: 10138:e9a80d1db21d
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 13:28:56 2015 +0530
description:
asm: correct register count
Subject: [x265] asm: intra_pred_ang8_13 improved by ~16% over SSE4

details:   http://hg.videolan.org/x265/rev/1e250a75e0d6
branches:  
changeset: 10139:1e250a75e0d6
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 15:53:47 2015 +0530
description:
asm: intra_pred_ang8_13 improved by ~16% over SSE4

AVX2:
intra_ang_8x8[13]       10.68x   297.95          3183.33

SSE4:
intra_ang_8x8[13]       9.16x    352.32          3225.62
Subject: [x265] asm: intra_pred_ang8_14 improved by ~15% over SSE4

details:   http://hg.videolan.org/x265/rev/0402ebc1ff19
branches:  
changeset: 10140:0402ebc1ff19
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 16:30:54 2015 +0530
description:
asm: intra_pred_ang8_14 improved by ~15% over SSE4

AVX2:
intra_ang_8x8[14]       10.02x   325.31          3260.80

SSE4:
intra_ang_8x8[14]       8.49x    379.47          3220.08
Subject: [x265] asm: intra_pred_ang8_15 improved by ~5% over SSE4

details:   http://hg.videolan.org/x265/rev/f45fa1c77ebd
branches:  
changeset: 10141:f45fa1c77ebd
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 17:08:14 2015 +0530
description:
asm: intra_pred_ang8_15 improved by ~5% over SSE4

AVX2:
intra_ang_8x8[15]       9.57x    342.52          3279.56

SSE4:
intra_ang_8x8[15]       8.95x    360.01          3223.45
Subject: [x265] asm: intra_pred_ang8_23 improved by ~18% over SSE4

details:   http://hg.videolan.org/x265/rev/c1e5d55a82ae
branches:  
changeset: 10142:c1e5d55a82ae
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 17:45:26 2015 +0530
description:
asm: intra_pred_ang8_23 improved by ~18% over SSE4

AVX2:
intra_ang_8x8[23]       9.75x    205.43          2002.05

SSE4:
intra_ang_8x8[23]       8.12x    251.42          2041.61
Subject: [x265] asm: intra_pred_ang8_22 improved by ~14% over SSE4

details:   http://hg.videolan.org/x265/rev/c7fdc791bc10
branches:  
changeset: 10143:c7fdc791bc10
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 18:07:22 2015 +0530
description:
asm: intra_pred_ang8_22 improved by ~14% over SSE4

AVX2:
intra_ang_8x8[22]       9.07x    221.82          2010.92

SSE4:
intra_ang_8x8[22]       7.77x    257.91          2002.77
Subject: [x265] asm: intra_pred_ang8_21 improved by ~5% over SSE4

details:   http://hg.videolan.org/x265/rev/358ac3cf2761
branches:  
changeset: 10144:358ac3cf2761
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Apr 09 18:18:33 2015 +0530
description:
asm: intra_pred_ang8_21 improved by ~5% over SSE4

AVX2:
intra_ang_8x8[21]       8.55x    239.75          2050.08

SSE4:
intra_ang_8x8[21]       8.03x    252.60          2027.91
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/984e254f93f7
branches:  
changeset: 10145:984e254f93f7
user:      Steve Borho <steve at borho.org>
date:      Thu Apr 09 11:48:08 2015 -0500
description:
Merge with stable

diffstat:

 doc/reST/cli.rst                     |    41 +-
 doc/reST/threading.rst               |    14 +
 readme.rst                           |     2 +-
 source/CMakeLists.txt                |     9 +-
 source/common/common.cpp             |    13 +-
 source/common/common.h               |     3 +-
 source/common/ipfilter.cpp           |    36 +-
 source/common/loopfilter.cpp         |    41 +-
 source/common/param.cpp              |     8 +-
 source/common/picyuv.cpp             |    10 +-
 source/common/pixel.cpp              |     2 +-
 source/common/predict.cpp            |    31 +-
 source/common/primitives.cpp         |     3 +-
 source/common/primitives.h           |    20 +-
 source/common/quant.cpp              |   168 +-
 source/common/quant.h                |    33 +-
 source/common/slice.h                |     1 +
 source/common/x86/asm-primitives.cpp |   240 +-
 source/common/x86/const-a.asm        |   154 +-
 source/common/x86/dct8.asm           |   145 +-
 source/common/x86/dct8.h             |     2 +
 source/common/x86/intrapred.h        |    63 +
 source/common/x86/intrapred16.asm    |   502 ++++
 source/common/x86/intrapred8.asm     |  4235 +++++++++++++++++++++++++++++++++-
 source/common/x86/ipfilter8.asm      |  1761 +++++++++++--
 source/common/x86/ipfilter8.h        |    93 +-
 source/common/x86/loopfilter.asm     |   410 ++-
 source/common/x86/loopfilter.h       |     6 +-
 source/common/x86/pixel-util.h       |     5 +-
 source/common/x86/pixel-util8.asm    |   209 +-
 source/common/x86/pixel.h            |     1 +
 source/common/x86/pixeladd8.asm      |    37 +-
 source/common/x86/sad-a.asm          |    99 +-
 source/common/x86/x86inc.asm         |     3 +-
 source/encoder/CMakeLists.txt        |     6 +-
 source/encoder/analysis.cpp          |    44 +-
 source/encoder/analysis.h            |     2 +-
 source/encoder/api.cpp               |    16 +-
 source/encoder/encoder.cpp           |    21 +-
 source/encoder/entropy.cpp           |     4 +-
 source/encoder/entropy.h             |     3 +-
 source/encoder/level.cpp             |    15 +-
 source/encoder/ratecontrol.cpp       |    46 +-
 source/encoder/ratecontrol.h         |     5 +-
 source/encoder/sao.cpp               |    48 +-
 source/encoder/search.cpp            |    25 +-
 source/input/input.cpp               |     2 +-
 source/input/input.h                 |    10 +-
 source/input/y4m.h                   |     2 +-
 source/input/yuv.h                   |     2 +-
 source/output/output.cpp             |    12 +-
 source/output/output.h               |    43 +-
 source/output/raw.cpp                |    77 +
 source/output/raw.h                  |    64 +
 source/output/y4m.h                  |     2 +-
 source/output/yuv.h                  |     2 +-
 source/test/ipfilterharness.cpp      |   122 +-
 source/test/ipfilterharness.h        |     1 -
 source/test/pixelharness.cpp         |    66 +-
 source/test/pixelharness.h           |     3 +-
 source/test/rate-control-tests.txt   |    52 +-
 source/x265.cpp                      |   166 +-
 source/x265.h                        |    10 +
 source/x265cli.h                     |     5 +
 64 files changed, 8158 insertions(+), 1118 deletions(-)

diffs (truncated from 11894 to 300 lines):

diff -r ada69b1deea9 -r 984e254f93f7 doc/reST/cli.rst

--- a/doc/reST/cli.rst	Tue Apr 07 23:30:16 2015 -0500
+++ b/doc/reST/cli.rst	Thu Apr 09 11:48:08 2015 -0500
@@ -201,11 +201,11 @@ Performance Options
 	their node, they will not be allowed to migrate between nodes, but they
 	will be allowed to move between CPU cores within their node.
 
-	If the three pool features: :option:`--wpp` :option:`--pmode` and
-	:option:`--pme` are all disabled, then :option:`--pools` is ignored
-	and no thread pools are created.
+	If the four pool features: :option:`--wpp`, :option:`--pmode`,
+	:option:`--pme` and :option:`--lookahead-slices` are all disabled,
+	then :option:`--pools` is ignored and no thread pools are created.
 
-	If "none" is specified, then all three of the thread pool features are
+	If "none" is specified, then all four of the thread pool features are
 	implicitly disabled.
 
 	Multiple thread pools will be allocated for any NUMA node with more than
@@ -217,6 +217,15 @@ Performance Options
 	:option:`--frame-threads`.  The pools are used for WPP and for
 	distributed analysis and motion search.
 
+	On Windows, the native APIs offer sufficient functionality to
+	discover the NUMA topology and enforce the thread affinity that
+	libx265 needs, but on POSIX systems it relies on libnuma for this
+	functionality. If your target POSIX system is single socket, then
+	building without libnuma is a perfectly reasonable option, as it
+	will have no effect on the runtime behavior. On a multiple-socket
+	system, a POSIX build of libx265 without libnuma will be less work
+	efficient. See :ref:`thread pools <pools>` for more detail.
+
 	Default "", one thread is allocated per detected hardware thread
 	(logical CPU cores) and one thread pool per NUMA node.
 
@@ -437,7 +446,7 @@ Profile, Level, Tier
 	times 10, for example level **5.1** is specified as "5.1" or "51",
 	and level **5.0** is specified as "5.0" or "50".
 
-	Annex A levels: 1, 2, 2.1, 3, 3.1, 4, 4.1, 5, 5.1, 5.2, 6, 6.1, 6.2
+	Annex A levels: 1, 2, 2.1, 3, 3.1, 4, 4.1, 5, 5.1, 5.2, 6, 6.1, 6.2, 8.5
 
 .. option:: --high-tier, --no-high-tier
 
@@ -464,11 +473,22 @@ Profile, Level, Tier
 	HEVC specification.  If x265 detects that the total reference count
 	is greater than 8, it will issue a warning that the resulting stream
 	is non-compliant and it signals the stream as profile NONE and level
-	NONE but still allows the encode to continue.  Compliant HEVC
+	NONE and will abort the encode unless
+	:option:`--allow-non-conformance` is specified.  Compliant HEVC
 	decoders may refuse to decode such streams.
 	
 	Default 3
 
+.. option:: --allow-non-conformance, --no-allow-non-conformance
+
+	Allow libx265 to generate a bitstream with profile and level NONE.
+	By default it will abort any encode which does not meet strict level
+	compliance. The most likely causes for non-conformance are
+	:option:`--ctu` being too small, :option:`--ref` being too high,
+	or the bitrate or resolution being out of specification.
+
+	Default: disabled
+
 .. note::
 	:option:`--profile`, :option:`--level-idc`, and
 	:option:`--high-tier` are only intended for use when you are
@@ -476,7 +496,7 @@ Profile, Level, Tier
 	limitations and must constrain the bitstream within those limits.
 	Specifying a profile or level may lower the encode quality
 	parameters to meet those requirements but it will never raise
-	them.
+	them. It may enable VBV constraints on a CRF encode.
 
 Mode decision / Analysis
 ========================
@@ -1111,6 +1131,13 @@ Quality, rate control and rate distortio
 
 	**Range of values:** 0.0 to 3.0
 
+.. option:: --qg-size <64|32|16>
+	Enable adaptive quantization for sub-CTUs. This parameter specifies 
+	the minimum CU size at which QP can be adjusted, ie. Quantization Group
+	size. Allowed range of values are 64, 32, 16 provided this falls within 
+	the inclusive range [maxCUSize, minCUSize]. Experimental.
+	Default: same as maxCUSize
+
 .. option:: --cutree, --no-cutree
 
 	Enable the use of lookahead's lowres motion vector fields to
diff -r ada69b1deea9 -r 984e254f93f7 doc/reST/threading.rst
--- a/doc/reST/threading.rst	Tue Apr 07 23:30:16 2015 -0500
+++ b/doc/reST/threading.rst	Thu Apr 09 11:48:08 2015 -0500
@@ -2,6 +2,8 @@
 Threading
 *********
 
+.. _pools:
+
 Thread Pools
 ============
 
@@ -31,6 +33,17 @@ for data locking. If a job becomes block
 expected to drop that job so the worker thread may go back to the pool
 and find more work.
 
+On Windows, the native APIs offer sufficient functionality to discover
+the NUMA topology and enforce the thread affinity that libx265 needs,
+but on POSIX systems it relies on libnuma for this functionality. If
+your target POSIX system is single socket, then building without libnuma
+is a perfectly reasonable option, as it will have no effect on the
+runtime behavior. On a multiple-socket system, a POSIX build of libx265
+without libnuma will be less work efficient, but will still function
+correctly. You lose the work isolation effect that keeps each frame
+encoder from only using the threads of a single socket and so you incur
+a heavier context switching cost.
+
 Wavefront Parallel Processing
 =============================
 
@@ -225,6 +238,7 @@ scene cuts and slice types) uses the thr
 lowres cost analysis to worker threads. It will use bonded task groups
 to perform batches of frame cost estimates, and it may optionally use
 bonded task groups to measure single frame cost estimates using slices.
+(see :option:`--lookahead-slices`)
 
 The function slicetypeDecide() itself is also be performed by a worker
 thread if your encoder has a thread pool, else it runs within the
diff -r ada69b1deea9 -r 984e254f93f7 readme.rst
--- a/readme.rst	Tue Apr 07 23:30:16 2015 -0500
+++ b/readme.rst	Thu Apr 09 11:48:08 2015 -0500
@@ -3,7 +3,7 @@ x265 HEVC Encoder
 =================
 
 | **Read:** | Online `documentation <http://x265.readthedocs.org/en/default/>`_ | Developer `wiki <http://bitbucket.org/multicoreware/x265/wiki/>`_
-| **Download:** | `releases <http://bitbucket.org/multicoreware/x265/downloads/>`_ 
+| **Download:** | `releases <http://ftp.videolan.org/pub/videolan/x265/>`_ 
 | **Interact:** | #x265 on freenode.irc.net | `x265-devel at videolan.org <http://mailman.videolan.org/listinfo/x265-devel>`_ | `Report an issue <https://bitbucket.org/multicoreware/x265/issues?status=new&status=open>`_
 
 `x265 <https://www.videolan.org/developers/x265.html>`_ is an open
diff -r ada69b1deea9 -r 984e254f93f7 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Tue Apr 07 23:30:16 2015 -0500
+++ b/source/CMakeLists.txt	Thu Apr 09 11:48:08 2015 -0500
@@ -30,7 +30,7 @@ option(STATIC_LINK_CRT "Statically link 
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 51)
+set(X265_BUILD 54)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -67,13 +67,13 @@ if(UNIX)
     endif()
     find_package(Numa)
     if(NUMA_FOUND)
-        list(APPEND CMAKE_REQUIRED_LIBRARIES ${NUMA_LIBRARY})
+        link_directories(${NUMA_LIBRARY_DIR})
+        list(APPEND CMAKE_REQUIRED_LIBRARIES numa)
         check_symbol_exists(numa_node_of_cpu numa.h NUMA_V2)
         if(NUMA_V2)
             add_definitions(-DHAVE_LIBNUMA)
             message(STATUS "libnuma found, building with support for NUMA nodes")
-            list(APPEND PLATFORM_LIBS ${NUMA_LIBRARY})
-            link_directories(${NUMA_LIBRARY_DIR})
+            list(APPEND PLATFORM_LIBS numa)
             include_directories(${NUMA_INCLUDE_DIR})
         endif()
     endif()
@@ -196,6 +196,7 @@ if(GCC)
         add_definitions(-static)
         list(APPEND LINKER_OPTIONS "-static")
     endif(STATIC_LINK_CRT)
+    check_cxx_compiler_flag(-Wno-strict-overflow CC_HAS_NO_STRICT_OVERFLOW)
     check_cxx_compiler_flag(-Wno-narrowing CC_HAS_NO_NARROWING) 
     check_cxx_compiler_flag(-Wno-array-bounds CC_HAS_NO_ARRAY_BOUNDS) 
     if (CC_HAS_NO_ARRAY_BOUNDS)
diff -r ada69b1deea9 -r 984e254f93f7 source/common/common.cpp
--- a/source/common/common.cpp	Tue Apr 07 23:30:16 2015 -0500
+++ b/source/common/common.cpp	Thu Apr 09 11:48:08 2015 -0500
@@ -100,11 +100,14 @@ int x265_exp2fix8(double x)
     return (x265_exp2_lut[i & 63] + 256) << (i >> 6) >> 8;
 }
 
-void x265_log(const x265_param *param, int level, const char *fmt, ...)
+void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...)
 {
     if (param && level > param->logLevel)
         return;
-    const char *log_level;
+    const int bufferSize = 4096;
+    char buffer[bufferSize];
+    int p = 0;
+    const char* log_level;
     switch (level)
     {
     case X265_LOG_ERROR:
@@ -127,11 +130,13 @@ void x265_log(const x265_param *param, i
         break;
     }
 
-    fprintf(stderr, "x265 [%s]: ", log_level);
+    if (caller)
+        p += sprintf(buffer, "%-4s [%s]: ", caller, log_level);
     va_list arg;
     va_start(arg, fmt);
-    vfprintf(stderr, fmt, arg);
+    vsnprintf(buffer + p, bufferSize - p, fmt, arg);
     va_end(arg);
+    fputs(buffer, stderr);
 }
 
 double x265_ssim2dB(double ssim)
diff -r ada69b1deea9 -r 984e254f93f7 source/common/common.h
--- a/source/common/common.h	Tue Apr 07 23:30:16 2015 -0500
+++ b/source/common/common.h	Thu Apr 09 11:48:08 2015 -0500
@@ -413,7 +413,8 @@ void extendPicBorder(pixel* recon, intpt
 
 /* outside x265 namespace, but prefixed. defined in common.cpp */
 int64_t  x265_mdate(void);
-void     x265_log(const x265_param *param, int level, const char *fmt, ...);
+#define  x265_log(param, ...) general_log(param, "x265", __VA_ARGS__)
+void     general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...);
 int      x265_exp2fix8(double x);
 
 double   x265_ssim2dB(double ssim);
diff -r ada69b1deea9 -r 984e254f93f7 source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp	Tue Apr 07 23:30:16 2015 -0500
+++ b/source/common/ipfilter.cpp	Thu Apr 09 11:48:08 2015 -0500
@@ -34,27 +34,8 @@ using namespace x265;
 #endif
 
 namespace {
-template<int dstStride, int width, int height>
-void pixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst)
-{
-    int shift = IF_INTERNAL_PREC - X265_DEPTH;
-    int row, col;
-
-    for (row = 0; row < height; row++)
-    {
-        for (col = 0; col < width; col++)
-        {
-            int16_t val = src[col] << shift;
-            dst[col] = val - (int16_t)IF_INTERNAL_OFFS;
-        }
-
-        src += srcStride;
-        dst += dstStride;
-    }
-}
-
-template<int dstStride>
-void filterPixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height)
+template<int width, int height>
+void filterPixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride)
 {
     int shift = IF_INTERNAL_PREC - X265_DEPTH;
     int row, col;
@@ -398,7 +379,7 @@ namespace x265 {
     p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>;  \
     p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>;  \
     p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
-    p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE / 2, W, H>; 
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;
 
 #define CHROMA_422(W, H) \
     p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
@@ -407,7 +388,7 @@ namespace x265 {
     p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>;  \
     p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>;  \
     p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
-    p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE / 2, W, H>; 
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;
 
 #define CHROMA_444(W, H) \
     p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
@@ -416,7 +397,7 @@ namespace x265 {
     p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>;  \
     p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>;  \
     p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
-    p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE, W, H>; 
+    p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;
 
 #define LUMA(W, H) \
     p.pu[LUMA_ ## W ## x ## H].luma_hpp     = interp_horiz_pp_c<8, W, H>; \
@@ -426,7 +407,7 @@ namespace x265 {
     p.pu[LUMA_ ## W ## x ## H].luma_vsp     = interp_vert_sp_c<8, W, H>;  \
     p.pu[LUMA_ ## W ## x ## H].luma_vss     = interp_vert_ss_c<8, W, H>;  \
     p.pu[LUMA_ ## W ## x ## H].luma_hvpp    = interp_hv_pp_c<8, W, H>; \
-    p.pu[LUMA_ ## W ## x ## H].filter_p2s = pixelToShort_c<MAX_CU_SIZE, W, H>
+    p.pu[LUMA_ ## W ## x ## H].convert_p2s = filterPixelToShort_c<W, H>;
 
 void setupFilterPrimitives_c(EncoderPrimitives& p)
 {
@@ -530,11 +511,6 @@ void setupFilterPrimitives_c(EncoderPrim
     CHROMA_444(48, 64);
     CHROMA_444(64, 16);
     CHROMA_444(16, 64);