[x265-commits] [x265] cleanup: remove unused code in mc-a2.asm

Tue Nov 26 09:29:05 CET 2013

details:   http://hg.videolan.org/x265/rev/464af047f7b1
branches:  
changeset: 5291:464af047f7b1
user:      Min Chen <chenm003 at 163.com>
date:      Sun Nov 24 14:52:31 2013 +0800
description:
cleanup: remove unused code in mc-a2.asm
Subject: [x265] cleanup: remove unused code in pixel-a.asm

details:   http://hg.videolan.org/x265/rev/513f564ba360
branches:  
changeset: 5292:513f564ba360
user:      Min Chen <chenm003 at 163.com>
date:      Sun Nov 24 16:53:56 2013 +0800
description:
cleanup: remove unused code in pixel-a.asm
Subject: [x265] cleanup: remove unused constant in pixel-a.asm

details:   http://hg.videolan.org/x265/rev/c0c862dc71fb
branches:  
changeset: 5293:c0c862dc71fb
user:      Min Chen <chenm003 at 163.com>
date:      Sun Nov 24 17:34:12 2013 +0800
description:
cleanup: remove unused constant in pixel-a.asm
Subject: [x265] cleanup: remove unused code in mc-a.asm

details:   http://hg.videolan.org/x265/rev/9c7142ced7c4
branches:  
changeset: 5294:9c7142ced7c4
user:      Min Chen <chenm003 at 163.com>
date:      Mon Nov 25 12:03:42 2013 +0800
description:
cleanup: remove unused code in mc-a.asm
Subject: [x265] asm: assembly code for dequant_normal

details:   http://hg.videolan.org/x265/rev/67e8ecb2b0e5
branches:  
changeset: 5295:67e8ecb2b0e5
user:      Min Chen <chenm003 at 163.com>
date:      Mon Nov 25 14:19:59 2013 +0800
description:
asm: assembly code for dequant_normal
Subject: [x265] cleanup the temporary function pointer initialization

details:   http://hg.videolan.org/x265/rev/b54870f0cdd3
branches:  
changeset: 5296:b54870f0cdd3
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Mon Nov 25 17:05:59 2013 +0550
description:
cleanup the temporary function pointer initialization
Subject: [x265] asm : routine for weight_pp(), for input width in multiples of 16

details:   http://hg.videolan.org/x265/rev/3e4c257d88ab
branches:  
changeset: 5297:3e4c257d88ab
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Mon Nov 25 18:01:55 2013 +0550
description:
asm : routine for weight_pp(), for input width in multiples of 16
Subject: [x265] Test bench modifications for weight_pp() asm routine.

details:   http://hg.videolan.org/x265/rev/13126513fe61
branches:  
changeset: 5298:13126513fe61
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Mon Nov 25 18:15:25 2013 +0550
description:
Test bench modifications for weight_pp() asm routine.
Subject: [x265] Adding asm function declaration and initialization for weight_pp asm routine.

details:   http://hg.videolan.org/x265/rev/be74f1731279
branches:  
changeset: 5299:be74f1731279
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Mon Nov 25 18:16:49 2013 +0550
description:
Adding asm function declaration and initialization for weight_pp asm routine.
Subject: [x265] asm: move constant 8192 to const-a.asm for share

details:   http://hg.videolan.org/x265/rev/a5c7cd496583
branches:  
changeset: 5300:a5c7cd496583
user:      Min Chen <chenm003 at 163.com>
date:      Mon Nov 25 22:30:03 2013 +0800
description:
asm: move constant 8192 to const-a.asm for share
Subject: [x265] asm : routine for weight_sp().

details:   http://hg.videolan.org/x265/rev/d9d6b8b4e4f1
branches:  
changeset: 5301:d9d6b8b4e4f1
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Mon Nov 25 18:34:53 2013 +0550
description:
asm : routine for weight_sp().
Subject: [x265] Adding asm function declaration and initialization for weight_sp asm routine

details:   http://hg.videolan.org/x265/rev/47ef19a1734c
branches:  
changeset: 5302:47ef19a1734c
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Mon Nov 25 18:37:31 2013 +0550
description:
Adding asm function declaration and initialization for weight_sp asm routine
Subject: [x265] Test bench modifications for weight_sp() asm routine

details:   http://hg.videolan.org/x265/rev/3e688d424f05
branches:  
changeset: 5303:3e688d424f05
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Mon Nov 25 19:19:48 2013 +0550
description:
Test bench modifications for weight_sp() asm routine
Subject: [x265] asm: assembly code for sse_ss - 4xN, 8xN, 16xN

details:   http://hg.videolan.org/x265/rev/7cab79758dd7
branches:  
changeset: 5304:7cab79758dd7
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Mon Nov 25 21:00:49 2013 +0550
description:
asm: assembly code for sse_ss - 4xN, 8xN, 16xN
Subject: [x265] Test bench: code for pixel_var

details:   http://hg.videolan.org/x265/rev/529bd0084265
branches:  
changeset: 5305:529bd0084265
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Mon Nov 25 21:16:28 2013 +0550
description:
Test bench: code for pixel_var
Subject: [x265] asm: assembly code for pixel_sse_ss_12x16

details:   http://hg.videolan.org/x265/rev/71262c718dfa
branches:  
changeset: 5306:71262c718dfa
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Mon Nov 25 21:24:32 2013 +0550
description:
asm: assembly code for pixel_sse_ss_12x16
Subject: [x265] asm: code for pixel_var_8xN

details:   http://hg.videolan.org/x265/rev/da18434af735
branches:  
changeset: 5307:da18434af735
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Mon Nov 25 21:37:38 2013 +0550
description:
asm: code for pixel_var_8xN
Subject: [x265] asm: assembly code for intra_pred_planar[4x4]

details:   http://hg.videolan.org/x265/rev/6a8fbb091722
branches:  
changeset: 5308:6a8fbb091722
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Mon Nov 25 21:47:53 2013 +0550
description:
asm: assembly code for intra_pred_planar[4x4]
Subject: [x265] asm: assembly code for pixel_sse_ss_32xN

details:   http://hg.videolan.org/x265/rev/8075b13cee00
branches:  
changeset: 5309:8075b13cee00
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Mon Nov 25 21:52:10 2013 +0550
description:
asm: assembly code for pixel_sse_ss_32xN
Subject: [x265] asm: code for pixel_var_16xN

details:   http://hg.videolan.org/x265/rev/672ae35d4e5f
branches:  
changeset: 5310:672ae35d4e5f
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Mon Nov 25 21:58:33 2013 +0550
description:
asm: code for pixel_var_16xN
Subject: [x265] Merge multicoreware/x265 into default

details:   http://hg.videolan.org/x265/rev/06d509e2e687
branches:  
changeset: 5311:06d509e2e687
user:      chenm003 <chenm003 at 163.com>
date:      Tue Nov 26 10:49:27 2013 +0800
description:
Merge multicoreware/x265 into default
Subject: [x265] asm: fix build error on x64

details:   http://hg.videolan.org/x265/rev/116d91f08fcb
branches:  
changeset: 5312:116d91f08fcb
user:      Min Chen <chenm003 at 163.com>
date:      Tue Nov 26 14:19:27 2013 +0800
description:
asm: fix build error on x64
Subject: [x265] api: document a few rate control settings

details:   http://hg.videolan.org/x265/rev/5accd2ae5ceb
branches:  
changeset: 5313:5accd2ae5ceb
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 26 01:12:49 2013 -0600
description:
api: document a few rate control settings
Subject: [x265] pixel: remove intrinsic pixel weight functions, we have asm coverage

details:   http://hg.videolan.org/x265/rev/491fd3ee6fd1
branches:  
changeset: 5314:491fd3ee6fd1
user:      Steve Borho <steve at borho.org>
date:      Mon Nov 25 14:00:56 2013 -0600
description:
pixel: remove intrinsic pixel weight functions, we have asm coverage

diffstat:

 source/Lib/TLibEncoder/TEncCu.cpp    |    16 +-
 source/common/dct.cpp                |     5 +-
 source/common/pixel.cpp              |    11 +-
 source/common/vec/pixel-sse41.cpp    |    90 -
 source/common/x86/asm-primitives.cpp |    84 +-
 source/common/x86/const-a.asm        |     2 +
 source/common/x86/intrapred.asm      |    62 +-
 source/common/x86/intrapred.h        |     2 +
 source/common/x86/ipfilter8.asm      |    51 +-
 source/common/x86/mc-a.asm           |   668 -------
 source/common/x86/mc-a2.asm          |   824 --------
 source/common/x86/pixel-a.asm        |  3162 ++++++---------------------------
 source/common/x86/pixel-util.asm     |   211 ++
 source/common/x86/pixel.h            |    41 +-
 source/encoder/compress.cpp          |     2 +-
 source/encoder/encoder.cpp           |   146 +-
 source/test/pixelharness.cpp         |    45 +-
 source/test/pixelharness.h           |     1 +
 source/x265.h                        |    15 +-
 19 files changed, 1121 insertions(+), 4317 deletions(-)

diffs (truncated from 6143 to 300 lines):

diff -r 10f605bd0530 -r 491fd3ee6fd1 source/Lib/TLibEncoder/TEncCu.cpp

--- a/source/Lib/TLibEncoder/TEncCu.cpp	Fri Nov 22 14:59:34 2013 -0600
+++ b/source/Lib/TLibEncoder/TEncCu.cpp	Mon Nov 25 14:00:56 2013 -0600
@@ -636,14 +636,26 @@ void TEncCu::xCompressIntraCU(TComDataCU
         if (outBestCU->m_totalCost < outTempCU->m_totalCost)
         {
             m_log->cntIntra[depth]++;
-            m_log->cntIntra[depth + 1] = m_log->cntIntra[depth + 1] - 4 + boundaryCu;
+            for (int i = 0; i < 4; i++)
+            {
+                if (outTempCU->getPartitionSize(i) != SIZE_NxN)
+                    m_log->cntIntra[depth + 1]--;
+                else
+                    m_log->cntIntraNxN--;
+            }
+            m_log->cntIntra[depth + 1] += boundaryCu;
         }
         xCheckBestMode(outBestCU, outTempCU, depth); // RD compare current prediction with split prediction.
     }
 
     if (depth == g_maxCUDepth - 1 && bSubBranch)
     {
-        m_log->cntIntra[depth]++;
+        if (outBestCU->getPartitionSize(0) == SIZE_NxN)
+        {
+            m_log->cntIntraNxN++;
+        }
+        else
+            m_log->cntIntra[depth]++;
     }
     outBestCU->copyToPic(depth); // Copy Best data to Picture for next partition prediction.
 
diff -r 10f605bd0530 -r 491fd3ee6fd1 source/common/dct.cpp
--- a/source/common/dct.cpp	Fri Nov 22 14:59:34 2013 -0600
+++ b/source/common/dct.cpp	Mon Nov 25 14:00:56 2013 -0600
@@ -720,8 +720,11 @@ void idct32_c(int32_t *src, int16_t *dst
 
 void dequant_normal_c(const int32_t* quantCoef, int32_t* coef, int num, int scale, int shift)
 {
-    static const int invQuantScales[6] = { 40, 45, 51, 57, 64, 72 };
     assert(num <= 32 * 32);
+    // NOTE: maximum of scale is (72 * 256)
+    assert(scale < 32768);
+    assert((num % 8) == 0);
+    assert(shift <= 6);
 
     int add, coeffQ;
 
diff -r 10f605bd0530 -r 491fd3ee6fd1 source/common/pixel.cpp
--- a/source/common/pixel.cpp	Fri Nov 22 14:59:34 2013 -0600
+++ b/source/common/pixel.cpp	Mon Nov 25 14:00:56 2013 -0600
@@ -968,8 +968,17 @@ void Setup_C_PixelPrimitives(EncoderPrim
     p.ssim_4x4x2_core = ssim_4x4x2_core;
     p.ssim_end_4 = ssim_end_4;
 
+    p.var[LUMA_8x4] = pixel_var<8, 4>;
+    p.var[LUMA_8x8] = pixel_var<8, 8>;
+    p.var[LUMA_8x16] = pixel_var<8, 16>;
+    p.var[LUMA_8x32] = pixel_var<8, 32>;
+    p.var[LUMA_16x4] = pixel_var<16, 4>;
+    p.var[LUMA_16x8] = pixel_var<16, 8>;
+    p.var[LUMA_16x12] = pixel_var<16, 12>;
     p.var[LUMA_16x16] = pixel_var<16, 16>;
-    p.var[LUMA_8x8] = pixel_var<8, 8>;
+    p.var[LUMA_16x32] = pixel_var<16, 32>;
+    p.var[LUMA_16x64] = pixel_var<16, 64>;
+
     p.plane_copy_deinterleave_c = plane_copy_deinterleave_chroma;
 }
 }
diff -r 10f605bd0530 -r 491fd3ee6fd1 source/common/vec/pixel-sse41.cpp
--- a/source/common/vec/pixel-sse41.cpp	Fri Nov 22 14:59:34 2013 -0600
+++ b/source/common/vec/pixel-sse41.cpp	Mon Nov 25 14:00:56 2013 -0600
@@ -33,94 +33,6 @@ using namespace x265;
 
 namespace {
 #if !HIGH_BIT_DEPTH
-void weight_sp(int16_t *src, pixel *dst, intptr_t srcStride, intptr_t dstStride, int width, int height, int w0, int round, int shift, int offset)
-{
-    __m128i w00, roundoff, ofs, fs, tmpsrc, tmpdst, tmp, sign;
-    int x, y;
-
-    w00 = _mm_set1_epi32(w0);
-    ofs = _mm_set1_epi32(IF_INTERNAL_OFFS);
-    fs = _mm_set1_epi32(offset);
-    roundoff = _mm_set1_epi32(round);
-    for (y = height - 1; y >= 0; y--)
-    {
-        for (x = 0; x <= width - 4; x += 4)
-        {
-            tmpsrc = _mm_loadl_epi64((__m128i*)(src + x));
-            sign = _mm_srai_epi16(tmpsrc, 15);
-            tmpsrc = _mm_unpacklo_epi16(tmpsrc, sign);
-            tmpdst = _mm_add_epi32(_mm_srai_epi32(_mm_add_epi32(_mm_mullo_epi32(w00, _mm_add_epi32(tmpsrc, ofs)), roundoff), shift), fs);
-            *(uint32_t*)(dst + x) = _mm_cvtsi128_si32(_mm_packus_epi16(_mm_packs_epi32(tmpdst, tmpdst), _mm_setzero_si128()));
-        }
-
-        if (width > x)
-        {
-            tmpsrc = _mm_loadl_epi64((__m128i*)(src + x));
-            sign = _mm_srai_epi16(tmpsrc, 15);
-            tmpsrc = _mm_unpacklo_epi16(tmpsrc, sign);
-            tmpdst = _mm_add_epi32(_mm_srai_epi32(_mm_add_epi32(_mm_mullo_epi32(w00, _mm_add_epi32(tmpsrc, ofs)), roundoff), shift), fs);
-            tmp = _mm_packus_epi16(_mm_packs_epi32(tmpdst, tmpdst), _mm_setzero_si128());
-            union
-            {
-                int8_t  c[16];
-                int16_t s[8];
-            } u;
-
-            _mm_storeu_si128((__m128i*)u.c, tmp);
-            ((int16_t*)(dst + x))[0] = u.s[0];    //to store only first 16-bit from 128-bit to memory
-        }
-        src += srcStride;
-        dst += dstStride;
-    }
-}
-
-void weight_pp(pixel *source, pixel *dest, intptr_t sourceStride, intptr_t destStride, int width, int height, int w0, int arg_round, int shift, int offset)
-{
-    int x, y;
-    __m128i temp;
-    __m128i vw0    = _mm_set1_epi32(w0); // broadcast (32-bit integer) w0 to all elements of vw0
-    __m128i ofs    = _mm_set1_epi32(offset);
-    __m128i round  = _mm_set1_epi32(arg_round);
-    __m128i src, dst, val;
-
-    for (y = height - 1; y >= 0; y--)
-    {
-        for (x = 0; x <= width - 4; x += 4)
-        {
-            // The intermediate results would outgrow 16 bits because internal offset is too high
-            temp = _mm_cvtsi32_si128(*(uint32_t*)(source + x));
-            src = _mm_unpacklo_epi16(_mm_unpacklo_epi8(temp, _mm_setzero_si128()), _mm_setzero_si128());
-            val = _mm_slli_epi32(src, (IF_INTERNAL_PREC - X265_DEPTH));
-            dst = _mm_add_epi32(_mm_mullo_epi32(vw0, val), round);
-            dst =  _mm_sra_epi32(dst, _mm_cvtsi32_si128(shift));
-            dst = _mm_add_epi32(dst, ofs);
-            *(uint32_t*)(dest + x) = _mm_cvtsi128_si32(_mm_packus_epi16(_mm_packs_epi32(dst, dst), _mm_setzero_si128()));
-        }
-
-        if (width > x)
-        {
-            temp = _mm_cvtsi32_si128(*(uint32_t*)(source + x));
-            src = _mm_unpacklo_epi16(_mm_unpacklo_epi8(temp, _mm_setzero_si128()), _mm_setzero_si128());
-            val = _mm_slli_epi32(src, (IF_INTERNAL_PREC - X265_DEPTH));
-            dst = _mm_add_epi32(_mm_mullo_epi32(vw0, val), round);
-            dst =  _mm_sra_epi32(dst, _mm_cvtsi32_si128(shift));
-            dst = _mm_add_epi32(dst, ofs);
-            temp = _mm_packus_epi16(_mm_packs_epi32(dst, dst), _mm_setzero_si128());
-
-            union
-            {
-                int8_t  c[16];
-                int16_t s[8];
-            } u;
-
-            _mm_storeu_si128((__m128i*)u.c, temp);
-            ((int16_t*)(dest + x))[0] = u.s[0];
-        }
-        source += sourceStride;
-        dest += destStride;
-    }
-}
-
 template<int ly>
 int sse_sp4(int16_t* fenc, intptr_t strideFenc, pixel* fref, intptr_t strideFref)
 {
@@ -777,8 +689,6 @@ void Setup_Vec_PixelPrimitives_sse41(Enc
 #if HIGH_BIT_DEPTH
     Setup_Vec_Pixel16Primitives_sse41(p);
 #else
-    p.weight_pp = weight_pp;
-    p.weight_sp = weight_sp;
 #endif /* !HIGH_BIT_DEPTH */
 }
 }
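
Note on the hunk above: the removed weight_pp intrinsic scales each pixel up to the internal interpolation precision, applies weight, rounding, shift and offset, then saturates back to 8 bits. A minimal scalar sketch of that per-pixel arithmetic follows, for comparison against the asm replacement. It is not the code in the repository: the name weight_pp_ref and the constants kInternalPrec and kBitDepth are hypothetical, and the values 14 and 8 are assumptions standing in for IF_INTERNAL_PREC and X265_DEPTH, whose definitions are not shown in this diff. Unlike the removed code, which handled 4 pixels per iteration plus a 2-pixel tail, the sketch accepts any width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed values; the real definitions are not part of this diff.
const int kInternalPrec = 14; // stands in for IF_INTERNAL_PREC
const int kBitDepth = 8;      // stands in for X265_DEPTH in an 8-bit build

// Hypothetical scalar reference for the per-pixel math of the removed weight_pp.
void weight_pp_ref(const uint8_t* src, uint8_t* dst, intptr_t srcStride, intptr_t dstStride,
                   int width, int height, int w0, int round, int shift, int offset)
{
    assert(shift >= 0 && shift < 32);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Scale the pixel up to the internal precision.
            int val = src[x] << (kInternalPrec - kBitDepth);
            // Weight, round, shift down, add the offset. The intrinsic used an
            // arithmetic shift; >> on a negative int behaves the same way on
            // mainstream compilers (and by definition since C++20).
            int res = ((w0 * val + round) >> shift) + offset;
            // The packs/packus pair in the intrinsic saturates to [0, 255].
            dst[x] = (uint8_t)std::min(255, std::max(0, res));
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

Under these assumptions, calling it with w0 = 1, round = 32, shift = 6 and offset = 0 reproduces the input, because the right shift undoes the 6-bit scale-up.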
diff -r 10f605bd0530 -r 491fd3ee6fd1 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Fri Nov 22 14:59:34 2013 -0600
+++ b/source/common/x86/asm-primitives.cpp	Mon Nov 25 14:00:56 2013 -0600
@@ -25,6 +25,7 @@
 
 #include "primitives.h"
 #include "x265.h"
+#include "cpu.h"
 
 extern "C" {
 #include "pixel.h"
@@ -87,6 +88,27 @@ extern "C" {
     p.sse_pp[LUMA_32x64] = x265_pixel_ssd_32x64_ ## cpu; \
     p.sse_pp[LUMA_16x64] = x265_pixel_ssd_16x64_ ## cpu
 
+#define ASSGN_SSE_SS(cpu) \
+    p.sse_ss[LUMA_4x4]   = x265_pixel_ssd_ss_4x4_ ## cpu; \
+    p.sse_ss[LUMA_4x8]   = x265_pixel_ssd_ss_4x8_ ## cpu; \
+    p.sse_ss[LUMA_4x16]   = x265_pixel_ssd_ss_4x16_ ## cpu; \
+    p.sse_ss[LUMA_8x4]   = x265_pixel_ssd_ss_8x4_ ## cpu; \
+    p.sse_ss[LUMA_8x8]   = x265_pixel_ssd_ss_8x8_ ## cpu; \
+    p.sse_ss[LUMA_8x16]   = x265_pixel_ssd_ss_8x16_ ## cpu; \
+    p.sse_ss[LUMA_8x32]   = x265_pixel_ssd_ss_8x32_ ## cpu; \
+    p.sse_ss[LUMA_12x16]   = x265_pixel_ssd_ss_12x16_ ## cpu; \
+    p.sse_ss[LUMA_16x4]   = x265_pixel_ssd_ss_16x4_ ## cpu; \
+    p.sse_ss[LUMA_16x8]   = x265_pixel_ssd_ss_16x8_ ## cpu; \
+    p.sse_ss[LUMA_16x12]   = x265_pixel_ssd_ss_16x12_ ## cpu; \
+    p.sse_ss[LUMA_16x16]   = x265_pixel_ssd_ss_16x16_ ## cpu; \
+    p.sse_ss[LUMA_16x32]   = x265_pixel_ssd_ss_16x32_ ## cpu; \
+    p.sse_ss[LUMA_16x64]   = x265_pixel_ssd_ss_16x64_ ## cpu; \
+    p.sse_ss[LUMA_32x8]   = x265_pixel_ssd_ss_32x8_ ## cpu; \
+    p.sse_ss[LUMA_32x16]   = x265_pixel_ssd_ss_32x16_ ## cpu; \
+    p.sse_ss[LUMA_32x24]   = x265_pixel_ssd_ss_32x24_ ## cpu; \
+    p.sse_ss[LUMA_32x32]   = x265_pixel_ssd_ss_32x32_ ## cpu; \
+    p.sse_ss[LUMA_32x64]   = x265_pixel_ssd_ss_32x64_ ## cpu;
+
 #define SA8D_INTER_FROM_BLOCK(cpu) \
     p.sa8d_inter[LUMA_4x8]  = x265_pixel_satd_4x8_ ## cpu; \
     p.sa8d_inter[LUMA_8x4]  = x265_pixel_satd_8x4_ ## cpu; \
@@ -412,6 +434,21 @@ extern "C" {
     SETUP_LUMA_BLOCKCOPY_FUNC_DEF(64, 16, cpu); \
     SETUP_LUMA_BLOCKCOPY_FUNC_DEF(16, 64, cpu);
 
+#define SETUP_PIXEL_VAR_DEF(W, H, cpu) \
+    p.var[LUMA_ ## W ## x ## H] = x265_pixel_var_ ## W ## x ## H ## cpu;
+
+#define LUMA_VAR(cpu) \
+    SETUP_PIXEL_VAR_DEF(8,   4, cpu); \
+    SETUP_PIXEL_VAR_DEF(8,   8, cpu); \
+    SETUP_PIXEL_VAR_DEF(8,  16, cpu); \
+    SETUP_PIXEL_VAR_DEF(8,  32, cpu); \
+    SETUP_PIXEL_VAR_DEF(16,  4, cpu); \
+    SETUP_PIXEL_VAR_DEF(16,  8, cpu); \
+    SETUP_PIXEL_VAR_DEF(16, 12, cpu); \
+    SETUP_PIXEL_VAR_DEF(16, 16, cpu); \
+    SETUP_PIXEL_VAR_DEF(16, 32, cpu); \
+    SETUP_PIXEL_VAR_DEF(16, 64, cpu);
+
 namespace x265 {
 // private x265 namespace
 
@@ -442,6 +479,8 @@ void Setup_Assembly_Primitives(EncoderPr
         PIXEL_AVG(sse2);
         PIXEL_AVG_W4(mmx2);
 
+        LUMA_VAR(_sse2);
+
         p.sad[LUMA_8x32]  = x265_pixel_sad_8x32_sse2;
         p.sad[LUMA_16x4]  = x265_pixel_sad_16x4_sse2;
         p.sad[LUMA_16x12] = x265_pixel_sad_16x12_sse2;
@@ -464,6 +503,7 @@ void Setup_Assembly_Primitives(EncoderPr
         p.sad[LUMA_12x16] = x265_pixel_sad_12x16_sse2;
 
         ASSGN_SSE(sse2);
+        ASSGN_SSE_SS(sse2);
         INIT2(sad, _sse2);
         INIT2(sad_x3, _sse2);
         INIT2(sad_x4, _sse2);
@@ -608,52 +648,15 @@ void Setup_Assembly_Primitives(EncoderPr
         CHROMA_FILTERS(_sse4);
         LUMA_FILTERS(_sse4);
         HEVC_SATD(sse4);
+        ASSGN_SSE_SS(sse4);
         p.chroma[X265_CSP_I420].copy_sp[CHROMA_2x4] = x265_blockcopy_sp_2x4_sse4;
         p.chroma[X265_CSP_I420].copy_sp[CHROMA_2x8] = x265_blockcopy_sp_2x8_sse4;
         p.chroma[X265_CSP_I420].copy_sp[CHROMA_6x8] = x265_blockcopy_sp_6x8_sse4;
 
-        // This function pointer initialization is temporary will be removed
-        // later with macro definitions.  It is used to avoid linker errors
-        // until all partitions are coded and commit smaller patches, easier to
-        // review.
-
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_2x8] = x265_pixel_add_ps_2x8_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_2x4] = x265_pixel_add_ps_2x4_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_4x2] = x265_pixel_add_ps_4x2_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_4x4] = x265_pixel_add_ps_4x4_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_4x8] = x265_pixel_add_ps_4x8_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_4x16] = x265_pixel_add_ps_4x16_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_6x8] = x265_pixel_add_ps_6x8_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_8x2] = x265_pixel_add_ps_8x2_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_8x4] = x265_pixel_add_ps_8x4_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_8x6] = x265_pixel_add_ps_8x6_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_8x8] = x265_pixel_add_ps_8x8_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_8x16] = x265_pixel_add_ps_8x16_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_8x32] = x265_pixel_add_ps_8x32_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_12x16] = x265_pixel_add_ps_12x16_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_16x4] = x265_pixel_add_ps_16x4_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_16x8] = x265_pixel_add_ps_16x8_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_16x12] = x265_pixel_add_ps_16x12_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_16x16] = x265_pixel_add_ps_16x16_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_16x32] = x265_pixel_add_ps_16x32_sse4;
-        p.luma_add_ps[LUMA_16x64] = x265_pixel_add_ps_16x64_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_24x32] = x265_pixel_add_ps_24x32_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_32x8] = x265_pixel_add_ps_32x8_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_32x16] = x265_pixel_add_ps_32x16_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_32x24] = x265_pixel_add_ps_32x24_sse4;
-        p.chroma[X265_CSP_I420].add_ps[CHROMA_32x32] = x265_pixel_add_ps_32x32_sse4;
-        p.luma_add_ps[LUMA_32x64] = x265_pixel_add_ps_32x64_sse4;
-
         p.chroma[X265_CSP_I420].filter_vsp[CHROMA_2x4] = x265_interp_4tap_vert_sp_2x4_sse4;
         p.chroma[X265_CSP_I420].filter_vsp[CHROMA_2x8] = x265_interp_4tap_vert_sp_2x8_sse4;
         p.chroma[X265_CSP_I420].filter_vsp[CHROMA_6x8] = x265_interp_4tap_vert_sp_6x8_sse4;