[x265-commits] [x265] encoder: fix warning of potentially unused locals

Thu Dec 5 21:01:39 CET 2013

details:   http://hg.videolan.org/x265/rev/c8ca8c93083b
branches:  
changeset: 5509:c8ca8c93083b
user:      Steve Borho <steve at borho.org>
date:      Thu Dec 05 01:32:44 2013 -0600
description:
encoder: fix warning of potentially unused locals
Subject: [x265] Enable topskip and earlyexit for all rd levels <= 4 (output changes for presets faster than "slow")

details:   http://hg.videolan.org/x265/rev/e44315ab36b9
branches:  
changeset: 5510:e44315ab36b9
user:      Deepthi Devaki <deepthidevaki at multicorewareinc.com>
date:      Wed Dec 04 13:04:39 2013 +0530
description:
Enable topskip and earlyexit for all rd levels <= 4 (output changes for presets faster than "slow")

Also use encodeResandCalcRDInter instead of the refactored estimate function.
Subject: [x265] rdlevel: Add code for rdlevel 2

details:   http://hg.videolan.org/x265/rev/6694ef611b41
branches:  
changeset: 5511:6694ef611b41
user:      Deepthi Devaki <deepthidevaki at multicorewareinc.com>
date:      Wed Dec 04 13:05:54 2013 +0530
description:
rdlevel: Add code for rdlevel 2

Use signalling bits + sa8d cost to choose best among inter/merge/intra. Encode only best mode at each depth.
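
The selection rule in this description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the x265 implementation: `ModeCandidate`, `estimateCost`, `pickBestMode`, and the `lambda` weighting applied to the signalling bits are all hypothetical names introduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical candidate record; not an x265 type.
struct ModeCandidate
{
    const char* name;  // e.g. "inter", "merge", "intra"
    uint32_t    sa8d;  // sa8d distortion estimate of the prediction
    uint32_t    bits;  // estimated signalling bits for the mode
};

// Assumed cost model: sa8d plus lambda-weighted signalling bits.
inline uint64_t estimateCost(const ModeCandidate& c, uint32_t lambda)
{
    return static_cast<uint64_t>(c.sa8d) + static_cast<uint64_t>(lambda) * c.bits;
}

// Returns the index of the cheapest candidate, or -1 for an empty list.
// Only the winning candidate would then go through the full encode.
inline int pickBestMode(const std::vector<ModeCandidate>& cands, uint32_t lambda)
{
    int best = -1;
    uint64_t bestCost = 0;
    for (std::size_t i = 0; i < cands.size(); i++)
    {
        const uint64_t cost = estimateCost(cands[i], lambda);
        if (best < 0 || cost < bestCost)
        {
            best = static_cast<int>(i);
            bestCost = cost;
        }
    }
    return best;
}
```

The point of the design is that residual coding, the expensive step, runs once per depth instead of once per candidate.
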
Subject: [x265] rdlevel: compare Merge-skip(merge2Nx2N with no residue) to best among inter/intra/merge in rdlevel 2

details:   http://hg.videolan.org/x265/rev/4668ede3a332
branches:  
changeset: 5512:4668ede3a332
user:      Deepthi Devaki <deepthidevaki at multicorewareinc.com>
date:      Wed Dec 04 13:06:17 2013 +0530
description:
rdlevel: compare Merge-skip(merge2Nx2N with no residue) to best among inter/intra/merge in rdlevel 2
Subject: [x265] rdlevel: skip Intra if inter/merge sa8d less than a threshold

details:   http://hg.videolan.org/x265/rev/e7424e0cb60f
branches:  
changeset: 5513:e7424e0cb60f
user:      Deepthi Devaki <deepthidevaki at multicorewareinc.com>
date:      Wed Dec 04 13:06:38 2013 +0530
description:
rdlevel: skip Intra if inter/merge sa8d less than a threshold

In higher rdlevels, intra is skipped if the inter/merge CU cbf is 0. Here, an sa8d threshold is used to predict that the CU cbf will be 0.
The thresholds have to be refined further.
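
A minimal sketch of the two skip conditions described above, assuming a single scalar threshold. The helper name `skipIntra` and its parameters are hypothetical, and no threshold values are implied since the commit leaves them to be refined.

```cpp
#include <cstdint>

// Hypothetical helper, not an x265 function.
// cbfKnown: true when the inter/merge residual has actually been coded
//           (the higher-rdlevel case), so cbfIsZero is meaningful.
// Otherwise the sa8d of the best inter/merge candidate is compared
// against a caller-supplied threshold as a stand-in for "cbf will be 0".
inline bool skipIntra(bool cbfKnown, bool cbfIsZero,
                      uint32_t interMergeSa8d, uint32_t sa8dThreshold)
{
    if (cbfKnown)
        return cbfIsZero;
    return interMergeSa8d < sa8dThreshold;
}
```
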
Subject: [x265] rename IntraPred.cpp to intrapred.cpp to avoid team's hg merge conflict

details:   http://hg.videolan.org/x265/rev/b04134971883
branches:  
changeset: 5514:b04134971883
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 14:53:12 2013 +0800
description:
rename IntraPred.cpp to intrapred.cpp to avoid team's hg merge conflict
Subject: [x265] asm: simplify code by use intra_pred_ang[][], and avoid build error when disable yasm

details:   http://hg.videolan.org/x265/rev/dcc2e11e5643
branches:  
changeset: 5515:dcc2e11e5643
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 13:05:31 2013 +0800
description:
asm: simplify code by use intra_pred_ang[][], and avoid build error when disable yasm
Subject: [x265] asm : Modifications for luma_hps and chroma_hps(extra rows)

details:   http://hg.videolan.org/x265/rev/6a0f7924321e
branches:  
changeset: 5516:6a0f7924321e
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Wed Dec 04 20:24:14 2013 +0550
description:
asm : Modifications for luma_hps and chroma_hps(extra rows)
Subject: [x265] C primitive changes for luma_hps and chroma_hps.

details:   http://hg.videolan.org/x265/rev/835ee97789af
branches:  
changeset: 5517:835ee97789af
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Wed Dec 04 20:37:55 2013 +0550
description:
C primitive changes for luma_hps and chroma_hps.
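
The "extra rows" handling in the C primitive (see the `interp_horiz_ps_c` hunk in the diff below) comes down to simple arithmetic on the filter tap count N: when `isRowExt` is set, the source pointer moves up by N/2 - 1 rows and N - 1 additional rows are filtered. The sketch below only reproduces that arithmetic; `RowExtent` and `hpsRowExtent` are hypothetical names, not part of x265.

```cpp
// Hypothetical helper mirroring the row arithmetic of interp_horiz_ps_c.
struct RowExtent
{
    int firstRow;  // row offset of the first filtered row, relative to src
    int rows;      // total number of rows filtered
};

// taps: filter length N (4 for chroma, otherwise luma, as in the diff)
// height: nominal block height
inline RowExtent hpsRowExtent(int taps, int height, bool isRowExt)
{
    RowExtent e;
    e.firstRow = isRowExt ? -(taps / 2 - 1) : 0;
    e.rows     = isRowExt ? height + taps - 1 : height;
    return e;
}
```

For an 8-tap filter and a 16-row block this gives 23 rows starting 3 rows above the block, which presumably matches what a subsequent vertical pass of the same tap count would need to read.
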
Subject: [x265] Test bench code for luma_hps and chroma_hps

details:   http://hg.videolan.org/x265/rev/06f89ffdba43
branches:  
changeset: 5518:06f89ffdba43
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Wed Dec 04 20:40:33 2013 +0550
description:
Test bench code for luma_hps and chroma_hps
Subject: [x265] Function declarations for modified luma_hps and chroma_hps functions.

details:   http://hg.videolan.org/x265/rev/79d649d551f0
branches:  
changeset: 5519:79d649d551f0
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Wed Dec 04 20:43:39 2013 +0550
description:
Function declarations for modified luma_hps and chroma_hps functions.
Subject: [x265] Merge branch 'X'

details:   http://hg.videolan.org/x265/rev/78165334eed6
branches:  
changeset: 5520:78165334eed6
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 15:42:36 2013 +0800
description:
Merge branch 'X'
Subject: [x265] cleanup unused array intra_ang4[]

details:   http://hg.videolan.org/x265/rev/70a042f36c2c
branches:  
changeset: 5521:70a042f36c2c
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 17:02:16 2013 +0800
description:
cleanup unused array intra_ang4[]
Subject: [x265] integrating asm code for sa8d in primitives.cpp

details:   http://hg.videolan.org/x265/rev/b7656aa5f346
branches:  
changeset: 5522:b7656aa5f346
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Dec 05 14:52:26 2013 +0550
description:
integrating asm code for sa8d in primitives.cpp

There were no separate assembly functions for sa8d, so the sa8d_inter functions are simply re-used for sa8d.
Subject: [x265] asm: 10bpp code for scale2D_64to32 routine

details:   http://hg.videolan.org/x265/rev/1845917cb66d
branches:  
changeset: 5523:1845917cb66d
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 15:59:02 2013 +0550
description:
asm: 10bpp code for scale2D_64to32 routine
Subject: [x265] asm: improvement intra_pred_ang by SSE4(pextrd,pextrb)

details:   http://hg.videolan.org/x265/rev/c3d07f251bd8
branches:  
changeset: 5524:c3d07f251bd8
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 18:28:55 2013 +0800
description:
asm: improvement intra_pred_ang by SSE4(pextrd,pextrb)
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 11 & 25

details:   http://hg.videolan.org/x265/rev/c8641f015e5b
branches:  
changeset: 5525:c8641f015e5b
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 18:34:38 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 11 & 25
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 12 & 24

details:   http://hg.videolan.org/x265/rev/e39c11970ca0
branches:  
changeset: 5526:e39c11970ca0
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 18:39:01 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 12 & 24
Subject: [x265] testbench: swap order to call asm code

details:   http://hg.videolan.org/x265/rev/b9e0bfacfb8e
branches:  
changeset: 5527:b9e0bfacfb8e
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 18:58:17 2013 +0800
description:
testbench: swap order to call asm code

Our old intra_pred_ang algorithm fills the buffer located before the input
pLeft and pabove. After it has run, the pixel at offset [-1] equals the pixel
at [4], which can hide errors in the asm code, so I swapped the call order.
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 13 & 23

details:   http://hg.videolan.org/x265/rev/7995a50e0fc2
branches:  
changeset: 5528:7995a50e0fc2
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 19:11:53 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 13 & 23
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 14 & 22

details:   http://hg.videolan.org/x265/rev/88e38d7f926b
branches:  
changeset: 5529:88e38d7f926b
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 19:18:35 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 14 & 22
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 15 & 21

details:   http://hg.videolan.org/x265/rev/2ae36352e08c
branches:  
changeset: 5530:2ae36352e08c
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 19:38:28 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 15 & 21
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 16 & 20

details:   http://hg.videolan.org/x265/rev/d551487023ba
branches:  
changeset: 5531:d551487023ba
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 19:44:53 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 16 & 20
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 17 & 19

details:   http://hg.videolan.org/x265/rev/59f0433ffca0
branches:  
changeset: 5532:59f0433ffca0
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 20:43:45 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 17 & 19
Subject: [x265] asm: assembly code for IntraPredAng4x4 Mode 18

details:   http://hg.videolan.org/x265/rev/91fe66f971d2
branches:  
changeset: 5533:91fe66f971d2
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 20:59:01 2013 +0800
description:
asm: assembly code for IntraPredAng4x4 Mode 18
Subject: [x265] cleanup:merge Intra Pred DC mode into intra_pred[]

details:   http://hg.videolan.org/x265/rev/7febdbc37965
branches:  
changeset: 5534:7febdbc37965
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 22:09:17 2013 +0800
description:
cleanup:merge Intra Pred DC mode into intra_pred[]
Subject: [x265] improvement by remove reduce ADD instruction in intra_pred_dc16

details:   http://hg.videolan.org/x265/rev/c9a67d02ad1c
branches:  
changeset: 5535:c9a67d02ad1c
user:      Min Chen <chenm003 at 163.com>
date:      Thu Dec 05 22:13:36 2013 +0800
description:
improvement by remove reduce ADD instruction in intra_pred_dc16
Subject: [x265] asm: primitives of sse_ss for 12x16, 24x32, 48x64 and 64xN blocks

details:   http://hg.videolan.org/x265/rev/4c9b7eb235a9
branches:  
changeset: 5536:4c9b7eb235a9
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 17:50:30 2013 +0550
description:
asm: primitives of sse_ss for 12x16, 24x32, 48x64 and 64xN blocks
Subject: [x265] asm: 16bpp support for sad_x3 - all block sizes

details:   http://hg.videolan.org/x265/rev/8f3af42f7f44
branches:  
changeset: 5537:8f3af42f7f44
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Dec 05 19:25:35 2013 +0550
description:
asm: 16bpp support for sad_x3 - all block sizes
Subject: [x265] asm: 10bpp code for pixel_sub_2xN

details:   http://hg.videolan.org/x265/rev/c36134873a8d
branches:  
changeset: 5538:c36134873a8d
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 18:28:08 2013 +0550
description:
asm: 10bpp code for pixel_sub_2xN
Subject: [x265] asm: 10bpp code for pixel_sub_4xN

details:   http://hg.videolan.org/x265/rev/31b3bf1246c7
branches:  
changeset: 5539:31b3bf1246c7
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 19:21:32 2013 +0550
description:
asm: 10bpp code for pixel_sub_4xN
Subject: [x265] asm: 10bpp code for pixel_sub_6x8

details:   http://hg.videolan.org/x265/rev/c83d6906f665
branches:  
changeset: 5540:c83d6906f665
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 20:32:49 2013 +0550
description:
asm: 10bpp code for pixel_sub_6x8
Subject: [x265] asm: 16bpp support for sad_x4 - all block sizes

details:   http://hg.videolan.org/x265/rev/f864064737bc
branches:  
changeset: 5541:f864064737bc
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Dec 05 21:01:35 2013 +0550
description:
asm: 16bpp support for sad_x4 - all block sizes
Subject: [x265] asm: 10bpp code for pixel_sub_8xN

details:   http://hg.videolan.org/x265/rev/832d1d134449
branches:  
changeset: 5542:832d1d134449
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 21:38:10 2013 +0550
description:
asm: 10bpp code for pixel_sub_8xN
Subject: [x265] asm: 10bpp code for pixel_sub_12x16

details:   http://hg.videolan.org/x265/rev/9d974915023f
branches:  
changeset: 5543:9d974915023f
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Dec 05 22:42:47 2013 +0550
description:
asm: 10bpp code for pixel_sub_12x16
Subject: [x265] all_angs_pred_4x4, asm code for all modes

details:   http://hg.videolan.org/x265/rev/6d1b07d41cdd
branches:  
changeset: 5544:6d1b07d41cdd
user:      Praveen Tiwari <praveen at multicorewareinc.com>
date:      Thu Dec 05 14:21:23 2013 +0550
description:
all_angs_pred_4x4, asm code for all modes
Subject: [x265] Merge

details:   http://hg.videolan.org/x265/rev/67d755e2a30c
branches:  
changeset: 5545:67d755e2a30c
user:      Steve Borho <steve at borho.org>
date:      Thu Dec 05 13:51:59 2013 -0600
description:
Merge

diffstat:

 source/Lib/TLibCommon/TComPrediction.cpp |    12 +-
 source/Lib/TLibEncoder/TEncSearch.cpp    |     2 +-
 source/common/intrapred.cpp              |    30 +-
 source/common/ipfilter.cpp               |    11 +-
 source/common/primitives.cpp             |     9 +-
 source/common/primitives.h               |    13 +-
 source/common/vec/intra-ssse3.cpp        |    60 +-
 source/common/x86/asm-primitives.cpp     |   190 ++-
 source/common/x86/intrapred.h            |    43 +-
 source/common/x86/intrapred8.asm         |  1339 +++++++++++++++++++++++++----
 source/common/x86/ipfilter8.asm          |   316 +++---
 source/common/x86/ipfilter8.h            |     4 +-
 source/common/x86/pixel-util.h           |     1 +
 source/common/x86/pixel-util8.asm        |   861 ++++++++++++------
 source/common/x86/sad16-a.asm            |   102 +-
 source/encoder/compress.cpp              |   148 ++-
 source/encoder/encoder.cpp               |     5 +-
 source/encoder/slicetype.cpp             |     2 +-
 source/test/intrapredharness.cpp         |    58 +-
 source/test/intrapredharness.h           |     4 +-
 source/test/ipfilterharness.cpp          |    73 +-
 source/test/ipfilterharness.h            |     2 +
 22 files changed, 2393 insertions(+), 892 deletions(-)

diffs (truncated from 4515 to 300 lines):

diff -r ee8f2fa7d82a -r 67d755e2a30c source/Lib/TLibCommon/TComPrediction.cpp
--- a/source/Lib/TLibCommon/TComPrediction.cpp	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.cpp	Thu Dec 05 13:51:59 2013 -0600
@@ -161,13 +161,9 @@ void TComPrediction::predIntraLumaAng(ui
     {
         primitives.intra_pred_planar[log2BlkSize - 2](refAbv + 1, refLft + 1, dst, stride);
     }
-    else if (dirMode == DC_IDX)
-    {
-        primitives.intra_pred_dc[log2BlkSize - 2](refAbv + 1, refLft + 1, dst, stride, bFilter);
-    }
     else
     {
-        primitives.intra_pred_ang[log2BlkSize - 2](dst, stride, refLft, refAbv, dirMode, bFilter);
+        primitives.intra_pred[log2BlkSize - 2][dirMode](dst, stride, refLft, refAbv, dirMode, bFilter);
     }
 }
 
@@ -192,13 +188,9 @@ void TComPrediction::predIntraChromaAng(
     {
         primitives.intra_pred_planar[log2BlkSize](refAbv + width - 1 + 1, refLft + width - 1 + 1, dst, stride);
     }
-    else if (dirMode == DC_IDX)
-    {
-        primitives.intra_pred_dc[log2BlkSize](refAbv + width - 1 + 1, refLft + width - 1 + 1, dst, stride, false);
-    }
     else
     {
-        primitives.intra_pred_ang[log2BlkSize](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
+        primitives.intra_pred[log2BlkSize][dirMode](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
     }
 }
 
diff -r ee8f2fa7d82a -r 67d755e2a30c source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp	Thu Dec 05 13:51:59 2013 -0600
@@ -1621,7 +1621,7 @@ void TEncSearch::estIntraPredQT(TComData
             pixelcmp_t sa8d = primitives.sa8d[log2SizeMinus2];
 
             // DC
-            primitives.intra_pred_dc[log2SizeMinus2](above + 1, left + 1, tmp, scaleStride, (scaleWidth <= 16));
+            primitives.intra_pred[log2SizeMinus2][DC_IDX](tmp, scaleStride, left, above, 0, (scaleWidth <= 16));
             modeCosts[DC_IDX] = costMultiplier * sa8d(fenc, scaleStride, tmp, scaleStride);
 
             Pel *abovePlanar   = above;
diff -r ee8f2fa7d82a -r 67d755e2a30c source/common/intrapred.cpp
--- a/source/common/intrapred.cpp	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/common/intrapred.cpp	Thu Dec 05 13:51:59 2013 -0600
@@ -81,11 +81,11 @@ void dcPredFilter(pixel* above, pixel* l
 }
 
 template<int width>
-void dc_pred_c(pixel* above, pixel* left, pixel* dst, intptr_t dstStride, int bFilter)
+void intra_pred_dc_c(pixel* dst, intptr_t dstStride, pixel* left, pixel* above, int /*dirMode*/, int bFilter)
 {
     int k, l;
 
-    pixel dcval = dcPredValue(above, left, width);
+    pixel dcval = dcPredValue(above+1, left+1, width);
 
     for (k = 0; k < width; k++)
     {
@@ -97,7 +97,7 @@ void dc_pred_c(pixel* above, pixel* left
 
     if (bFilter)
     {
-        dcPredFilter(above, left, dst, dstStride, width);
+        dcPredFilter(above+1, left+1, dst, dstStride, width);
     }
 }
 
@@ -293,20 +293,26 @@ namespace x265 {
 
 void Setup_C_IPredPrimitives(EncoderPrimitives& p)
 {
-    p.intra_pred_dc[BLOCK_4x4] = dc_pred_c<4>;
-    p.intra_pred_dc[BLOCK_8x8] = dc_pred_c<8>;
-    p.intra_pred_dc[BLOCK_16x16] = dc_pred_c<16>;
-    p.intra_pred_dc[BLOCK_32x32] = dc_pred_c<32>;
-
     p.intra_pred_planar[BLOCK_4x4] = planad_pred_c<4>;
     p.intra_pred_planar[BLOCK_8x8] = planad_pred_c<8>;
     p.intra_pred_planar[BLOCK_16x16] = planad_pred_c<16>;
     p.intra_pred_planar[BLOCK_32x32] = planad_pred_c<32>;
 
-    p.intra_pred_ang[BLOCK_4x4] = intra_pred_ang_c<4>;
-    p.intra_pred_ang[BLOCK_8x8] = intra_pred_ang_c<8>;
-    p.intra_pred_ang[BLOCK_16x16] = intra_pred_ang_c<16>;
-    p.intra_pred_ang[BLOCK_32x32] = intra_pred_ang_c<32>;
+    // TODO: Fill Planar mode
+    p.intra_pred[BLOCK_4x4][0] = NULL;
+
+    // Intra Prediction DC
+    p.intra_pred[BLOCK_4x4][1] = intra_pred_dc_c<4>;
+    p.intra_pred[BLOCK_8x8][1] = intra_pred_dc_c<8>;
+    p.intra_pred[BLOCK_16x16][1] = intra_pred_dc_c<16>;
+    p.intra_pred[BLOCK_32x32][1] = intra_pred_dc_c<32>;
+    for (int i = 2; i < NUM_INTRA_MODE - 1; i++)
+    {
+        p.intra_pred[BLOCK_4x4][i] = intra_pred_ang_c<4>;
+        p.intra_pred[BLOCK_8x8][i] = intra_pred_ang_c<8>;
+        p.intra_pred[BLOCK_16x16][i] = intra_pred_ang_c<16>;
+        p.intra_pred[BLOCK_32x32][i] = intra_pred_ang_c<32>;
+    }
 
     p.intra_pred_allangs[BLOCK_4x4] = all_angs_pred_c<4>;
     p.intra_pred_allangs[BLOCK_8x8] = all_angs_pred_c<8>;
diff -r ee8f2fa7d82a -r 67d755e2a30c source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/common/ipfilter.cpp	Thu Dec 05 13:51:59 2013 -0600
@@ -270,17 +270,24 @@ void interp_horiz_pp_c(pixel *src, intpt
 }
 
 template<int N, int width, int height>
-void interp_horiz_ps_c(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx)
+void interp_horiz_ps_c(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx, int isRowExt)
 {
     int16_t const * coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
     int offset = -IF_INTERNAL_OFFS << shift;
+    int blkheight = height;
 
     src -= N / 2 - 1;
 
+    if (isRowExt)
+    {
+        src -= (N / 2 - 1) * srcStride;
+        blkheight += N - 1;
+    }
+
     int row, col;
-    for (row = 0; row < height; row++)
+    for (row = 0; row < blkheight; row++)
     {
         for (col = 0; col < width; col++)
         {
diff -r ee8f2fa7d82a -r 67d755e2a30c source/common/primitives.cpp
--- a/source/common/primitives.cpp	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/common/primitives.cpp	Thu Dec 05 13:51:59 2013 -0600
@@ -130,10 +130,11 @@ void x265_setup_primitives(x265_param *p
     Setup_Assembly_Primitives(primitives, cpuid);
 #endif
 
-    primitives.sa8d_inter[LUMA_8x8] = primitives.sa8d[BLOCK_8x8];
-    primitives.sa8d_inter[LUMA_16x16] = primitives.sa8d[BLOCK_16x16];
-    primitives.sa8d_inter[LUMA_32x32] = primitives.sa8d[BLOCK_32x32];
-    primitives.sa8d_inter[LUMA_64x64] = primitives.sa8d[BLOCK_64x64];
+    primitives.sa8d[BLOCK_4x4] = primitives.sa8d_inter[LUMA_4x4];
+    primitives.sa8d[BLOCK_8x8] = primitives.sa8d_inter[LUMA_8x8];
+    primitives.sa8d[BLOCK_16x16] = primitives.sa8d_inter[LUMA_16x16];
+    primitives.sa8d[BLOCK_32x32] = primitives.sa8d_inter[LUMA_32x32];
+    primitives.sa8d[BLOCK_64x64] = primitives.sa8d_inter[LUMA_64x64];
 
     // SA8D devolves to SATD for blocks not even multiples of 8x8
     primitives.sa8d_inter[LUMA_4x4]   = primitives.satd[LUMA_4x4];
diff -r ee8f2fa7d82a -r 67d755e2a30c source/common/primitives.h
--- a/source/common/primitives.h	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/common/primitives.h	Thu Dec 05 13:51:59 2013 -0600
@@ -35,6 +35,8 @@
 
 #define FENC_STRIDE 64
 
+#define NUM_INTRA_MODE 36   // copy from CommonDef.h
+
 #if defined(__GNUC__)
 #define ALIGN_VAR_8(T, var)  T var __attribute__((aligned(8)))
 #define ALIGN_VAR_16(T, var) T var __attribute__((aligned(16)))
@@ -161,9 +163,8 @@ typedef void (*pixeladd_ss_t)(int bx, in
 typedef void (*pixelavg_pp_t)(pixel *dst, intptr_t dstride, pixel *src0, intptr_t sstride0, pixel *src1, intptr_t sstride1, int weight);
 typedef void (*blockfill_s_t)(int16_t *dst, intptr_t dstride, int16_t val);
 
-typedef void (*intra_dc_t)(pixel* above, pixel* left, pixel* dst, intptr_t dstStride, int bFilter);
 typedef void (*intra_planar_t)(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
-typedef void (*intra_ang_t)(pixel* dst, intptr_t dstStride, pixel *refLeft, pixel *refAbove, int dirMode, int bFilter);
+typedef void (*intra_pred_t)(pixel* dst, intptr_t dstStride, pixel *refLeft, pixel *refAbove, int dirMode, int bFilter);
 typedef void (*intra_allangs_t)(pixel *dst, pixel *above0, pixel *left0, pixel *above1, pixel *left1, bool bLuma);
 
 typedef void (*cvt16to32_shl_t)(int32_t *dst, int16_t *src, intptr_t, int, int);
@@ -190,6 +191,7 @@ typedef uint64_t (*var_t)(pixel *pix, in
 typedef void (*plane_copy_deinterleave_t)(pixel *dstu, intptr_t dstuStride, pixel *dstv, intptr_t dstvStride, pixel *src,  intptr_t srcStride, int w, int h);
 
 typedef void (*filter_pp_t) (pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx);
+typedef void (*filter_hps_t) (pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx, int isRowExt);
 typedef void (*filter_ps_t) (pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx);
 typedef void (*filter_sp_t) (int16_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx);
 typedef void (*filter_ss_t) (int16_t *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx);
@@ -231,7 +233,7 @@ struct EncoderPrimitives
     pixel_add_ps_t  luma_add_ps[NUM_LUMA_PARTITIONS];
 
     filter_pp_t     luma_hpp[NUM_LUMA_PARTITIONS];
-    filter_ps_t     luma_hps[NUM_LUMA_PARTITIONS];
+    filter_hps_t    luma_hps[NUM_LUMA_PARTITIONS];
     filter_pp_t     luma_vpp[NUM_LUMA_PARTITIONS];
     filter_ps_t     luma_vps[NUM_LUMA_PARTITIONS];
     filter_sp_t     luma_vsp[NUM_LUMA_PARTITIONS];
@@ -248,9 +250,8 @@ struct EncoderPrimitives
     pixeladd_ss_t   pixeladd_ss;
     pixelavg_pp_t   pixelavg_pp[NUM_LUMA_PARTITIONS];
 
-    intra_dc_t      intra_pred_dc[NUM_SQUARE_BLOCKS];
     intra_planar_t  intra_pred_planar[NUM_SQUARE_BLOCKS];
-    intra_ang_t     intra_pred_ang[NUM_SQUARE_BLOCKS];
+    intra_pred_t    intra_pred[NUM_SQUARE_BLOCKS - 1][NUM_INTRA_MODE - 1];  // No 64x64 and DM mode
     intra_allangs_t intra_pred_allangs[NUM_SQUARE_BLOCKS];
     scale_t         scale1D_128to64;
     scale_t         scale2D_64to32;
@@ -280,7 +281,7 @@ struct EncoderPrimitives
         filter_sp_t     filter_vsp[NUM_LUMA_PARTITIONS];
         filter_ss_t     filter_vss[NUM_LUMA_PARTITIONS];
         filter_pp_t     filter_hpp[NUM_LUMA_PARTITIONS];
-        filter_ps_t     filter_hps[NUM_LUMA_PARTITIONS];
+        filter_hps_t    filter_hps[NUM_LUMA_PARTITIONS];
         copy_pp_t       copy_pp[NUM_LUMA_PARTITIONS];
         copy_sp_t       copy_sp[NUM_LUMA_PARTITIONS];
         copy_ps_t       copy_ps[NUM_LUMA_PARTITIONS];
diff -r ee8f2fa7d82a -r 67d755e2a30c source/common/vec/intra-ssse3.cpp
--- a/source/common/vec/intra-ssse3.cpp	Thu Dec 05 00:53:59 2013 -0600
+++ b/source/common/vec/intra-ssse3.cpp	Thu Dec 05 13:51:59 2013 -0600
@@ -33,49 +33,6 @@
 
 using namespace x265;
 
-// NOTE: I will remove below wrapper code after all of IntraAng mode finished
-extern "C" {
-#include "x86/intrapred.h"
-}
-intra_ang_t intra_ang4[NUM_INTRA_MODE - 1] =
-{
-    NULL,                               // Mode 0
-    NULL,                               // Mode 1
-    x265_intra_pred_ang4_2_ssse3,       // Mode 2
-    x265_intra_pred_ang4_3_ssse3,       // Mode 3
-    x265_intra_pred_ang4_4_ssse3,       // Mode 4
-    x265_intra_pred_ang4_5_ssse3,       // Mode 5
-    x265_intra_pred_ang4_6_ssse3,       // Mode 6
-    x265_intra_pred_ang4_7_ssse3,       // Mode 7
-    x265_intra_pred_ang4_8_ssse3,       // Mode 8
-    x265_intra_pred_ang4_9_ssse3,       // Mode 9
-    x265_intra_pred_ang4_10_ssse3,      // Mode 10
-    NULL,                               // Mode 11
-    NULL,                               // Mode 12
-    NULL,                               // Mode 13
-    NULL,                               // Mode 14
-    NULL,                               // Mode 15
-    NULL,                               // Mode 16
-    NULL,                               // Mode 17
-    NULL,                               // Mode 18
-    NULL,                               // Mode 19
-    NULL,                               // Mode 20
-    NULL,                               // Mode 21
-    NULL,                               // Mode 22
-    NULL,                               // Mode 23
-    NULL,                               // Mode 24
-    NULL,                               // Mode 25
-    x265_intra_pred_ang4_26_ssse3,      // Mode 26
-    x265_intra_pred_ang4_9_ssse3,       // Mode 27
-    x265_intra_pred_ang4_8_ssse3,       // Mode 28
-    x265_intra_pred_ang4_7_ssse3,       // Mode 29
-    x265_intra_pred_ang4_6_ssse3,       // Mode 30
-    x265_intra_pred_ang4_5_ssse3,       // Mode 31
-    x265_intra_pred_ang4_4_ssse3,       // Mode 32
-    x265_intra_pred_ang4_3_ssse3,       // Mode 33
-    x265_intra_pred_ang4_2_ssse3,       // Mode 34
-};
-
 namespace {
 #if !HIGH_BIT_DEPTH
 const int angAP[17][64] =
@@ -708,12 +665,6 @@ void intraPredAng4x4(pixel* dst, intptr_
 {
     assert(dirMode > 1); //no planar and dc
 
-    if (intra_ang4[dirMode])
-    {
-        intra_ang4[dirMode](dst, dstStride, refLeft, refAbove, dirMode, bFilter);
-        return;
-    }
-
     static const int mode_to_angle_table[] = { 32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26, -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32 };
     static const int mode_to_invAng_table[] = { 256, 315, 390, 482, 630, 910, 1638, 4096, 0, 4096, 1638, 910, 630, 482, 390, 315, 256, 315, 390, 482, 630, 910, 1638, 4096, 0, 4096, 1638, 910, 630, 482, 390, 315, 256 };
     int intraPredAngle = mode_to_angle_table[dirMode - 2];
@@ -3243,10 +3194,13 @@ namespace x265 {
 void Setup_Vec_IPredPrimitives_ssse3(EncoderPrimitives& p)
 {
 #if !HIGH_BIT_DEPTH
-    p.intra_pred_ang[BLOCK_4x4] = intraPredAng4x4;
-    p.intra_pred_ang[BLOCK_8x8] = intraPredAng8x8;
-    p.intra_pred_ang[BLOCK_16x16] = intraPredAng16x16;
-    p.intra_pred_ang[BLOCK_32x32] = intraPredAng32x32;
+    for (int i = 2; i < NUM_INTRA_MODE - 1; i++)
+    {
+        p.intra_pred[BLOCK_4x4][i] = intraPredAng4x4;
+        p.intra_pred[BLOCK_8x8][i] = intraPredAng8x8;
+        p.intra_pred[BLOCK_16x16][i] = intraPredAng16x16;
+        p.intra_pred[BLOCK_32x32][i] = intraPredAng32x32;
+    }
 #endif
 }
 }
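
The per-mode table this changeset introduces (`intra_pred[size][mode]`) can be illustrated with a small stand-alone sketch. It uses the `intra_pred_t` signature from the primitives.h hunk above; everything else (`IntraTable`, `setupTable`, `predict`, and the two stub predictors) is hypothetical and only demonstrates the dispatch pattern. Unlike the diff, which sets only the 4x4 planar slot to NULL, this sketch clears the planar slot for every size.

```cpp
#include <cstdint>

typedef uint8_t pixel;
typedef void (*intra_pred_t)(pixel* dst, intptr_t dstStride, pixel* refLeft,
                             pixel* refAbove, int dirMode, int bFilter);

enum { NUM_SIZES = 4, NUM_MODES = 35 };  // 4x4..32x32; planar, DC, angular 2..34

// Hypothetical container standing in for the relevant EncoderPrimitives member.
struct IntraTable
{
    intra_pred_t pred[NUM_SIZES][NUM_MODES];
};

// Stub predictors: they only mark dst[0] so the dispatch can be observed.
static void dcStub(pixel* dst, intptr_t, pixel*, pixel*, int, int)
{
    dst[0] = 1;
}

static void angStub(pixel* dst, intptr_t, pixel*, pixel*, int dirMode, int)
{
    dst[0] = static_cast<pixel>(dirMode);
}

inline void setupTable(IntraTable& t)
{
    for (int s = 0; s < NUM_SIZES; s++)
    {
        t.pred[s][0] = nullptr;   // planar slot not filled yet
        t.pred[s][1] = dcStub;    // DC
        for (int m = 2; m < NUM_MODES; m++)
            t.pred[s][m] = angStub;  // one entry per angular mode
    }
}

// Returns false when no primitive is registered for (sizeIdx, mode).
inline bool predict(const IntraTable& t, int sizeIdx, int mode, pixel* dst,
                    intptr_t stride, pixel* left, pixel* above, int bFilter)
{
    intra_pred_t fn = t.pred[sizeIdx][mode];
    if (!fn)
        return false;
    fn(dst, stride, left, above, mode, bFilter);
    return true;
}
```

With one slot per mode, an assembly routine for a single angular mode can replace its table entry directly, without a wrapper that branches on `dirMode`.
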