[x265-commits] [x265] compress: disable EARLY_EXIT and TOP_SKIP (temporarily)

Sun Dec 1 21:20:21 CET 2013

details:   http://hg.videolan.org/x265/rev/e0036ec4a61b
branches:  
changeset: 5369:e0036ec4a61b
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Nov 29 13:34:21 2013 +0530
description:
compress: disable EARLY_EXIT and TOP_SKIP (temporarily)
Subject: [x265] compress: save best bits, sad in xcomputeCostIntrainInter

details:   http://hg.videolan.org/x265/rev/a7d2fb189311
branches:  
changeset: 5370:a7d2fb189311
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Nov 29 13:53:27 2013 +0530
description:
compress: save best bits, sad in xcomputeCostIntrainInter
Subject: [x265] compress: save distortion info in xComputeCostInter.

details:   http://hg.videolan.org/x265/rev/2559b4c52148
branches:  
changeset: 5371:2559b4c52148
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Nov 29 14:35:14 2013 +0530
description:
compress: save distortion info in xComputeCostInter.
Subject: [x265] compress: cleanup

details:   http://hg.videolan.org/x265/rev/ac01f12310ed
branches:  
changeset: 5372:ac01f12310ed
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Nov 29 14:40:18 2013 +0530
description:
compress: cleanup
Subject: [x265] presets: correct bframes in "slow" to 4

details:   http://hg.videolan.org/x265/rev/fb93582b5f3f
branches:  stable
changeset: 5373:fb93582b5f3f
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Nov 29 16:29:02 2013 +0530
description:
presets: correct bframes in "slow" to 4
Subject: [x265] Merge from stable

details:   http://hg.videolan.org/x265/rev/833d78aaf71e
branches:  
changeset: 5374:833d78aaf71e
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Nov 29 16:40:42 2013 +0530
description:
Merge from stable
Subject: [x265] ssim: increase precision in ssim reporting

details:   http://hg.videolan.org/x265/rev/b08f3853adb9
branches:  stable
changeset: 5375:b08f3853adb9
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Sat Nov 30 16:09:59 2013 +0530
description:
ssim: increase precision in ssim reporting
Subject: [x265] presets: bpyramid default value reset to 1

details:   http://hg.videolan.org/x265/rev/87dc694fc016
branches:  stable
changeset: 5376:87dc694fc016
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Sat Nov 30 16:14:00 2013 +0530
description:
presets: bpyramid default value reset to 1

No support for strict b-pyramid yet.
Subject: [x265] Merge from stable

details:   http://hg.videolan.org/x265/rev/2786f9e92560
branches:  
changeset: 5377:2786f9e92560
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Sat Nov 30 16:15:58 2013 +0530
description:
Merge from stable
Subject: [x265] asm: assembly code for intra_pred_planar[32x32]

details:   http://hg.videolan.org/x265/rev/e6a32d404e18
branches:  
changeset: 5378:e6a32d404e18
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Nov 28 13:07:20 2013 +0530
description:
asm: assembly code for intra_pred_planar[32x32]
Subject: [x265] rename to avoid 10bpp conflict

details:   http://hg.videolan.org/x265/rev/016709ae6264
branches:  
changeset: 5379:016709ae6264
user:      Min Chen <chenm003 at 163.com>
date:      Thu Nov 28 18:23:12 2013 +0800
description:
rename to avoid 10bpp conflict
Subject: [x265] asm: code for pixel_sse_sp_12x16

details:   http://hg.videolan.org/x265/rev/8683adc61bec
branches:  
changeset: 5380:8683adc61bec
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Nov 28 17:28:39 2013 +0550
description:
asm: code for pixel_sse_sp_12x16
Subject: [x265] asm: code for pixel_sse_sp_4xN

details:   http://hg.videolan.org/x265/rev/052a1b094def
branches:  
changeset: 5381:052a1b094def
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Nov 28 17:27:10 2013 +0550
description:
asm: code for pixel_sse_sp_4xN
Subject: [x265] asm: cleanups for pixel_sse_sp

details:   http://hg.videolan.org/x265/rev/8a9a0ef760e8
branches:  
changeset: 5382:8a9a0ef760e8
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Nov 28 17:47:56 2013 +0550
description:
asm: cleanups for pixel_sse_sp
Subject: [x265] pixel: remove sse_sp intrinsic primitives, we have asm coverage

details:   http://hg.videolan.org/x265/rev/5857fdc3c3ff
branches:  
changeset: 5383:5857fdc3c3ff
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 10:34:14 2013 -0600
description:
pixel: remove sse_sp intrinsic primitives, we have asm coverage
Subject: [x265] testbench: added cvt16to32_shl primitive function

details:   http://hg.videolan.org/x265/rev/f9935384fa2a
branches:  
changeset: 5384:f9935384fa2a
user:      Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date:      Thu Nov 28 19:25:11 2013 +0550
description:
testbench: added cvt16to32_shl primitive function
Subject: [x265] cleanup: remove unused cvt16to16_shl_t

details:   http://hg.videolan.org/x265/rev/8f1a72797abb
branches:  
changeset: 5385:8f1a72797abb
user:      Min Chen <chenm003 at 163.com>
date:      Thu Nov 28 21:14:17 2013 +0800
description:
cleanup: remove unused cvt16to16_shl_t
Subject: [x265] asm: assembly code for cvt16to32_shl

details:   http://hg.videolan.org/x265/rev/9bda4cecf6c0
branches:  
changeset: 5386:9bda4cecf6c0
user:      Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date:      Thu Nov 28 21:07:46 2013 +0550
description:
asm: assembly code for cvt16to32_shl
Subject: [x265] asm : Adding asm routine for dst4.

details:   http://hg.videolan.org/x265/rev/2ab09fab2826
branches:  
changeset: 5387:2ab09fab2826
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Thu Nov 28 20:21:02 2013 +0550
description:
asm : Adding asm routine for dst4.
Subject: [x265] asm: enabled asm routines for HIGH_BIT_DEPTH, which has the support for 16bpp

details:   http://hg.videolan.org/x265/rev/bb776ea49cba
branches:  
changeset: 5388:bb776ea49cba
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Nov 28 21:47:23 2013 +0550
description:
asm: enabled asm routines for HIGH_BIT_DEPTH, which has the support for 16bpp
Subject: [x265] size based array for intra_pred_ang[]

details:   http://hg.videolan.org/x265/rev/cc7bb2f18d01
branches:  
changeset: 5389:cc7bb2f18d01
user:      Min Chen <chenm003 at 163.com>
date:      Fri Nov 29 17:16:08 2013 +0800
description:
size based array for intra_pred_ang[]
Subject: [x265] asm : Adding asm routine for idst4

details:   http://hg.videolan.org/x265/rev/3e8c280b16a6
branches:  
changeset: 5390:3e8c280b16a6
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Fri Nov 29 20:27:24 2013 +0550
description:
asm : Adding asm routine for idst4
Subject: [x265] Enable idst4 asm

details:   http://hg.videolan.org/x265/rev/d8c523bd9f90
branches:  
changeset: 5391:d8c523bd9f90
user:      Nabajit Deka <nabajit at multicorewareinc.com>
date:      Fri Nov 29 20:30:15 2013 +0550
description:
Enable idst4 asm
Subject: [x265] 10bpp: asm code for pixel_var_32x32 and 64x64

details:   http://hg.videolan.org/x265/rev/803048f62317
branches:  
changeset: 5392:803048f62317
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Fri Nov 29 21:33:25 2013 +0550
description:
10bpp: asm code for pixel_var_32x32 and 64x64
Subject: [x265] asm: plumb out more 16bpp asm setup infrastructure

details:   http://hg.videolan.org/x265/rev/776fc3575e2d
branches:  
changeset: 5393:776fc3575e2d
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 10:52:23 2013 -0600
description:
asm: plumb out more 16bpp asm setup infrastructure
Subject: [x265] intrapred: use square block defines, do not instantiate intra 64x64

details:   http://hg.videolan.org/x265/rev/3409078021ac
branches:  
changeset: 5394:3409078021ac
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 12:14:54 2013 -0600
description:
intrapred: use square block defines, do not instantiate intra 64x64
Subject: [x265] intrapred: fix func decl of intra-ang C ref

details:   http://hg.videolan.org/x265/rev/50261fa292ad
branches:  
changeset: 5395:50261fa292ad
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 12:17:05 2013 -0600
description:
intrapred: fix func decl of intra-ang C ref
Subject: [x265] vec: remove two DCT intrinsic primitives with asm coverage

details:   http://hg.videolan.org/x265/rev/9facac4f81f7
branches:  
changeset: 5396:9facac4f81f7
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 13:16:31 2013 -0600
description:
vec: remove two DCT intrinsic primitives with asm coverage
Subject: [x265] intra: testbench fixups after dropping 64x64 C refs

details:   http://hg.videolan.org/x265/rev/81c09b55acf1
branches:  
changeset: 5397:81c09b55acf1
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 14:14:15 2013 -0600
description:
intra: testbench fixups after dropping 64x64 C refs
Subject: [x265] vec: drop intra planar intrinsic primitives, we have asm coverage

details:   http://hg.videolan.org/x265/rev/343d9ba487b2
branches:  
changeset: 5398:343d9ba487b2
user:      Steve Borho <steve at borho.org>
date:      Sun Dec 01 14:16:46 2013 -0600
description:
vec: drop intra planar intrinsic primitives, we have asm coverage

diffstat:

 source/Lib/TLibCommon/CommonDef.h        |    3 -
 source/Lib/TLibCommon/TComPrediction.cpp |    8 +-
 source/Lib/TLibCommon/TComPrediction.h   |    4 +-
 source/common/CMakeLists.txt             |    4 +-
 source/common/common.cpp                 |   10 +-
 source/common/intrapred.cpp              |   21 +-
 source/common/primitives.h               |    5 +-
 source/common/vec/dct-sse3.cpp           |   91 ---
 source/common/vec/dct-sse41.cpp          |  114 ---
 source/common/vec/dct-ssse3.cpp          |   68 --
 source/common/vec/intra-sse41.cpp        |  206 ------
 source/common/vec/intra-ssse3.cpp        |   39 +-
 source/common/vec/pixel-sse41.cpp        |  371 ------------
 source/common/x86/asm-primitives.cpp     |  107 ++-
 source/common/x86/blockcopy8.h           |    1 +
 source/common/x86/dct8.asm               |  212 +++++++
 source/common/x86/dct8.h                 |    2 +
 source/common/x86/intrapred.asm          |  566 -------------------
 source/common/x86/intrapred.h            |    1 +
 source/common/x86/intrapred8.asm         |  676 ++++++++++++++++++++++
 source/common/x86/pixel-a.asm            |  262 ++++++++-
 source/common/x86/pixel-util.asm         |  883 -----------------------------
 source/common/x86/pixel-util8.asm        |  922 +++++++++++++++++++++++++++++++
 source/common/x86/pixel.h                |   26 +-
 source/encoder/compress.cpp              |   31 +-
 source/encoder/encoder.cpp               |    2 +-
 source/test/intrapredharness.cpp         |   66 +-
 source/test/intrapredharness.h           |    2 +-
 source/test/pixelharness.cpp             |   37 +
 source/test/pixelharness.h               |    1 +
 30 files changed, 2263 insertions(+), 2478 deletions(-)
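
Note on changeset 5389 (size based array for intra_pred_ang[]): the diffs below replace the single intra_pred_ang function pointer with one entry per square block size, each bound to a template instantiation. The following is only a minimal sketch of that dispatch pattern, not the x265 code itself. DemoPrimitives, copy_above_c, squareBlockIndex and predictBlock are hypothetical names made up for illustration, the kernel is a placeholder that copies the above row instead of doing real angular prediction, and an 8bpp pixel type is assumed.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8bpp build

// Illustrative stand-in for the square block enum referenced in the patch.
enum { BLOCK_4x4, BLOCK_8x8, BLOCK_16x16, BLOCK_32x32, NUM_SQUARE_BLOCKS };

// Same parameter order as the templated C reference in the patch.
typedef void (*intra_ang_t)(pixel* dst, intptr_t dstStride, pixel* refLeft,
                            pixel* refAbove, int dirMode, int bFilter);

// Placeholder kernel (hypothetical): copies the above row into every row.
// The block width is a compile-time constant, so loops can be unrolled.
template<int width>
void copy_above_c(pixel* dst, intptr_t dstStride, pixel* /*refLeft*/,
                  pixel* refAbove, int /*dirMode*/, int /*bFilter*/)
{
    for (int y = 0; y < width; y++)
        for (int x = 0; x < width; x++)
            dst[y * dstStride + x] = refAbove[x];
}

// Hypothetical cut-down primitive table: one pointer per block size.
struct DemoPrimitives
{
    intra_ang_t intra_pred_ang[NUM_SQUARE_BLOCKS];
};

void setupDemoPrimitives(DemoPrimitives& p)
{
    p.intra_pred_ang[BLOCK_4x4]   = copy_above_c<4>;
    p.intra_pred_ang[BLOCK_8x8]   = copy_above_c<8>;
    p.intra_pred_ang[BLOCK_16x16] = copy_above_c<16>;
    p.intra_pred_ang[BLOCK_32x32] = copy_above_c<32>;
}

// Map a block width (4, 8, 16, 32) to its table index: log2(width) - 2.
int squareBlockIndex(int width)
{
    int bits = 0;
    while ((1 << bits) < width)
        bits++;
    int idx = bits - 2;
    assert(idx >= 0 && idx < NUM_SQUARE_BLOCKS);
    return idx;
}

// The caller selects the kernel by size instead of passing width as an argument.
void predictBlock(const DemoPrimitives& p, int width, pixel* dst, intptr_t stride,
                  pixel* left, pixel* above)
{
    p.intra_pred_ang[squareBlockIndex(width)](dst, stride, left, above, 26, 0);
}
```

A caller would fill a DemoPrimitives once with setupDemoPrimitives() and then call predictBlock() per block. An assembly routine for a single size could then presumably be plugged in by overwriting only that table entry, which appears to be the motivation for the per-size layout.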

diffs (truncated from 5423 to 300 lines):

diff -r e7a5780843de -r 343d9ba487b2 source/Lib/TLibCommon/CommonDef.h
--- a/source/Lib/TLibCommon/CommonDef.h	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/Lib/TLibCommon/CommonDef.h	Sun Dec 01 14:16:46 2013 -0600
@@ -99,9 +99,6 @@
 #define FAST_UDI_MAX_RDMODE_NUM     35 ///< maximum number of RD comparison in fast-UDI estimation loop
 
 #define NUM_INTRA_MODE 36
-#if !REMOVE_LM_CHROMA
-#define LM_CHROMA_IDX  35
-#endif
 
 #define PLANAR_IDX                  0
 #define VER_IDX                     26 // index for intra VERTICAL   mode
diff -r e7a5780843de -r 343d9ba487b2 source/Lib/TLibCommon/TComPrediction.cpp
--- a/source/Lib/TLibCommon/TComPrediction.cpp	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.cpp	Sun Dec 01 14:16:46 2013 -0600
@@ -125,7 +125,7 @@ void TComPrediction::initTempBuff(int cs
 // Public member functions
 // ====================================================================================================================
 
-void TComPrediction::predIntraLumaAng(uint32_t dirMode, Pel* dst, uint32_t stride, int size)
+void TComPrediction::predIntraLumaAng(uint32_t dirMode, Pel* dst, intptr_t stride, int size)
 {
     assert(g_convertToBit[size] >= 0);   //   4x  4
     assert(g_convertToBit[size] <= 5);   // 128x128
@@ -168,12 +168,12 @@ void TComPrediction::predIntraLumaAng(ui
     }
     else
     {
-        primitives.intra_pred_ang(dst, stride, size, dirMode, bFilter, refLft, refAbv);
+        primitives.intra_pred_ang[log2BlkSize - 2](dst, stride, refLft, refAbv, dirMode, bFilter);
     }
 }
 
 // Angular chroma
-void TComPrediction::predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* dst, uint32_t stride, int width)
+void TComPrediction::predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* dst, intptr_t stride, int width)
 {
     int log2BlkSize = g_convertToBit[width];
 
@@ -199,7 +199,7 @@ void TComPrediction::predIntraChromaAng(
     }
     else
     {
-        primitives.intra_pred_ang(dst, stride, width, dirMode, false, refLft + width - 1, refAbv + width - 1);
+        primitives.intra_pred_ang[log2BlkSize](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
     }
 }
 
diff -r e7a5780843de -r 343d9ba487b2 source/Lib/TLibCommon/TComPrediction.h
--- a/source/Lib/TLibCommon/TComPrediction.h	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.h	Sun Dec 01 14:16:46 2013 -0600
@@ -108,8 +108,8 @@ public:
     void getMvPredAMVP(TComDataCU* cu, uint32_t partIdx, uint32_t partAddr, int picList, MV& mvPred);
 
     // Angular Intra
-    void predIntraLumaAng(uint32_t dirMode, Pel* pred, uint32_t stride, int width);
-    void predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* pred, uint32_t stride, int width);
+    void predIntraLumaAng(uint32_t dirMode, Pel* pred, intptr_t stride, int width);
+    void predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* pred, intptr_t stride, int width);
 
     Pel* getPredicBuf()             { return m_predBuf; }
 
diff -r e7a5780843de -r 343d9ba487b2 source/common/CMakeLists.txt
--- a/source/common/CMakeLists.txt	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/CMakeLists.txt	Sun Dec 01 14:16:46 2013 -0600
@@ -113,8 +113,8 @@ endif(ENABLE_PRIMITIVES_VEC)
 
 if(ENABLE_PRIMITIVES_ASM)
     set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h)
-    set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm mc-a.asm mc-a2.asm ipfilter8.asm pixel-util.asm
-               blockcopy8.asm intrapred.asm pixeladd8.asm dct8.asm)
+    set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm mc-a.asm mc-a2.asm ipfilter8.asm pixel-util8.asm
+               blockcopy8.asm intrapred8.asm pixeladd8.asm dct8.asm)
     if (NOT X64)
         set(A_SRCS ${A_SRCS} pixel-32.asm)
     endif()
diff -r e7a5780843de -r 343d9ba487b2 source/common/common.cpp
--- a/source/common/common.cpp	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/common.cpp	Sun Dec 01 14:16:46 2013 -0600
@@ -164,7 +164,7 @@ void x265_param_default(x265_param *para
     param->bframes = 4;
     param->lookaheadDepth = 20;
     param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
-    param->bpyramid = 2;
+    param->bpyramid = 1;
     param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
 
     /* Intra Coding Tools */
@@ -278,7 +278,6 @@ int x265_param_default_preset(x265_param
             param->maxCUSize = 32;
             param->searchRange = 28;
             param->bFrameAdaptive = 0;
-            param->bpyramid = 1;
             param->subpelRefine = 0;
             param->maxNumMergeCand = 2;
             param->searchMethod = X265_DIA_SEARCH;
@@ -298,7 +297,6 @@ int x265_param_default_preset(x265_param
             param->maxCUSize = 32;
             param->searchRange = 44;
             param->bFrameAdaptive = 0;
-            param->bpyramid = 1;
             param->subpelRefine = 1;
             param->bEnableRectInter = 0;
             param->bEnableAMP = 0;
@@ -312,7 +310,6 @@ int x265_param_default_preset(x265_param
             param->lookaheadDepth = 15;
             param->maxCUSize = 32;
             param->bFrameAdaptive = 0;
-            param->bpyramid = 1;
             param->subpelRefine = 1;
             param->bEnableRectInter = 0;
             param->bEnableAMP = 0;
@@ -324,7 +321,6 @@ int x265_param_default_preset(x265_param
         {
             param->lookaheadDepth = 15;
             param->bFrameAdaptive = 0;
-            param->bpyramid = 1;
             param->bEnableRectInter = 0;
             param->bEnableAMP = 0;
             param->bEnableEarlySkip = 1;
@@ -334,7 +330,6 @@ int x265_param_default_preset(x265_param
         else if (!strcmp(preset, "fast"))
         {
             param->lookaheadDepth = 15;
-            param->bpyramid = 1;
             param->bEnableRectInter = 0;
             param->bEnableAMP = 0;
         }
@@ -345,8 +340,7 @@ int x265_param_default_preset(x265_param
         else if (!strcmp(preset, "slow"))
         {
             param->lookaheadDepth = 25;
-            param->bframes = 8;
-            param->bpyramid = 1;
+            param->bframes = 4;
             param->rdLevel = 1;
             param->subpelRefine = 3;
             param->maxNumMergeCand = 3;
diff -r e7a5780843de -r 343d9ba487b2 source/common/intrapred.cpp
--- a/source/common/intrapred.cpp	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/intrapred.cpp	Sun Dec 01 14:16:46 2013 -0600
@@ -146,7 +146,8 @@ void planad_pred_c(pixel* above, pixel* 
     }
 }
 
-void ang_pred_c(pixel* dst, int dstStride, int width, int dirMode, bool bFilter, pixel *refLeft, pixel *refAbove)
+template<int width>
+void intra_pred_ang_c(pixel* dst, intptr_t dstStride, pixel *refLeft, pixel *refAbove, int dirMode, int bFilter)
 {
     // Map the mode index to main prediction direction and angle
     int k, l;
@@ -265,7 +266,7 @@ void all_angs_pred_c(pixel *dest, pixel 
         pixel *above = (IntraFilterType[(int)g_convertToBit[size]][mode] ? above1 : above0);
         pixel *out = dest + (mode - 2) * (size * size);
 
-        ang_pred_c(out, size, size, mode, bLuma, left, above);
+        intra_pred_ang_c<size>(out, size, left, above, mode, bLuma);
 
         // Optimize code don't flip buffer
         bool modeHor = (mode < 18);
@@ -301,13 +302,15 @@ void Setup_C_IPredPrimitives(EncoderPrim
     p.intra_pred_planar[BLOCK_8x8] = planad_pred_c<8>;
     p.intra_pred_planar[BLOCK_16x16] = planad_pred_c<16>;
     p.intra_pred_planar[BLOCK_32x32] = planad_pred_c<32>;
-    p.intra_pred_planar[BLOCK_64x64] = planad_pred_c<64>;
 
-    p.intra_pred_ang = ang_pred_c;
-    p.intra_pred_allangs[0] = all_angs_pred_c<4>;
-    p.intra_pred_allangs[1] = all_angs_pred_c<8>;
-    p.intra_pred_allangs[2] = all_angs_pred_c<16>;
-    p.intra_pred_allangs[3] = all_angs_pred_c<32>;
-    p.intra_pred_allangs[4] = all_angs_pred_c<64>;
+    p.intra_pred_ang[BLOCK_4x4] = intra_pred_ang_c<4>;
+    p.intra_pred_ang[BLOCK_8x8] = intra_pred_ang_c<8>;
+    p.intra_pred_ang[BLOCK_16x16] = intra_pred_ang_c<16>;
+    p.intra_pred_ang[BLOCK_32x32] = intra_pred_ang_c<32>;
+
+    p.intra_pred_allangs[BLOCK_4x4] = all_angs_pred_c<4>;
+    p.intra_pred_allangs[BLOCK_8x8] = all_angs_pred_c<8>;
+    p.intra_pred_allangs[BLOCK_16x16] = all_angs_pred_c<16>;
+    p.intra_pred_allangs[BLOCK_32x32] = all_angs_pred_c<32>;
 }
 }
diff -r e7a5780843de -r 343d9ba487b2 source/common/primitives.h
--- a/source/common/primitives.h	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/primitives.h	Sun Dec 01 14:16:46 2013 -0600
@@ -163,11 +163,10 @@ typedef void (*blockfill_s_t)(int16_t *d
 
 typedef void (*intra_dc_t)(pixel* above, pixel* left, pixel* dst, intptr_t dstStride, int bFilter);
 typedef void (*intra_planar_t)(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
-typedef void (*intra_ang_t)(pixel* dst, int dstStride, int width, int dirMode, bool bFilter, pixel *refLeft, pixel *refAbove);
+typedef void (*intra_ang_t)(pixel* dst, intptr_t dstStride, pixel *refLeft, pixel *refAbove, int width, int bFilter);
 typedef void (*intra_allangs_t)(pixel *dst, pixel *above0, pixel *left0, pixel *above1, pixel *left1, bool bLuma);
 
 typedef void (*cvt16to32_shl_t)(int32_t *dst, int16_t *src, intptr_t, int, int);
-typedef void (*cvt16to16_shl_t)(int16_t *dst, int16_t *src, int, int, intptr_t, int);
 typedef void (*cvt32to16_shr_t)(int16_t *dst, int32_t *src, intptr_t, int, int);
 
 typedef void (*dct_t)(int16_t *src, int32_t *dst, intptr_t stride);
@@ -251,7 +250,7 @@ struct EncoderPrimitives
 
     intra_dc_t      intra_pred_dc[NUM_SQUARE_BLOCKS];
     intra_planar_t  intra_pred_planar[NUM_SQUARE_BLOCKS];
-    intra_ang_t     intra_pred_ang;
+    intra_ang_t     intra_pred_ang[NUM_SQUARE_BLOCKS];
     intra_allangs_t intra_pred_allangs[NUM_SQUARE_BLOCKS];
     scale_t         scale1D_128to64;
     scale_t         scale2D_64to32;
diff -r e7a5780843de -r 343d9ba487b2 source/common/vec/dct-sse3.cpp
--- a/source/common/vec/dct-sse3.cpp	Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/vec/dct-sse3.cpp	Sun Dec 01 14:16:46 2013 -0600
@@ -41,97 +41,6 @@ using namespace x265;
 
 namespace {
 #if !HIGH_BIT_DEPTH
-ALIGN_VAR_32(static const int16_t, tab_idct_4x4[4][8]) =
-{
-    { 64,  64, 64,  64, 64,  64, 64,  64 },
-    { 64, -64, 64, -64, 64, -64, 64, -64 },
-    { 83,  36, 83,  36, 83,  36, 83,  36 },
-    { 36, -83, 36, -83, 36, -83, 36, -83 },
-};
-void idct4(int32_t *src, int16_t *dst, intptr_t stride)
-{
-    __m128i S0, S8, m128iAdd, m128Tmp1, m128Tmp2, E1, E2, O1, O2, m128iA, m128iD;
-
-    m128Tmp1 = _mm_load_si128((__m128i*)&src[0]);
-    m128Tmp2 = _mm_load_si128((__m128i*)&src[4]);
-    S0 = _mm_packs_epi32(m128Tmp1, m128Tmp2);
-
-    m128Tmp1 = _mm_load_si128((__m128i*)&src[8]);
-    m128Tmp2 = _mm_load_si128((__m128i*)&src[12]);
-    S8 = _mm_packs_epi32(m128Tmp1, m128Tmp2);
-
-    m128iAdd = _mm_set1_epi32(64);
-
-    m128Tmp1 = _mm_unpacklo_epi16(S0, S8);
-    E1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[0])));
-    E1 = _mm_add_epi32(E1, m128iAdd);
-
-    E2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[1])));
-    E2 = _mm_add_epi32(E2, m128iAdd);
-
-    m128Tmp1 = _mm_unpackhi_epi16(S0, S8);
-    O1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[2])));
-    O2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[3])));
-
-    m128iA  = _mm_add_epi32(E1, O1);
-    m128iA  = _mm_srai_epi32(m128iA, 7);        // sum = sum >> shiftNum
-    m128Tmp1 = _mm_add_epi32(E2, O2);
-    m128Tmp1 = _mm_srai_epi32(m128Tmp1, 7);     // sum = sum >> shiftNum
-    m128iA = _mm_packs_epi32(m128iA, m128Tmp1);
-
-    m128iD = _mm_sub_epi32(E2, O2);
-    m128iD = _mm_srai_epi32(m128iD, 7);         // sum = sum >> shiftNum
-
-    m128Tmp1 = _mm_sub_epi32(E1, O1);
-    m128Tmp1 = _mm_srai_epi32(m128Tmp1, 7);     // sum = sum >> shiftNum
-
-    m128iD = _mm_packs_epi32(m128iD, m128Tmp1);
-
-    S0 = _mm_unpacklo_epi16(m128iA, m128iD);
-    S8 = _mm_unpackhi_epi16(m128iA, m128iD);
-
-    m128iA = _mm_unpacklo_epi16(S0, S8);
-    m128iD = _mm_unpackhi_epi16(S0, S8);
-
-    /*  ##########################  */
-
-    m128iAdd = _mm_set1_epi32(2048);
-    m128Tmp1 = _mm_unpacklo_epi16(m128iA, m128iD);
-    E1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[0])));
-    E1 = _mm_add_epi32(E1, m128iAdd);
-
-    E2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[1])));
-    E2 = _mm_add_epi32(E2, m128iAdd);
-
-    m128Tmp1 = _mm_unpackhi_epi16(m128iA, m128iD);
-    O1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[2])));
-    O2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[3])));
-
-    m128iA   = _mm_add_epi32(E1, O1);
-    m128iA   = _mm_srai_epi32(m128iA, 12);
-    m128Tmp1 = _mm_add_epi32(E2, O2);
-    m128Tmp1 = _mm_srai_epi32(m128Tmp1, 12);
-    m128iA   = _mm_packs_epi32(m128iA, m128Tmp1);
-
-    m128iD = _mm_sub_epi32(E2, O2);
-    m128iD = _mm_srai_epi32(m128iD, 12);
-
-    m128Tmp1 = _mm_sub_epi32(E1, O1);
-    m128Tmp1 = _mm_srai_epi32(m128Tmp1, 12);
-
-    m128iD = _mm_packs_epi32(m128iD, m128Tmp1);
-
-    m128Tmp1 = _mm_unpacklo_epi16(m128iA, m128iD);   // [32 30 22 20 12 10 02 00]
-    m128Tmp2 = _mm_unpackhi_epi16(m128iA, m128iD);   // [33 31 23 21 13 11 03 01]
-    m128iA   = _mm_unpacklo_epi16(m128Tmp1, m128Tmp2);
-    m128iD   = _mm_unpackhi_epi16(m128Tmp1, m128Tmp2);