[x265-commits] [x265] compress: disable EARLY_EXIT and TOP_SKIP (temporarily)
Deepthi Nandakumar
deepthi at multicorewareinc.com
Sun Dec 1 21:20:21 CET 2013
details: http://hg.videolan.org/x265/rev/e0036ec4a61b
branches:
changeset: 5369:e0036ec4a61b
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Nov 29 13:34:21 2013 +0530
description:
compress: disable EARLY_EXIT and TOP_SKIP (temporarily)
Subject: [x265] compress: save best bits, sad in xcomputeCostIntrainInter
details: http://hg.videolan.org/x265/rev/a7d2fb189311
branches:
changeset: 5370:a7d2fb189311
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Nov 29 13:53:27 2013 +0530
description:
compress: save best bits, sad in xcomputeCostIntrainInter
Subject: [x265] compress: save distortion info in xComputeCostInter.
details: http://hg.videolan.org/x265/rev/2559b4c52148
branches:
changeset: 5371:2559b4c52148
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Nov 29 14:35:14 2013 +0530
description:
compress: save distortion info in xComputeCostInter.
Subject: [x265] compress: cleanup
details: http://hg.videolan.org/x265/rev/ac01f12310ed
branches:
changeset: 5372:ac01f12310ed
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Nov 29 14:40:18 2013 +0530
description:
compress: cleanup
Subject: [x265] presets: correct bframes in "slow" to 4
details: http://hg.videolan.org/x265/rev/fb93582b5f3f
branches: stable
changeset: 5373:fb93582b5f3f
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Nov 29 16:29:02 2013 +0530
description:
presets: correct bframes in "slow" to 4
Subject: [x265] Merge from stable
details: http://hg.videolan.org/x265/rev/833d78aaf71e
branches:
changeset: 5374:833d78aaf71e
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Nov 29 16:40:42 2013 +0530
description:
Merge from stable
Subject: [x265] ssim: increase precision in ssim reporting
details: http://hg.videolan.org/x265/rev/b08f3853adb9
branches: stable
changeset: 5375:b08f3853adb9
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Sat Nov 30 16:09:59 2013 +0530
description:
ssim: increase precision in ssim reporting
Subject: [x265] presets: bpyramid default value reset to 1
details: http://hg.videolan.org/x265/rev/87dc694fc016
branches: stable
changeset: 5376:87dc694fc016
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Sat Nov 30 16:14:00 2013 +0530
description:
presets: bpyramid default value reset to 1
No support for strict b-pyramid yet.
Subject: [x265] Merge from stable
details: http://hg.videolan.org/x265/rev/2786f9e92560
branches:
changeset: 5377:2786f9e92560
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Sat Nov 30 16:15:58 2013 +0530
description:
Merge from stable
Subject: [x265] asm: assembly code for intra_pred_planar[32x32]
details: http://hg.videolan.org/x265/rev/e6a32d404e18
branches:
changeset: 5378:e6a32d404e18
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Nov 28 13:07:20 2013 +0530
description:
asm: assembly code for intra_pred_planar[32x32]
Subject: [x265] rename to avoid 10bpp conflict
details: http://hg.videolan.org/x265/rev/016709ae6264
branches:
changeset: 5379:016709ae6264
user: Min Chen <chenm003 at 163.com>
date: Thu Nov 28 18:23:12 2013 +0800
description:
rename to avoid 10bpp conflict
Subject: [x265] asm: code for pixel_sse_sp_12x16
details: http://hg.videolan.org/x265/rev/8683adc61bec
branches:
changeset: 5380:8683adc61bec
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Thu Nov 28 17:28:39 2013 +0550
description:
asm: code for pixel_sse_sp_12x16
Subject: [x265] asm: code for pixel_sse_sp_4xN
details: http://hg.videolan.org/x265/rev/052a1b094def
branches:
changeset: 5381:052a1b094def
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Thu Nov 28 17:27:10 2013 +0550
description:
asm: code for pixel_sse_sp_4xN
Subject: [x265] asm: cleanups for pixel_sse_sp
details: http://hg.videolan.org/x265/rev/8a9a0ef760e8
branches:
changeset: 5382:8a9a0ef760e8
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Thu Nov 28 17:47:56 2013 +0550
description:
asm: cleanups for pixel_sse_sp
Subject: [x265] pixel: remove sse_sp intrinsic primitives, we have asm coverage
details: http://hg.videolan.org/x265/rev/5857fdc3c3ff
branches:
changeset: 5383:5857fdc3c3ff
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 10:34:14 2013 -0600
description:
pixel: remove sse_sp intrinsic primitives, we have asm coverage
Subject: [x265] testbench: added cvt16to32_shl primitive function
details: http://hg.videolan.org/x265/rev/f9935384fa2a
branches:
changeset: 5384:f9935384fa2a
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Thu Nov 28 19:25:11 2013 +0550
description:
testbench: added cvt16to32_shl primitive function
Subject: [x265] cleanup: remove unused cvt16to16_shl_t
details: http://hg.videolan.org/x265/rev/8f1a72797abb
branches:
changeset: 5385:8f1a72797abb
user: Min Chen <chenm003 at 163.com>
date: Thu Nov 28 21:14:17 2013 +0800
description:
cleanup: remove unused cvt16to16_shl_t
Subject: [x265] asm: assembly code for cvt16to32_shl
details: http://hg.videolan.org/x265/rev/9bda4cecf6c0
branches:
changeset: 5386:9bda4cecf6c0
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Thu Nov 28 21:07:46 2013 +0550
description:
asm: assembly code for cvt16to32_shl
Subject: [x265] asm : Adding asm routine for dst4.
details: http://hg.videolan.org/x265/rev/2ab09fab2826
branches:
changeset: 5387:2ab09fab2826
user: Nabajit Deka <nabajit at multicorewareinc.com>
date: Thu Nov 28 20:21:02 2013 +0550
description:
asm : Adding asm routine for dst4.
Subject: [x265] asm: enabled asm routines for HIGH_BIT_DEPTH, which has the support for 16bpp
details: http://hg.videolan.org/x265/rev/bb776ea49cba
branches:
changeset: 5388:bb776ea49cba
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Thu Nov 28 21:47:23 2013 +0550
description:
asm: enabled asm routines for HIGH_BIT_DEPTH, which has the support for 16bpp
Subject: [x265] size based array for intra_pred_ang[]
details: http://hg.videolan.org/x265/rev/cc7bb2f18d01
branches:
changeset: 5389:cc7bb2f18d01
user: Min Chen <chenm003 at 163.com>
date: Fri Nov 29 17:16:08 2013 +0800
description:
size based array for intra_pred_ang[]
Subject: [x265] asm : Adding asm routine for idst4
details: http://hg.videolan.org/x265/rev/3e8c280b16a6
branches:
changeset: 5390:3e8c280b16a6
user: Nabajit Deka <nabajit at multicorewareinc.com>
date: Fri Nov 29 20:27:24 2013 +0550
description:
asm : Adding asm routine for idst4
Subject: [x265] Enable idst4 asm
details: http://hg.videolan.org/x265/rev/d8c523bd9f90
branches:
changeset: 5391:d8c523bd9f90
user: Nabajit Deka <nabajit at multicorewareinc.com>
date: Fri Nov 29 20:30:15 2013 +0550
description:
Enable idst4 asm
Subject: [x265] 10bpp: asm code for pixel_var_32x32 and 64x64
details: http://hg.videolan.org/x265/rev/803048f62317
branches:
changeset: 5392:803048f62317
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Nov 29 21:33:25 2013 +0550
description:
10bpp: asm code for pixel_var_32x32 and 64x64
Subject: [x265] asm: plumb out more 16bpp asm setup infrastructure
details: http://hg.videolan.org/x265/rev/776fc3575e2d
branches:
changeset: 5393:776fc3575e2d
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 10:52:23 2013 -0600
description:
asm: plumb out more 16bpp asm setup infrastructure
Subject: [x265] intrapred: use square block defines, do not instantiate intra 64x64
details: http://hg.videolan.org/x265/rev/3409078021ac
branches:
changeset: 5394:3409078021ac
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 12:14:54 2013 -0600
description:
intrapred: use square block defines, do not instantiate intra 64x64
Subject: [x265] intrapred: fix func decl of intra-ang C ref
details: http://hg.videolan.org/x265/rev/50261fa292ad
branches:
changeset: 5395:50261fa292ad
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 12:17:05 2013 -0600
description:
intrapred: fix func decl of intra-ang C ref
Subject: [x265] vec: remove two DCT intrinsic primitives with asm coverage
details: http://hg.videolan.org/x265/rev/9facac4f81f7
branches:
changeset: 5396:9facac4f81f7
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 13:16:31 2013 -0600
description:
vec: remove two DCT intrinsic primitives with asm coverage
Subject: [x265] intra: testbench fixups after dropping 64x64 C refs
details: http://hg.videolan.org/x265/rev/81c09b55acf1
branches:
changeset: 5397:81c09b55acf1
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 14:14:15 2013 -0600
description:
intra: testbench fixups after dropping 64x64 C refs
Subject: [x265] vec: drop intra planar intrinsic primitives, we have asm coverage
details: http://hg.videolan.org/x265/rev/343d9ba487b2
branches:
changeset: 5398:343d9ba487b2
user: Steve Borho <steve at borho.org>
date: Sun Dec 01 14:16:46 2013 -0600
description:
vec: drop intra planar intrinsic primitives, we have asm coverage
diffstat:
source/Lib/TLibCommon/CommonDef.h        |   3 -
source/Lib/TLibCommon/TComPrediction.cpp |   8 +-
source/Lib/TLibCommon/TComPrediction.h   |   4 +-
source/common/CMakeLists.txt             |   4 +-
source/common/common.cpp                 |  10 +-
source/common/intrapred.cpp              |  21 +-
source/common/primitives.h               |   5 +-
source/common/vec/dct-sse3.cpp           |  91 ---
source/common/vec/dct-sse41.cpp          | 114 ---
source/common/vec/dct-ssse3.cpp          |  68 --
source/common/vec/intra-sse41.cpp        | 206 ------
source/common/vec/intra-ssse3.cpp        |  39 +-
source/common/vec/pixel-sse41.cpp        | 371 ------------
source/common/x86/asm-primitives.cpp     | 107 ++-
source/common/x86/blockcopy8.h           |   1 +
source/common/x86/dct8.asm               | 212 +++++++
source/common/x86/dct8.h                 |   2 +
source/common/x86/intrapred.asm          | 566 -------------------
source/common/x86/intrapred.h            |   1 +
source/common/x86/intrapred8.asm         | 676 ++++++++++++++++++++++
source/common/x86/pixel-a.asm            | 262 ++++++++-
source/common/x86/pixel-util.asm         | 883 -----------------------------
source/common/x86/pixel-util8.asm        | 922 +++++++++++++++++++++++++++++++
source/common/x86/pixel.h                |  26 +-
source/encoder/compress.cpp              |  31 +-
source/encoder/encoder.cpp               |   2 +-
source/test/intrapredharness.cpp         |  66 +-
source/test/intrapredharness.h           |   2 +-
source/test/pixelharness.cpp             |  37 +
source/test/pixelharness.h               |   1 +
30 files changed, 2263 insertions(+), 2478 deletions(-)
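Changeset 5389 in the list above ("size based array for intra_pred_ang[]") replaces the single intra_pred_ang function pointer with one pointer per square block size, selected by log2(width) - 2. The following is only a rough sketch of that dispatch pattern, not the x265 code itself. SketchPrimitives, NUM_SKETCH_BLOCKS, placeholder_pred_c, sizeIndex and setupSketchPrimitives are hypothetical names, pixel is assumed to be 8-bit, and the predictor body is a placeholder rather than real angular prediction.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit-per-sample build

// Same parameter order as intra_pred_ang_c in the patch.
typedef void (*intra_ang_t)(pixel* dst, intptr_t dstStride, pixel* refLeft,
                            pixel* refAbove, int dirMode, int bFilter);

// Hypothetical stand-in for the square block indices used in the patch.
enum { BLOCK_4x4, BLOCK_8x8, BLOCK_16x16, BLOCK_32x32, NUM_SKETCH_BLOCKS };

struct SketchPrimitives
{
    intra_ang_t intra_pred_ang[NUM_SKETCH_BLOCKS];
};

// Placeholder predictor: copies the row in refAbove into every output row.
// It ignores refLeft, dirMode and bFilter; it is NOT real angular prediction.
template<int width>
void placeholder_pred_c(pixel* dst, intptr_t dstStride, pixel* /*refLeft*/,
                        pixel* refAbove, int /*dirMode*/, int /*bFilter*/)
{
    for (int y = 0; y < width; y++)
        for (int x = 0; x < width; x++)
            dst[y * dstStride + x] = refAbove[x];
}

// Map a block width (4, 8, 16, 32) to its table index, i.e. log2(width) - 2.
int sizeIndex(int width)
{
    int idx = 0;
    while ((4 << idx) < width)
        idx++;
    assert(idx < NUM_SKETCH_BLOCKS && (4 << idx) == width);
    return idx;
}

// One template instantiation per block size, as in Setup_C_IPredPrimitives.
void setupSketchPrimitives(SketchPrimitives& p)
{
    p.intra_pred_ang[BLOCK_4x4]   = placeholder_pred_c<4>;
    p.intra_pred_ang[BLOCK_8x8]   = placeholder_pred_c<8>;
    p.intra_pred_ang[BLOCK_16x16] = placeholder_pred_c<16>;
    p.intra_pred_ang[BLOCK_32x32] = placeholder_pred_c<32>;
}
```

A caller would fill the table once with setupSketchPrimitives() and then invoke p.intra_pred_ang[sizeIndex(width)](dst, stride, refLeft, refAbove, dirMode, bFilter). Because the width is a template parameter, each entry can be compiled (or hand-written in assembly) for a fixed block size.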
diffs (truncated from 5423 to 300 lines):
diff -r e7a5780843de -r 343d9ba487b2 source/Lib/TLibCommon/CommonDef.h
--- a/source/Lib/TLibCommon/CommonDef.h Thu Nov 28 23:30:16 2013 -0600
+++ b/source/Lib/TLibCommon/CommonDef.h Sun Dec 01 14:16:46 2013 -0600
@@ -99,9 +99,6 @@
#define FAST_UDI_MAX_RDMODE_NUM 35 ///< maximum number of RD comparison in fast-UDI estimation loop
#define NUM_INTRA_MODE 36
-#if !REMOVE_LM_CHROMA
-#define LM_CHROMA_IDX 35
-#endif
#define PLANAR_IDX 0
#define VER_IDX 26 // index for intra VERTICAL mode
diff -r e7a5780843de -r 343d9ba487b2 source/Lib/TLibCommon/TComPrediction.cpp
--- a/source/Lib/TLibCommon/TComPrediction.cpp Thu Nov 28 23:30:16 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.cpp Sun Dec 01 14:16:46 2013 -0600
@@ -125,7 +125,7 @@ void TComPrediction::initTempBuff(int cs
// Public member functions
// ====================================================================================================================
-void TComPrediction::predIntraLumaAng(uint32_t dirMode, Pel* dst, uint32_t stride, int size)
+void TComPrediction::predIntraLumaAng(uint32_t dirMode, Pel* dst, intptr_t stride, int size)
{
assert(g_convertToBit[size] >= 0); // 4x 4
assert(g_convertToBit[size] <= 5); // 128x128
@@ -168,12 +168,12 @@ void TComPrediction::predIntraLumaAng(ui
}
else
{
- primitives.intra_pred_ang(dst, stride, size, dirMode, bFilter, refLft, refAbv);
+ primitives.intra_pred_ang[log2BlkSize - 2](dst, stride, refLft, refAbv, dirMode, bFilter);
}
}
// Angular chroma
-void TComPrediction::predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* dst, uint32_t stride, int width)
+void TComPrediction::predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* dst, intptr_t stride, int width)
{
int log2BlkSize = g_convertToBit[width];
@@ -199,7 +199,7 @@ void TComPrediction::predIntraChromaAng(
}
else
{
- primitives.intra_pred_ang(dst, stride, width, dirMode, false, refLft + width - 1, refAbv + width - 1);
+ primitives.intra_pred_ang[log2BlkSize](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
}
}
diff -r e7a5780843de -r 343d9ba487b2 source/Lib/TLibCommon/TComPrediction.h
--- a/source/Lib/TLibCommon/TComPrediction.h Thu Nov 28 23:30:16 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.h Sun Dec 01 14:16:46 2013 -0600
@@ -108,8 +108,8 @@ public:
void getMvPredAMVP(TComDataCU* cu, uint32_t partIdx, uint32_t partAddr, int picList, MV& mvPred);
// Angular Intra
- void predIntraLumaAng(uint32_t dirMode, Pel* pred, uint32_t stride, int width);
- void predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* pred, uint32_t stride, int width);
+ void predIntraLumaAng(uint32_t dirMode, Pel* pred, intptr_t stride, int width);
+ void predIntraChromaAng(Pel* src, uint32_t dirMode, Pel* pred, intptr_t stride, int width);
Pel* getPredicBuf() { return m_predBuf; }
diff -r e7a5780843de -r 343d9ba487b2 source/common/CMakeLists.txt
--- a/source/common/CMakeLists.txt Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/CMakeLists.txt Sun Dec 01 14:16:46 2013 -0600
@@ -113,8 +113,8 @@ endif(ENABLE_PRIMITIVES_VEC)
if(ENABLE_PRIMITIVES_ASM)
set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h)
- set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm mc-a.asm mc-a2.asm ipfilter8.asm pixel-util.asm
- blockcopy8.asm intrapred.asm pixeladd8.asm dct8.asm)
+ set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm mc-a.asm mc-a2.asm ipfilter8.asm pixel-util8.asm
+ blockcopy8.asm intrapred8.asm pixeladd8.asm dct8.asm)
if (NOT X64)
set(A_SRCS ${A_SRCS} pixel-32.asm)
endif()
diff -r e7a5780843de -r 343d9ba487b2 source/common/common.cpp
--- a/source/common/common.cpp Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/common.cpp Sun Dec 01 14:16:46 2013 -0600
@@ -164,7 +164,7 @@ void x265_param_default(x265_param *para
param->bframes = 4;
param->lookaheadDepth = 20;
param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
- param->bpyramid = 2;
+ param->bpyramid = 1;
param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
/* Intra Coding Tools */
@@ -278,7 +278,6 @@ int x265_param_default_preset(x265_param
param->maxCUSize = 32;
param->searchRange = 28;
param->bFrameAdaptive = 0;
- param->bpyramid = 1;
param->subpelRefine = 0;
param->maxNumMergeCand = 2;
param->searchMethod = X265_DIA_SEARCH;
@@ -298,7 +297,6 @@ int x265_param_default_preset(x265_param
param->maxCUSize = 32;
param->searchRange = 44;
param->bFrameAdaptive = 0;
- param->bpyramid = 1;
param->subpelRefine = 1;
param->bEnableRectInter = 0;
param->bEnableAMP = 0;
@@ -312,7 +310,6 @@ int x265_param_default_preset(x265_param
param->lookaheadDepth = 15;
param->maxCUSize = 32;
param->bFrameAdaptive = 0;
- param->bpyramid = 1;
param->subpelRefine = 1;
param->bEnableRectInter = 0;
param->bEnableAMP = 0;
@@ -324,7 +321,6 @@ int x265_param_default_preset(x265_param
{
param->lookaheadDepth = 15;
param->bFrameAdaptive = 0;
- param->bpyramid = 1;
param->bEnableRectInter = 0;
param->bEnableAMP = 0;
param->bEnableEarlySkip = 1;
@@ -334,7 +330,6 @@ int x265_param_default_preset(x265_param
else if (!strcmp(preset, "fast"))
{
param->lookaheadDepth = 15;
- param->bpyramid = 1;
param->bEnableRectInter = 0;
param->bEnableAMP = 0;
}
@@ -345,8 +340,7 @@ int x265_param_default_preset(x265_param
else if (!strcmp(preset, "slow"))
{
param->lookaheadDepth = 25;
- param->bframes = 8;
- param->bpyramid = 1;
+ param->bframes = 4;
param->rdLevel = 1;
param->subpelRefine = 3;
param->maxNumMergeCand = 3;
diff -r e7a5780843de -r 343d9ba487b2 source/common/intrapred.cpp
--- a/source/common/intrapred.cpp Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/intrapred.cpp Sun Dec 01 14:16:46 2013 -0600
@@ -146,7 +146,8 @@ void planad_pred_c(pixel* above, pixel*
}
}
-void ang_pred_c(pixel* dst, int dstStride, int width, int dirMode, bool bFilter, pixel *refLeft, pixel *refAbove)
+template<int width>
+void intra_pred_ang_c(pixel* dst, intptr_t dstStride, pixel *refLeft, pixel *refAbove, int dirMode, int bFilter)
{
// Map the mode index to main prediction direction and angle
int k, l;
@@ -265,7 +266,7 @@ void all_angs_pred_c(pixel *dest, pixel
pixel *above = (IntraFilterType[(int)g_convertToBit[size]][mode] ? above1 : above0);
pixel *out = dest + (mode - 2) * (size * size);
- ang_pred_c(out, size, size, mode, bLuma, left, above);
+ intra_pred_ang_c<size>(out, size, left, above, mode, bLuma);
// Optimize code don't flip buffer
bool modeHor = (mode < 18);
@@ -301,13 +302,15 @@ void Setup_C_IPredPrimitives(EncoderPrim
p.intra_pred_planar[BLOCK_8x8] = planad_pred_c<8>;
p.intra_pred_planar[BLOCK_16x16] = planad_pred_c<16>;
p.intra_pred_planar[BLOCK_32x32] = planad_pred_c<32>;
- p.intra_pred_planar[BLOCK_64x64] = planad_pred_c<64>;
- p.intra_pred_ang = ang_pred_c;
- p.intra_pred_allangs[0] = all_angs_pred_c<4>;
- p.intra_pred_allangs[1] = all_angs_pred_c<8>;
- p.intra_pred_allangs[2] = all_angs_pred_c<16>;
- p.intra_pred_allangs[3] = all_angs_pred_c<32>;
- p.intra_pred_allangs[4] = all_angs_pred_c<64>;
+ p.intra_pred_ang[BLOCK_4x4] = intra_pred_ang_c<4>;
+ p.intra_pred_ang[BLOCK_8x8] = intra_pred_ang_c<8>;
+ p.intra_pred_ang[BLOCK_16x16] = intra_pred_ang_c<16>;
+ p.intra_pred_ang[BLOCK_32x32] = intra_pred_ang_c<32>;
+
+ p.intra_pred_allangs[BLOCK_4x4] = all_angs_pred_c<4>;
+ p.intra_pred_allangs[BLOCK_8x8] = all_angs_pred_c<8>;
+ p.intra_pred_allangs[BLOCK_16x16] = all_angs_pred_c<16>;
+ p.intra_pred_allangs[BLOCK_32x32] = all_angs_pred_c<32>;
}
}
diff -r e7a5780843de -r 343d9ba487b2 source/common/primitives.h
--- a/source/common/primitives.h Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/primitives.h Sun Dec 01 14:16:46 2013 -0600
@@ -163,11 +163,10 @@ typedef void (*blockfill_s_t)(int16_t *d
typedef void (*intra_dc_t)(pixel* above, pixel* left, pixel* dst, intptr_t dstStride, int bFilter);
typedef void (*intra_planar_t)(pixel* above, pixel* left, pixel* dst, intptr_t dstStride);
-typedef void (*intra_ang_t)(pixel* dst, int dstStride, int width, int dirMode, bool bFilter, pixel *refLeft, pixel *refAbove);
+typedef void (*intra_ang_t)(pixel* dst, intptr_t dstStride, pixel *refLeft, pixel *refAbove, int width, int bFilter);
typedef void (*intra_allangs_t)(pixel *dst, pixel *above0, pixel *left0, pixel *above1, pixel *left1, bool bLuma);
typedef void (*cvt16to32_shl_t)(int32_t *dst, int16_t *src, intptr_t, int, int);
-typedef void (*cvt16to16_shl_t)(int16_t *dst, int16_t *src, int, int, intptr_t, int);
typedef void (*cvt32to16_shr_t)(int16_t *dst, int32_t *src, intptr_t, int, int);
typedef void (*dct_t)(int16_t *src, int32_t *dst, intptr_t stride);
@@ -251,7 +250,7 @@ struct EncoderPrimitives
intra_dc_t intra_pred_dc[NUM_SQUARE_BLOCKS];
intra_planar_t intra_pred_planar[NUM_SQUARE_BLOCKS];
- intra_ang_t intra_pred_ang;
+ intra_ang_t intra_pred_ang[NUM_SQUARE_BLOCKS];
intra_allangs_t intra_pred_allangs[NUM_SQUARE_BLOCKS];
scale_t scale1D_128to64;
scale_t scale2D_64to32;
diff -r e7a5780843de -r 343d9ba487b2 source/common/vec/dct-sse3.cpp
--- a/source/common/vec/dct-sse3.cpp Thu Nov 28 23:30:16 2013 -0600
+++ b/source/common/vec/dct-sse3.cpp Sun Dec 01 14:16:46 2013 -0600
@@ -41,97 +41,6 @@ using namespace x265;
namespace {
#if !HIGH_BIT_DEPTH
-ALIGN_VAR_32(static const int16_t, tab_idct_4x4[4][8]) =
-{
- { 64, 64, 64, 64, 64, 64, 64, 64 },
- { 64, -64, 64, -64, 64, -64, 64, -64 },
- { 83, 36, 83, 36, 83, 36, 83, 36 },
- { 36, -83, 36, -83, 36, -83, 36, -83 },
-};
-void idct4(int32_t *src, int16_t *dst, intptr_t stride)
-{
- __m128i S0, S8, m128iAdd, m128Tmp1, m128Tmp2, E1, E2, O1, O2, m128iA, m128iD;
-
- m128Tmp1 = _mm_load_si128((__m128i*)&src[0]);
- m128Tmp2 = _mm_load_si128((__m128i*)&src[4]);
- S0 = _mm_packs_epi32(m128Tmp1, m128Tmp2);
-
- m128Tmp1 = _mm_load_si128((__m128i*)&src[8]);
- m128Tmp2 = _mm_load_si128((__m128i*)&src[12]);
- S8 = _mm_packs_epi32(m128Tmp1, m128Tmp2);
-
- m128iAdd = _mm_set1_epi32(64);
-
- m128Tmp1 = _mm_unpacklo_epi16(S0, S8);
- E1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[0])));
- E1 = _mm_add_epi32(E1, m128iAdd);
-
- E2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[1])));
- E2 = _mm_add_epi32(E2, m128iAdd);
-
- m128Tmp1 = _mm_unpackhi_epi16(S0, S8);
- O1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[2])));
- O2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[3])));
-
- m128iA = _mm_add_epi32(E1, O1);
- m128iA = _mm_srai_epi32(m128iA, 7); // sum = sum >> shiftNum
- m128Tmp1 = _mm_add_epi32(E2, O2);
- m128Tmp1 = _mm_srai_epi32(m128Tmp1, 7); // sum = sum >> shiftNum
- m128iA = _mm_packs_epi32(m128iA, m128Tmp1);
-
- m128iD = _mm_sub_epi32(E2, O2);
- m128iD = _mm_srai_epi32(m128iD, 7); // sum = sum >> shiftNum
-
- m128Tmp1 = _mm_sub_epi32(E1, O1);
- m128Tmp1 = _mm_srai_epi32(m128Tmp1, 7); // sum = sum >> shiftNum
-
- m128iD = _mm_packs_epi32(m128iD, m128Tmp1);
-
- S0 = _mm_unpacklo_epi16(m128iA, m128iD);
- S8 = _mm_unpackhi_epi16(m128iA, m128iD);
-
- m128iA = _mm_unpacklo_epi16(S0, S8);
- m128iD = _mm_unpackhi_epi16(S0, S8);
-
- /* ########################## */
-
- m128iAdd = _mm_set1_epi32(2048);
- m128Tmp1 = _mm_unpacklo_epi16(m128iA, m128iD);
- E1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[0])));
- E1 = _mm_add_epi32(E1, m128iAdd);
-
- E2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[1])));
- E2 = _mm_add_epi32(E2, m128iAdd);
-
- m128Tmp1 = _mm_unpackhi_epi16(m128iA, m128iD);
- O1 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[2])));
- O2 = _mm_madd_epi16(m128Tmp1, _mm_load_si128((__m128i*)(tab_idct_4x4[3])));
-
- m128iA = _mm_add_epi32(E1, O1);
- m128iA = _mm_srai_epi32(m128iA, 12);
- m128Tmp1 = _mm_add_epi32(E2, O2);
- m128Tmp1 = _mm_srai_epi32(m128Tmp1, 12);
- m128iA = _mm_packs_epi32(m128iA, m128Tmp1);
-
- m128iD = _mm_sub_epi32(E2, O2);
- m128iD = _mm_srai_epi32(m128iD, 12);
-
- m128Tmp1 = _mm_sub_epi32(E1, O1);
- m128Tmp1 = _mm_srai_epi32(m128Tmp1, 12);
-
- m128iD = _mm_packs_epi32(m128iD, m128Tmp1);
-
- m128Tmp1 = _mm_unpacklo_epi16(m128iA, m128iD); // [32 30 22 20 12 10 02 00]
- m128Tmp2 = _mm_unpackhi_epi16(m128iA, m128iD); // [33 31 23 21 13 11 03 01]
- m128iA = _mm_unpacklo_epi16(m128Tmp1, m128Tmp2);
- m128iD = _mm_unpackhi_epi16(m128Tmp1, m128Tmp2);
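For readers following the removed idct4 intrinsic in the truncated diff above, here is a rough scalar sketch of the same 4x4 inverse transform butterfly. It is an illustration only, not x265's C reference: idct4_scalar_sketch, inversePass4 and clip16 are hypothetical names. It uses the coefficients from tab_idct_4x4 (64, 83, 36) and the same rounding as the intrinsic: add 64 and shift by 7 in the first pass, add 2048 and shift by 12 in the second, which corresponds to 8-bit samples.

```cpp
#include <cassert>
#include <cstdint>

// Saturate to int16_t, mirroring what _mm_packs_epi32 does per lane.
static int16_t clip16(int32_t v)
{
    return v < -32768 ? (int16_t)-32768 : v > 32767 ? (int16_t)32767 : (int16_t)v;
}

// One 1-D inverse pass over the four columns of a 4x4 block.
// The output is written transposed, so two passes restore the orientation.
static void inversePass4(const int16_t* src, int16_t* dst, int shift)
{
    const int line = 4;
    const int32_t add = 1 << (shift - 1); // rounding offset: 64 or 2048

    for (int j = 0; j < line; j++)
    {
        // Even part uses rows 0 and 2, odd part uses rows 1 and 3.
        int32_t E0 = 64 * src[0] + 64 * src[2 * line];
        int32_t E1 = 64 * src[0] - 64 * src[2 * line];
        int32_t O0 = 83 * src[line] + 36 * src[3 * line];
        int32_t O1 = 36 * src[line] - 83 * src[3 * line];

        dst[0] = clip16((E0 + O0 + add) >> shift);
        dst[1] = clip16((E1 + O1 + add) >> shift);
        dst[2] = clip16((E1 - O1 + add) >> shift);
        dst[3] = clip16((E0 - O0 + add) >> shift);

        src++;
        dst += 4;
    }
}

// Same argument types as the removed idct4(); stride is in int16_t elements.
void idct4_scalar_sketch(const int32_t* src, int16_t* dst, intptr_t stride)
{
    assert(stride >= 4);
    int16_t coef[16], tmp[16], out[16];

    for (int i = 0; i < 16; i++)
        coef[i] = clip16(src[i]); // pack 32-bit coefficients to 16 bits

    inversePass4(coef, tmp, 7);   // first pass
    inversePass4(tmp, out, 12);   // second pass

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = out[y * 4 + x];
}
```

A scalar version like this is mainly useful as a readable cross-check when comparing an intrinsic or assembly routine against expected output in a test harness.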