[x265-commits] [x265] multilib: remove WINXP=ON from multilib scripts
Deepthi Nandakumar
deepthi at multicorewareinc.com
Mon Jun 29 18:51:56 CEST 2015
details: http://hg.videolan.org/x265/rev/2b807e39d07a
branches:
changeset: 10710:2b807e39d07a
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Thu Jun 25 15:06:09 2015 +0530
description:
multilib: remove WINXP=ON from multilib scripts
Subject: [x265] asm: pixelavg_pp[32xN],[64xN],48x64 avx2 code for 10bpp
details: http://hg.videolan.org/x265/rev/0b630c4d2380
branches:
changeset: 10711:0b630c4d2380
user: Rajesh Paulraj <rajesh at multicorewareinc.com>
date: Wed Jun 24 18:03:54 2015 +0530
description:
asm: pixelavg_pp[32xN],[64xN],48x64 avx2 code for 10bpp
avx2:
avg_pp[ 32x8] 13.95x 345.28 4815.70
avg_pp[32x16] 18.23x 535.22 9759.08
avg_pp[32x24] 19.25x 753.64 14506.10
avg_pp[32x32] 19.68x 975.15 19192.85
avg_pp[32x64] 21.43x 1841.33 39462.92
avg_pp[64x16] 19.15x 987.13 18901.01
avg_pp[64x32] 20.18x 1874.47 37825.34
avg_pp[64x48] 19.89x 2837.11 56439.58
avg_pp[64x64] 19.76x 3774.05 74572.41
avg_pp[48x64] 19.65x 2752.09 54082.53
sse2:
avg_pp[ 32x8] 10.37x 470.87 4883.57
avg_pp[32x16] 11.15x 873.08 9737.43
avg_pp[32x24] 11.34x 1287.71 14596.59
avg_pp[32x32] 11.41x 1697.46 19369.11
avg_pp[32x64] 12.52x 3220.95 40330.95
avg_pp[64x16] 10.94x 1670.19 18267.47
avg_pp[64x32] 11.49x 3274.41 37635.54
avg_pp[64x48] 11.79x 4802.15 56622.23
avg_pp[64x64] 11.30x 6667.17 75332.41
avg_pp[48x64] 10.56x 5138.91 54275.12
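For readers unfamiliar with the primitive, pixelavg_pp averages two prediction blocks with rounding. Below is a minimal C++ sketch of a scalar reference for that operation, assuming 10bpp samples stored in 16-bit words. The name pixelavg_ref, the template parameters and the parameter list are illustrative assumptions, not the x265 signature.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 10bpp samples are held in 16-bit words.
typedef uint16_t pixel;

// Hypothetical scalar reference: rounded average of two W x H blocks.
template<int W, int H>
void pixelavg_ref(pixel* dst, std::ptrdiff_t dstStride,
                  const pixel* src0, std::ptrdiff_t stride0,
                  const pixel* src1, std::ptrdiff_t stride1)
{
    for (int y = 0; y < H; y++)
    {
        for (int x = 0; x < W; x++)
            dst[x] = static_cast<pixel>((src0[x] + src1[x] + 1) >> 1); // round half up

        // advance all three pointers to the next row
        dst += dstStride;
        src0 += stride0;
        src1 += stride1;
    }
}
```

An AVX2 version of such a loop can process sixteen 16-bit samples per register, which is in line with the larger gains reported for the wide block sizes.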
Subject: [x265] asm: pixelavg_pp[12x16],[24x32] avx2 code for 10bpp
details: http://hg.videolan.org/x265/rev/d524bf89ca52
branches:
changeset: 10712:d524bf89ca52
user: Rajesh Paulraj <rajesh at multicorewareinc.com>
date: Wed Jun 24 18:18:06 2015 +0530
description:
asm: pixelavg_pp[12x16],[24x32] avx2 code for 10bpp
avx2:
avg_pp[24x32] 14.35x 965.89 13860.97
avg_pp[12x16] 7.78x 487.43 3791.49
sse2:
avg_pp[24x32] 5.49x 2566.36 14091.85
avg_pp[12x16] 4.95x 744.74 3683.95
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE0, improved 974c->690c over SSE
details: http://hg.videolan.org/x265/rev/510b5746307e
branches:
changeset: 10713:510b5746307e
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 25 11:43:14 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE0, improved 974c->690c over SSE
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE1, improved 492c->360c over SSE
details: http://hg.videolan.org/x265/rev/c08bd6116ac8
branches:
changeset: 10714:c08bd6116ac8
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 25 11:49:07 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE1, improved 492c->360c over SSE
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE1_2Rows, improved 900c->614c over SSE
details: http://hg.videolan.org/x265/rev/34687abe0c98
branches:
changeset: 10715:34687abe0c98
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 25 11:54:22 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE1_2Rows, improved 900c->614c over SSE
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE2
details: http://hg.videolan.org/x265/rev/a49ed3dd9e93
branches:
changeset: 10716:a49ed3dd9e93
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 25 12:00:57 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE2
SAO_EO_2[0] 207c->166c
SAO_EO_2[1] 555c->422c
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE3
details: http://hg.videolan.org/x265/rev/259358584e82
branches:
changeset: 10717:259358584e82
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 25 12:11:45 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE3
SAO_EO_3[0] 236c->195c
SAO_EO_3[1] 570c->490c
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgB0, improved 23127c->15595c over SSE
details: http://hg.videolan.org/x265/rev/1e5c4d155ab8
branches:
changeset: 10718:1e5c4d155ab8
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Thu Jun 25 13:42:29 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgB0, improved 23127c->15595c over SSE
Subject: [x265] cmake: allow the CLI option to be cached
details: http://hg.videolan.org/x265/rev/9b26dc9ec39d
branches:
changeset: 10719:9b26dc9ec39d
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Jun 26 12:58:27 2015 +0530
description:
cmake: allow the CLI option to be cached
This will expand the scope of this variable
Subject: [x265] asm: fix gcc build error, invalid size for operand 1
details: http://hg.videolan.org/x265/rev/cf49500d0247
branches:
changeset: 10720:cf49500d0247
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Jun 26 13:59:50 2015 +0530
description:
asm: fix gcc build error, invalid size for operand 1
Subject: [x265] asm: avx2 10bit code for sign primitive(356.91 -> 242.00)
details: http://hg.videolan.org/x265/rev/a4ff10112309
branches:
changeset: 10721:a4ff10112309
user: Rajesh Paulraj <rajesh at multicorewareinc.com>
date: Thu Jun 25 16:39:58 2015 +0530
description:
asm: avx2 10bit code for sign primitive(356.91 -> 242.00)
avx2:
calSign 9.08x 242.00 2197.71
sse4:
calSign 6.16x 356.91 2197.63
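As an illustration of what the sign primitive computes, here is a minimal scalar sketch. It assumes the primitive writes the per-sample sign of the difference between two rows of samples, and that 10bpp samples are stored in 16-bit words. The names calSign_ref and signOf are hypothetical.

```cpp
#include <cstdint>

// Assumption: 10bpp samples are held in 16-bit words.
typedef uint16_t pixel;

// Returns -1, 0 or +1 according to the sign of x.
static inline int8_t signOf(int x)
{
    return static_cast<int8_t>((x > 0) - (x < 0));
}

// Hypothetical scalar reference: dst[x] = sign(src1[x] - src2[x]) for x in [0, endX).
void calSign_ref(int8_t* dst, const pixel* src1, const pixel* src2, int endX)
{
    for (int x = 0; x < endX; x++)
        dst[x] = signOf(static_cast<int>(src1[x]) - static_cast<int>(src2[x]));
}
```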
Subject: [x265] asm: sse4 10bit code for sign primitive
details: http://hg.videolan.org/x265/rev/d64227e54233
branches:
changeset: 10722:d64227e54233
user: Rajesh Paulraj <rajesh at multicorewareinc.com>
date: Thu Jun 25 16:25:51 2015 +0530
description:
asm: sse4 10bit code for sign primitive
calSign 6.16x 356.91 2197.63
Subject: [x265] asm: intra_filter4x4 sse4 code and added testbench support, improved 357c->141c over C code
details: http://hg.videolan.org/x265/rev/7006c4490e4d
branches:
changeset: 10723:7006c4490e4d
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Jun 26 18:21:07 2015 +0530
description:
asm: intra_filter4x4 sse4 code and added testbench support, improved 357c->141c over C code
Subject: [x265] asm: intra_filter8x8 sse4 code, improved 990c->201c over C code
details: http://hg.videolan.org/x265/rev/f4f0b5509954
branches:
changeset: 10724:f4f0b5509954
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Jun 26 18:28:40 2015 +0530
description:
asm: intra_filter8x8 sse4 code, improved 990c->201c over C code
Subject: [x265] asm: intra_filter16x16 sse4 code, improved 1952c->351c over C code
details: http://hg.videolan.org/x265/rev/c29ced3490f4
branches:
changeset: 10725:c29ced3490f4
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Jun 26 18:32:00 2015 +0530
description:
asm: intra_filter16x16 sse4 code, improved 1952c->351c over C code
Subject: [x265] asm: intra_filter32x32 sse4 code, improved 4050c->652c over C code
details: http://hg.videolan.org/x265/rev/a12d44fb0319
branches:
changeset: 10726:a12d44fb0319
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Jun 26 18:35:58 2015 +0530
description:
asm: intra_filter32x32 sse4 code, improved 4050c->652c over C code
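To make the intra_filter commits easier to follow, here is a minimal C++ sketch of a [1 2 1]/4 smoothing filter over the intra reference samples. The sample layout is an assumption inferred from the topLast and LeftLast offsets in the assembly comments of this patch: index 0 is the top-left sample, followed by 2N top samples and 2N left samples. The name intraFilter_ref is hypothetical and the code is not taken from the x265 C reference.

```cpp
#include <cstdint>

// Assumption: 8bpp build, matching intrapred8.asm.
typedef uint8_t pixel;

// Hypothetical scalar sketch for an N x N block.
// Assumed layout: samples[0] = top-left, samples[1 .. 2N] = top row,
// samples[2N+1 .. 4N] = left column.
template<int N>
void intraFilter_ref(const pixel* samples, pixel* filtered)
{
    const int n2 = N * 2;
    const pixel topLeft  = samples[0];
    const pixel topLast  = samples[n2];
    const pixel leftLast = samples[n2 + n2];

    // top row: each sample is smoothed with its two neighbours
    for (int i = 1; i < n2; i++)
        filtered[i] = static_cast<pixel>(((samples[i] << 1) + samples[i - 1] + samples[i + 1] + 2) >> 2);
    filtered[n2] = topLast; // last top sample is copied unfiltered

    // top-left corner: neighbours are the first top and the first left sample
    filtered[0] = static_cast<pixel>(((topLeft << 1) + samples[1] + samples[n2 + 1] + 2) >> 2);

    // left column: the first left sample uses the top-left corner as its predecessor
    filtered[n2 + 1] = static_cast<pixel>(((samples[n2 + 1] << 1) + topLeft + samples[n2 + 2] + 2) >> 2);
    for (int i = n2 + 2; i < n2 + n2; i++)
        filtered[i] = static_cast<pixel>(((samples[i] << 1) + samples[i - 1] + samples[i + 1] + 2) >> 2);
    filtered[n2 + n2] = leftLast; // last left sample is copied unfiltered
}
```

For N = 4 this touches 17 samples, with the unfiltered copies at indices 8 and 16, the same offsets the 4x4 assembly saves before filtering.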
Subject: [x265] asm: cleanup unused constant and update copyright header
details: http://hg.videolan.org/x265/rev/c744d42ea678
branches:
changeset: 10727:c744d42ea678
user: Min Chen <chenm003 at 163.com>
date: Fri Jun 26 18:54:13 2015 -0700
description:
asm: cleanup unused constant and update copyright header
Subject: [x265] motion: remove mvc's sad cost calc for lowres, already measured in slicetype
details: http://hg.videolan.org/x265/rev/f7bbb04e1992
branches:
changeset: 10728:f7bbb04e1992
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Fri Jun 26 10:16:29 2015 +0530
description:
motion: remove mvc's sad cost calc for lowres, already measured in slicetype
Subject: [x265] rc: fixes inconsistent output in linux because of RC Lock in CQP/CRF
details: http://hg.videolan.org/x265/rev/9feee64efa44
branches:
changeset: 10729:9feee64efa44
user: Aarthi Thirumalai
date: Fri Jun 26 15:29:51 2015 +0530
description:
rc: fixes inconsistent output in linux because of RC Lock in CQP/CRF
The inconsistency is due to a race hazard in slice context initialization: in CRF/CQP, two frame
encoders can complete RateControlStart in the correct order but run their slice context initializations in the wrong order.
Subject: [x265] common: fix for multilib checked builds, move g_checkFailures within namespace
details: http://hg.videolan.org/x265/rev/c0fc87075c75
branches:
changeset: 10730:c0fc87075c75
user: Steve Borho <steve at borho.org>
date: Mon Jun 29 11:47:39 2015 -0500
description:
common: fix for multilib checked builds, move g_checkFailures within namespace
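A plausible reading of this fix is that a multilib build links more than one bit-depth variant of the library into a single binary, so a global defined outside the per-variant namespace would be defined once per variant. The sketch below illustrates the idea with two hypothetical namespaces standing in for whatever X265_NS expands to in each build. The names are assumptions made for illustration only.

```cpp
// Hypothetical stand-ins for the per-bit-depth expansions of X265_NS.
namespace x265_8bit_demo  { int g_checkFailures = 0; }
namespace x265_10bit_demo { int g_checkFailures = 0; }

// Each variant counts its own check failures; the two symbols no longer collide.
inline void fail8()  { x265_8bit_demo::g_checkFailures++; }
inline void fail10() { x265_10bit_demo::g_checkFailures++; }
```

With the counter inside the namespace, each variant keeps an independent symbol, which matches the change to the extern declaration in common.h.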
diffstat:
build/vc10-x86_64/multilib.bat | 4 +-
build/vc11-x86_64/multilib.bat | 4 +-
build/vc12-x86_64/multilib.bat | 4 +-
build/vc9-x86_64/multilib.bat | 4 +-
source/CMakeLists.txt | 2 +-
source/common/common.cpp | 4 +-
source/common/common.h | 2 +-
source/common/x86/asm-primitives.cpp | 28 +
source/common/x86/const-a.asm | 4 +-
source/common/x86/intrapred.h | 1 +
source/common/x86/intrapred8.asm | 414 +++++++++++++++++++++++++
source/common/x86/loopfilter.asm | 565 +++++++++++++++++++++++++++++++++++
source/common/x86/mc-a.asm | 449 +++++++++++++++++++++++++++-
source/common/x86/sad-a.asm | 7 +-
source/encoder/frameencoder.cpp | 13 +
source/encoder/motion.cpp | 34 +-
source/encoder/ratecontrol.cpp | 12 -
source/test/intrapredharness.cpp | 44 ++
source/test/intrapredharness.h | 9 +
19 files changed, 1545 insertions(+), 59 deletions(-)
diffs (truncated from 2062 to 300 lines):
diff -r b1af4c36f48a -r c0fc87075c75 build/vc10-x86_64/multilib.bat
--- a/build/vc10-x86_64/multilib.bat Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc10-x86_64/multilib.bat Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS100COMNTOOLS%" == "" (
@cd 10bit
if not exist x265.sln (
- cmake -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+ cmake -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
)
if exist x265.sln (
call "%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
exit 1
)
if not exist x265.sln (
- cmake -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+ cmake -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
)
if exist x265.sln (
call "%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 build/vc11-x86_64/multilib.bat
--- a/build/vc11-x86_64/multilib.bat Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc11-x86_64/multilib.bat Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS110COMNTOOLS%" == "" (
@cd 10bit
if not exist x265.sln (
- cmake -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+ cmake -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
)
if exist x265.sln (
call "%VS110COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
exit 1
)
if not exist x265.sln (
- cmake -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+ cmake -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
)
if exist x265.sln (
call "%VS110COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 build/vc12-x86_64/multilib.bat
--- a/build/vc12-x86_64/multilib.bat Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc12-x86_64/multilib.bat Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS120COMNTOOLS%" == "" (
@cd 10bit
if not exist x265.sln (
- cmake -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+ cmake -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
)
if exist x265.sln (
call "%VS120COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
exit 1
)
if not exist x265.sln (
- cmake -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+ cmake -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
)
if exist x265.sln (
call "%VS120COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 build/vc9-x86_64/multilib.bat
--- a/build/vc9-x86_64/multilib.bat Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc9-x86_64/multilib.bat Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS90COMNTOOLS%" == "" (
@cd 10bit
if not exist x265.sln (
- cmake -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+ cmake -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
)
if exist x265.sln (
call "%VS90COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
exit 1
)
if not exist x265.sln (
- cmake -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+ cmake -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
)
if exist x265.sln (
call "%VS90COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 source/CMakeLists.txt
--- a/source/CMakeLists.txt Wed Jun 24 10:36:15 2015 -0500
+++ b/source/CMakeLists.txt Mon Jun 29 11:47:39 2015 -0500
@@ -490,7 +490,7 @@ if(NOT WIN32)
endif()
# Main CLI application
-option(ENABLE_CLI "Build standalone CLI application" ON)
+set(ENABLE_CLI ON CACHE BOOL "Build standalone CLI application")
if(ENABLE_CLI)
file(GLOB InputFiles input/input.cpp input/yuv.cpp input/y4m.cpp input/*.h)
file(GLOB OutputFiles output/output.cpp output/reconplay.cpp output/*.h
diff -r b1af4c36f48a -r c0fc87075c75 source/common/common.cpp
--- a/source/common/common.cpp Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/common.cpp Mon Jun 29 11:47:39 2015 -0500
@@ -33,12 +33,12 @@
#include <sys/time.h>
#endif
+namespace X265_NS {
+
#if CHECKED_BUILD || _DEBUG
int g_checkFailures;
#endif
-namespace X265_NS {
-
int64_t x265_mdate(void)
{
#if _WIN32
diff -r b1af4c36f48a -r c0fc87075c75 source/common/common.h
--- a/source/common/common.h Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/common.h Mon Jun 29 11:47:39 2015 -0500
@@ -106,7 +106,7 @@
/* If compiled with CHECKED_BUILD perform run-time checks and log any that
* fail, both to stderr and to a file */
#if CHECKED_BUILD || _DEBUG
-extern int g_checkFailures;
+namespace X265_NS { extern int g_checkFailures; }
#define X265_CHECK(expr, ...) if (!(expr)) { \
x265_log(NULL, X265_LOG_ERROR, __VA_ARGS__); \
FILE *fp = fopen("x265_check_failures.txt", "a"); \
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Mon Jun 29 11:47:39 2015 -0500
@@ -1097,6 +1097,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.saoCuOrgE3[0] = PFX(saoCuOrgE3_sse4);
p.saoCuOrgE3[1] = PFX(saoCuOrgE3_sse4);
p.saoCuOrgB0 = PFX(saoCuOrgB0_sse4);
+ p.sign = PFX(calSign_sse4);
LUMA_ADDAVG(sse4);
CHROMA_420_ADDAVG(sse4);
@@ -1284,6 +1285,15 @@ void setupAssemblyPrimitives(EncoderPrim
}
if (cpuMask & X265_CPU_AVX2)
{
+ p.saoCuOrgE0 = PFX(saoCuOrgE0_avx2);
+ p.saoCuOrgE1 = PFX(saoCuOrgE1_avx2);
+ p.saoCuOrgE1_2Rows = PFX(saoCuOrgE1_2Rows_avx2);
+ p.saoCuOrgE2[0] = PFX(saoCuOrgE2_avx2);
+ p.saoCuOrgE2[1] = PFX(saoCuOrgE2_32_avx2);
+ p.saoCuOrgE3[0] = PFX(saoCuOrgE3_avx2);
+ p.saoCuOrgE3[1] = PFX(saoCuOrgE3_32_avx2);
+ p.saoCuOrgB0 = PFX(saoCuOrgB0_avx2);
+
p.cu[BLOCK_16x16].intra_pred[2] = PFX(intra_pred_ang16_2_avx2);
p.cu[BLOCK_16x16].intra_pred[3] = PFX(intra_pred_ang16_3_avx2);
p.cu[BLOCK_16x16].intra_pred[4] = PFX(intra_pred_ang16_4_avx2);
@@ -1352,12 +1362,24 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_32x32].intra_pred[33] = PFX(intra_pred_ang32_33_avx2);
p.cu[BLOCK_32x32].intra_pred[34] = PFX(intra_pred_ang32_2_avx2);
+ p.pu[LUMA_12x16].pixelavg_pp = PFX(pixel_avg_12x16_avx2);
p.pu[LUMA_16x4].pixelavg_pp = PFX(pixel_avg_16x4_avx2);
p.pu[LUMA_16x8].pixelavg_pp = PFX(pixel_avg_16x8_avx2);
p.pu[LUMA_16x12].pixelavg_pp = PFX(pixel_avg_16x12_avx2);
p.pu[LUMA_16x16].pixelavg_pp = PFX(pixel_avg_16x16_avx2);
p.pu[LUMA_16x32].pixelavg_pp = PFX(pixel_avg_16x32_avx2);
p.pu[LUMA_16x64].pixelavg_pp = PFX(pixel_avg_16x64_avx2);
+ p.pu[LUMA_24x32].pixelavg_pp = PFX(pixel_avg_24x32_avx2);
+ p.pu[LUMA_32x8].pixelavg_pp = PFX(pixel_avg_32x8_avx2);
+ p.pu[LUMA_32x16].pixelavg_pp = PFX(pixel_avg_32x16_avx2);
+ p.pu[LUMA_32x24].pixelavg_pp = PFX(pixel_avg_32x24_avx2);
+ p.pu[LUMA_32x32].pixelavg_pp = PFX(pixel_avg_32x32_avx2);
+ p.pu[LUMA_32x64].pixelavg_pp = PFX(pixel_avg_32x64_avx2);
+ p.pu[LUMA_64x16].pixelavg_pp = PFX(pixel_avg_64x16_avx2);
+ p.pu[LUMA_64x32].pixelavg_pp = PFX(pixel_avg_64x32_avx2);
+ p.pu[LUMA_64x48].pixelavg_pp = PFX(pixel_avg_64x48_avx2);
+ p.pu[LUMA_64x64].pixelavg_pp = PFX(pixel_avg_64x64_avx2);
+ p.pu[LUMA_48x64].pixelavg_pp = PFX(pixel_avg_48x64_avx2);
p.pu[LUMA_8x4].addAvg = PFX(addAvg_8x4_avx2);
p.pu[LUMA_8x8].addAvg = PFX(addAvg_8x8_avx2);
@@ -1495,6 +1517,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.scale1D_128to64 = PFX(scale1D_128to64_avx2);
p.scale2D_64to32 = PFX(scale2D_64to32_avx2);
p.weight_pp = PFX(weight_pp_avx2);
+ p.sign = PFX(calSign_avx2);
p.cu[BLOCK_16x16].calcresidual = PFX(getResidual16_avx2);
p.cu[BLOCK_32x32].calcresidual = PFX(getResidual32_avx2);
@@ -2430,6 +2453,11 @@ void setupAssemblyPrimitives(EncoderPrim
p.weight_pp = PFX(weight_pp_sse4);
p.weight_sp = PFX(weight_sp_sse4);
+ p.cu[BLOCK_4x4].intra_filter = PFX(intra_filter_4x4_sse4);
+ p.cu[BLOCK_8x8].intra_filter = PFX(intra_filter_8x8_sse4);
+ p.cu[BLOCK_16x16].intra_filter = PFX(intra_filter_16x16_sse4);
+ p.cu[BLOCK_32x32].intra_filter = PFX(intra_filter_32x32_sse4);
+
ALL_LUMA_TU_S(intra_pred[PLANAR_IDX], intra_pred_planar, sse4);
ALL_LUMA_TU_S(intra_pred[DC_IDX], intra_pred_dc, sse4);
ALL_LUMA_TU(intra_pred_allangs, all_angs_pred, sse4);
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/const-a.asm Mon Jun 29 11:47:39 2015 -0500
@@ -41,7 +41,7 @@ const pb_15, times 32 db
const pb_16, times 32 db 16
const pb_32, times 32 db 32
const pb_64, times 32 db 64
-const pb_128, times 16 db 128
+const pb_128, times 32 db 128
const pb_a1, times 16 db 0xa1
const pb_01, times 8 db 0, 1
@@ -136,5 +136,3 @@ const popcnt_table
db ((x>>0)&1)+((x>>1)&1)+((x>>2)&1)+((x>>3)&1)+((x>>4)&1)+((x>>5)&1)+((x>>6)&1)+((x>>7)&1)
%assign x x+1
%endrep
-
-const sw_64, dd 64
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/intrapred.h Mon Jun 29 11:47:39 2015 -0500
@@ -66,6 +66,7 @@
#define DECL_ALL(cpu) \
FUNCDEF_TU(void, all_angs_pred, cpu, pixel *dest, pixel *refPix, pixel *filtPix, int bLuma); \
+ FUNCDEF_TU(void, intra_filter, cpu, const pixel *samples, pixel *filtered); \
DECL_ANGS(4, cpu); \
DECL_ANGS(8, cpu); \
DECL_ANGS(16, cpu); \
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/intrapred8.asm
--- a/source/common/x86/intrapred8.asm Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/intrapred8.asm Mon Jun 29 11:47:39 2015 -0500
@@ -30,6 +30,9 @@ SECTION_RODATA 32
intra_pred_shuff_0_8: times 2 db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
intra_pred_shuff_15_0: times 2 db 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+intra_filter4_shuf0: db 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13
+intra_filter4_shuf1: db 14,15,0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13
+
pb_0_8 times 8 db 0, 8
pb_unpackbw1 times 2 db 1, 8, 2, 8, 3, 8, 4, 8
pb_swap8: times 2 db 7, 6, 5, 4, 3, 2, 1, 0
@@ -18276,3 +18279,414 @@ cglobal intra_pred_ang4_25, 3, 3, 1
INTRA_PRED_STORE_4x4
RET
+
+;-----------------------------------------------------------------------------------
+; void intra_filter_NxN(const pixel* references, pixel* filtered)
+;-----------------------------------------------------------------------------------
+INIT_XMM sse4
+cglobal intra_filter_4x4, 2,4,5
+ mov r2b, byte [r0 + 8] ; topLast
+ mov r3b, byte [r0 + 16] ; LeftLast
+
+ ; filtering top
+ pmovzxbw m0, [r0 + 0]
+ pmovzxbw m1, [r0 + 8]
+ pmovzxbw m2, [r0 + 16]
+
+ pshufb m4, m0, [intra_filter4_shuf0] ; [6 5 4 3 2 1 0 1] samples[i - 1]
+ palignr m3, m1, m0, 4
+ pshufb m3, [intra_filter4_shuf1] ; [8 7 6 5 4 3 2 9] samples[i + 1]
+
+ psllw m0, 1
+ paddw m4, m3
+ paddw m0, m4
+ paddw m0, [pw_2]
+ psrlw m0, 2
+
+ ; filtering left
+ palignr m4, m1, m1, 14 ; [14 13 12 11 10 9 8 15] samples[i - 1]
+ pinsrb m4, [r0], 2 ; [14 13 12 11 10 9 0 15] samples[i + 1]
+ palignr m3, m2, m1, 4
+ pshufb m3, [intra_filter4_shuf1]
+
+ psllw m1, 1
+ paddw m4, m3
+ paddw m1, m4
+ paddw m1, [pw_2]
+ psrlw m1, 2
+ packuswb m0, m1
+
+ movu [r1], m0
+ mov [r1 + 8], r2b ; topLast
+ mov [r1 + 16], r3b ; LeftLast
+ RET
+
+INIT_XMM sse4
+cglobal intra_filter_8x8, 2,4,6
+ mov r2b, byte [r0 + 16] ; topLast
+ mov r3b, byte [r0 + 32] ; LeftLast
+
+ ; filtering top
+ pmovzxbw m0, [r0 + 0]
+ pmovzxbw m1, [r0 + 8]
+ pmovzxbw m2, [r0 + 16]
+
+ pshufb m4, m0, [intra_filter4_shuf0] ; [6 5 4 3 2 1 0 1] samples[i - 1]
+ palignr m5, m1, m0, 2
+ pinsrb m5, [r0 + 17], 0 ; [8 7 6 5 4 3 2 9] samples[i + 1]
+