[x265-commits] [x265] multilib: remove WINXP=ON from multilib scripts

Deepthi Nandakumar deepthi at multicorewareinc.com
Mon Jun 29 18:51:56 CEST 2015


details:   http://hg.videolan.org/x265/rev/2b807e39d07a
branches:  
changeset: 10710:2b807e39d07a
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Jun 25 15:06:09 2015 +0530
description:
multilib: remove WINXP=ON from multilib scripts
Subject: [x265] asm: pixelavg_pp[32xN],[64xN],48x64 avx2 code for 10bpp

details:   http://hg.videolan.org/x265/rev/0b630c4d2380
branches:  
changeset: 10711:0b630c4d2380
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Wed Jun 24 18:03:54 2015 +0530
description:
asm: pixelavg_pp[32xN],[64xN],48x64 avx2 code for 10bpp

avx2:
avg_pp[ 32x8]  13.95x   345.28          4815.70
avg_pp[32x16]  18.23x   535.22          9759.08
avg_pp[32x24]  19.25x   753.64          14506.10
avg_pp[32x32]  19.68x   975.15          19192.85
avg_pp[32x64]  21.43x   1841.33         39462.92
avg_pp[64x16]  19.15x   987.13          18901.01
avg_pp[64x32]  20.18x   1874.47         37825.34
avg_pp[64x48]  19.89x   2837.11         56439.58
avg_pp[64x64]  19.76x   3774.05         74572.41
avg_pp[48x64]  19.65x   2752.09         54082.53

sse2:
avg_pp[ 32x8]  10.37x   470.87          4883.57
avg_pp[32x16]  11.15x   873.08          9737.43
avg_pp[32x24]  11.34x   1287.71         14596.59
avg_pp[32x32]  11.41x   1697.46         19369.11
avg_pp[32x64]  12.52x   3220.95         40330.95
avg_pp[64x16]  10.94x   1670.19         18267.47
avg_pp[64x32]  11.49x   3274.41         37635.54
avg_pp[64x48]  11.79x   4802.15         56622.23
avg_pp[64x64]  11.30x   6667.17         75332.41
avg_pp[48x64]  10.56x   5138.91         54275.12
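
For context on what the avg_pp rows measure, the following is a minimal scalar sketch of a rounded two-block average for 10bpp samples. The name `pixelAvgRef`, the template parameters and the simplified argument list are hypothetical and chosen for illustration; the real primitive's signature may differ, and the assembly in this changeset is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a 10bpp (HIGH_BIT_DEPTH) build stores each sample in 16 bits.
typedef uint16_t pixel;

// Hypothetical scalar reference: rounded average of two W x H blocks.
template<int W, int H>
void pixelAvgRef(pixel* dst, ptrdiff_t dstStride,
                 const pixel* src0, ptrdiff_t stride0,
                 const pixel* src1, ptrdiff_t stride1)
{
    assert(dst && src0 && src1);
    for (int y = 0; y < H; y++)
    {
        for (int x = 0; x < W; x++)
        {
            // +1 before the shift rounds halves upward
            dst[x] = static_cast<pixel>((src0[x] + src1[x] + 1) >> 1);
        }
        dst += dstStride;
        src0 += stride0;
        src1 += stride1;
    }
}
```

A vectorized version can process a whole row of 16-bit samples per register, which would be consistent with the wider blocks showing the larger speedups in the tables above.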
Subject: [x265] asm: pixelavg_pp[12x16],[24x32] avx2 code for 10bpp

details:   http://hg.videolan.org/x265/rev/d524bf89ca52
branches:  
changeset: 10712:d524bf89ca52
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Wed Jun 24 18:18:06 2015 +0530
description:
asm: pixelavg_pp[12x16],[24x32] avx2 code for 10bpp

avx2:
avg_pp[24x32]  14.35x   965.89          13860.97
avg_pp[12x16]  7.78x    487.43          3791.49

sse2:
avg_pp[24x32]  5.49x    2566.36         14091.85
avg_pp[12x16]  4.95x    744.74          3683.95
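
The three numeric columns in these tables are not labelled. The figures are consistent with the layout "speedup, optimized cycles, C reference cycles", where the speedup is the ratio of the last column to the middle one. The sketch below checks that reading against two rows from the tables; the function names are hypothetical and the column interpretation is an assumption inferred from the numbers.

```cpp
#include <cassert>
#include <cmath>

// Assumed relationship: speedup = C reference cycles / optimized cycles.
double speedup(double refCycles, double optCycles)
{
    assert(optCycles > 0.0);
    return refCycles / optCycles;
}

// Round to two decimals, matching the precision printed in the tables.
double roundTo2(double v)
{
    return std::round(v * 100.0) / 100.0;
}
```

With the avx2 rows for avg_pp[24x32] and avg_pp[12x16], `roundTo2(speedup(13860.97, 965.89))` and `roundTo2(speedup(3791.49, 487.43))` reproduce the printed 14.35x and 7.78x.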
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE0, improved 974c->690c over SSE

details:   http://hg.videolan.org/x265/rev/510b5746307e
branches:  
changeset: 10713:510b5746307e
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 25 11:43:14 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE0, improved 974c->690c over SSE
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE1, improved 492c->360c over SSE

details:   http://hg.videolan.org/x265/rev/c08bd6116ac8
branches:  
changeset: 10714:c08bd6116ac8
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 25 11:49:07 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE1, improved 492c->360c over SSE
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE1_2Rows, improved 900c->614c over SSE

details:   http://hg.videolan.org/x265/rev/34687abe0c98
branches:  
changeset: 10715:34687abe0c98
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 25 11:54:22 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE1_2Rows, improved 900c->614c over SSE
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE2

details:   http://hg.videolan.org/x265/rev/a49ed3dd9e93
branches:  
changeset: 10716:a49ed3dd9e93
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 25 12:00:57 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE2

SAO_EO_2[0] 207c->166c
SAO_EO_2[1] 555c->422c
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgE3

details:   http://hg.videolan.org/x265/rev/259358584e82
branches:  
changeset: 10717:259358584e82
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 25 12:11:45 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgE3

SAO_EO_3[0] 236c->195c
SAO_EO_3[1] 570c->490c
Subject: [x265] asm: 10bpp AVX2 code for saoCuOrgB0, improved 23127c->15595c over SSE

details:   http://hg.videolan.org/x265/rev/1e5c4d155ab8
branches:  
changeset: 10718:1e5c4d155ab8
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Thu Jun 25 13:42:29 2015 +0530
description:
asm: 10bpp AVX2 code for saoCuOrgB0, improved 23127c->15595c over SSE
Subject: [x265] cmake: allow the CLI option to be cached

details:   http://hg.videolan.org/x265/rev/9b26dc9ec39d
branches:  
changeset: 10719:9b26dc9ec39d
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Jun 26 12:58:27 2015 +0530
description:
cmake: allow the CLI option to be cached

Declaring ENABLE_CLI with set(... CACHE BOOL ...) instead of option() expands the scope of this variable.
Subject: [x265] asm: fix gcc build error, invalid size for operand 1

details:   http://hg.videolan.org/x265/rev/cf49500d0247
branches:  
changeset: 10720:cf49500d0247
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Fri Jun 26 13:59:50 2015 +0530
description:
asm: fix gcc build error, invalid size for operand 1
Subject: [x265] asm: avx2 10bit code for sign primitive(356.91 -> 242.00)

details:   http://hg.videolan.org/x265/rev/a4ff10112309
branches:  
changeset: 10721:a4ff10112309
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Jun 25 16:39:58 2015 +0530
description:
asm: avx2 10bit code for sign primitive(356.91 -> 242.00)

avx2:
calSign  9.08x    242.00          2197.71

sse4:
calSign  6.16x    356.91          2197.63
Subject: [x265] asm: sse4 10bit code for sign primitive

details:   http://hg.videolan.org/x265/rev/d64227e54233
branches:  
changeset: 10722:d64227e54233
user:      Rajesh Paulraj<rajesh at multicorewareinc.com>
date:      Thu Jun 25 16:25:51 2015 +0530
description:
asm: sse4 10bit code for sign primitive

     calSign  6.16x    356.91          2197.63
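
As a rough guide to what the calSign primitive computes, here is a scalar sketch under the assumption that it stores the sign (-1, 0 or +1) of the per-sample difference between two rows. The names `signOf` and `calSignRef` and the argument list are illustrative assumptions, not taken from this changeset.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a 10-bit build stores each sample in 16 bits.
typedef uint16_t pixel;

// Returns -1, 0 or +1 depending on the sign of x.
inline int signOf(int x)
{
    return (x > 0) - (x < 0);
}

// Hypothetical scalar reference: sign of (src1[x] - src2[x]) for endX samples.
void calSignRef(int8_t* dst, const pixel* src1, const pixel* src2, int endX)
{
    assert(dst && src1 && src2 && endX >= 0);
    for (int x = 0; x < endX; x++)
    {
        // widen to int first so the subtraction cannot wrap
        dst[x] = static_cast<int8_t>(signOf(static_cast<int>(src1[x]) - static_cast<int>(src2[x])));
    }
}
```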
Subject: [x265] asm: intra_filter4x4 sse4 code and added testbench support, improved 357c->141c over C code

details:   http://hg.videolan.org/x265/rev/7006c4490e4d
branches:  
changeset: 10723:7006c4490e4d
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Fri Jun 26 18:21:07 2015 +0530
description:
asm: intra_filter4x4 sse4 code and added testbench support, improved 357c->141c over C code
Subject: [x265] asm: intra_filter8x8 sse4 code, improved 990c->201c over C code

details:   http://hg.videolan.org/x265/rev/f4f0b5509954
branches:  
changeset: 10724:f4f0b5509954
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Fri Jun 26 18:28:40 2015 +0530
description:
asm: intra_filter8x8 sse4 code, improved 990c->201c over C code
Subject: [x265] asm: intra_filter16x16 sse4 code, improved 1952c->351c over C code

details:   http://hg.videolan.org/x265/rev/c29ced3490f4
branches:  
changeset: 10725:c29ced3490f4
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Fri Jun 26 18:32:00 2015 +0530
description:
asm: intra_filter16x16 sse4 code, improved 1952c->351c over C code
Subject: [x265] asm: intra_filter32x32 sse4 code, improved 4050c->652c over C code

details:   http://hg.videolan.org/x265/rev/a12d44fb0319
branches:  
changeset: 10726:a12d44fb0319
user:      Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date:      Fri Jun 26 18:35:58 2015 +0530
description:
asm: intra_filter32x32 sse4 code, improved 4050c->652c over C code
Subject: [x265] asm: cleanup unused constant and update copyright header

details:   http://hg.videolan.org/x265/rev/c744d42ea678
branches:  
changeset: 10727:c744d42ea678
user:      Min Chen <chenm003 at 163.com>
date:      Fri Jun 26 18:54:13 2015 -0700
description:
asm: cleanup unused constant and update copyright header
Subject: [x265] motion: remove mvc's sad cost calc for lowres, already measured in slicetype

details:   http://hg.videolan.org/x265/rev/f7bbb04e1992
branches:  
changeset: 10728:f7bbb04e1992
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Fri Jun 26 10:16:29 2015 +0530
description:
motion: remove mvc's sad cost calc for lowres, already measured in slicetype
Subject: [x265] rc: fixes inconsistent output in linux because of RC Lock in CQP/CRF

details:   http://hg.videolan.org/x265/rev/9feee64efa44
branches:  
changeset: 10729:9feee64efa44
user:      Aarthi Thirumalai
date:      Fri Jun 26 15:29:51 2015 +0530
description:
rc: fixes inconsistent output in linux because of RC Lock in CQP/CRF

The inconsistency is due to a race hazard in slice context initialization: in CRF/CQP, two Frame Encoders can
complete RateControlStart in the correct order but then perform their slice context initializations in the wrong order.
Subject: [x265] common: fix for multilib checked builds, move g_checkFailures within namespace

details:   http://hg.videolan.org/x265/rev/c0fc87075c75
branches:  
changeset: 10730:c0fc87075c75
user:      Steve Borho <steve at borho.org>
date:      Mon Jun 29 11:47:39 2015 -0500
description:
common: fix for multilib checked builds, move g_checkFailures within namespace
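
A minimal sketch of why the move matters for multilib, assuming X265_NS expands to a different namespace name in the 8-bit and 10-bit builds. The names `demo_main` and `demo_main10` and the helper functions are hypothetical stand-ins. With the counter at global scope, both static libraries would define the same symbol `g_checkFailures`; inside the per-build namespace each library gets its own symbol.

```cpp
#include <cassert>

// Hypothetical namespace names standing in for the per-build value of X265_NS.
namespace demo_main   { int g_checkFailures = 0; }
namespace demo_main10 { int g_checkFailures = 0; }

// Each library touches only the counter in its own namespace.
void failCheck8()  { demo_main::g_checkFailures++; }
void failCheck10() { demo_main10::g_checkFailures++; }

// The two counters are distinct objects, so they can be read independently.
int totalFailures()
{
    assert(demo_main::g_checkFailures >= 0 && demo_main10::g_checkFailures >= 0);
    return demo_main::g_checkFailures + demo_main10::g_checkFailures;
}
```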

diffstat:

 build/vc10-x86_64/multilib.bat       |    4 +-
 build/vc11-x86_64/multilib.bat       |    4 +-
 build/vc12-x86_64/multilib.bat       |    4 +-
 build/vc9-x86_64/multilib.bat        |    4 +-
 source/CMakeLists.txt                |    2 +-
 source/common/common.cpp             |    4 +-
 source/common/common.h               |    2 +-
 source/common/x86/asm-primitives.cpp |   28 +
 source/common/x86/const-a.asm        |    4 +-
 source/common/x86/intrapred.h        |    1 +
 source/common/x86/intrapred8.asm     |  414 +++++++++++++++++++++++++
 source/common/x86/loopfilter.asm     |  565 +++++++++++++++++++++++++++++++++++
 source/common/x86/mc-a.asm           |  449 +++++++++++++++++++++++++++-
 source/common/x86/sad-a.asm          |    7 +-
 source/encoder/frameencoder.cpp      |   13 +
 source/encoder/motion.cpp            |   34 +-
 source/encoder/ratecontrol.cpp       |   12 -
 source/test/intrapredharness.cpp     |   44 ++
 source/test/intrapredharness.h       |    9 +
 19 files changed, 1545 insertions(+), 59 deletions(-)

diffs (truncated from 2062 to 300 lines):

diff -r b1af4c36f48a -r c0fc87075c75 build/vc10-x86_64/multilib.bat
--- a/build/vc10-x86_64/multilib.bat	Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc10-x86_64/multilib.bat	Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS100COMNTOOLS%" == "" (
 
 @cd 10bit
 if not exist x265.sln (
-  cmake  -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+  cmake  -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
 )
 if exist x265.sln (
   call "%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
   exit 1
 )
 if not exist x265.sln (
-  cmake  -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+  cmake  -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
 )
 if exist x265.sln (
   call "%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 build/vc11-x86_64/multilib.bat
--- a/build/vc11-x86_64/multilib.bat	Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc11-x86_64/multilib.bat	Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS110COMNTOOLS%" == "" (
 
 @cd 10bit
 if not exist x265.sln (
-  cmake  -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+  cmake  -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
 )
 if exist x265.sln (
   call "%VS110COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
   exit 1
 )
 if not exist x265.sln (
-  cmake  -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+  cmake  -G "Visual Studio 11 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
 )
 if exist x265.sln (
   call "%VS110COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 build/vc12-x86_64/multilib.bat
--- a/build/vc12-x86_64/multilib.bat	Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc12-x86_64/multilib.bat	Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS120COMNTOOLS%" == "" (
 
 @cd 10bit
 if not exist x265.sln (
-  cmake  -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+  cmake  -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
 )
 if exist x265.sln (
   call "%VS120COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
   exit 1
 )
 if not exist x265.sln (
-  cmake  -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+  cmake  -G "Visual Studio 12 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
 )
 if exist x265.sln (
   call "%VS120COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 build/vc9-x86_64/multilib.bat
--- a/build/vc9-x86_64/multilib.bat	Wed Jun 24 10:36:15 2015 -0500
+++ b/build/vc9-x86_64/multilib.bat	Mon Jun 29 11:47:39 2015 -0500
@@ -9,7 +9,7 @@ if "%VS90COMNTOOLS%" == "" (
 
 @cd 10bit
 if not exist x265.sln (
-  cmake  -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DWINXP_SUPPORT=ON
+  cmake  -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
 )
 if exist x265.sln (
   call "%VS90COMNTOOLS%\..\..\VC\vcvarsall.bat"
@@ -24,7 +24,7 @@ if not exist x265-static-main10.lib (
   exit 1
 )
 if not exist x265.sln (
-  cmake  -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DWINXP_SUPPORT=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
+  cmake  -G "Visual Studio 9 2008 Win64" ../../../source -DHIGH_BIT_DEPTH=OFF -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=ON -DEXTRA_LIB=x265-static-main10.lib -DEXTRA_LINK_FLAGS="/FORCE:MULTIPLE"
 )
 if exist x265.sln (
   call "%VS90COMNTOOLS%\..\..\VC\vcvarsall.bat"
diff -r b1af4c36f48a -r c0fc87075c75 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/CMakeLists.txt	Mon Jun 29 11:47:39 2015 -0500
@@ -490,7 +490,7 @@ if(NOT WIN32)
 endif()
 
 # Main CLI application
-option(ENABLE_CLI "Build standalone CLI application" ON)
+set(ENABLE_CLI ON CACHE BOOL "Build standalone CLI application")
 if(ENABLE_CLI)
     file(GLOB InputFiles input/input.cpp input/yuv.cpp input/y4m.cpp input/*.h)
     file(GLOB OutputFiles output/output.cpp output/reconplay.cpp output/*.h
diff -r b1af4c36f48a -r c0fc87075c75 source/common/common.cpp
--- a/source/common/common.cpp	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/common.cpp	Mon Jun 29 11:47:39 2015 -0500
@@ -33,12 +33,12 @@
 #include <sys/time.h>
 #endif
 
+namespace X265_NS {
+
 #if CHECKED_BUILD || _DEBUG
 int g_checkFailures;
 #endif
 
-namespace X265_NS {
-
 int64_t x265_mdate(void)
 {
 #if _WIN32
diff -r b1af4c36f48a -r c0fc87075c75 source/common/common.h
--- a/source/common/common.h	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/common.h	Mon Jun 29 11:47:39 2015 -0500
@@ -106,7 +106,7 @@
 /* If compiled with CHECKED_BUILD perform run-time checks and log any that
  * fail, both to stderr and to a file */
 #if CHECKED_BUILD || _DEBUG
-extern int g_checkFailures;
+namespace X265_NS { extern int g_checkFailures; }
 #define X265_CHECK(expr, ...) if (!(expr)) { \
     x265_log(NULL, X265_LOG_ERROR, __VA_ARGS__); \
     FILE *fp = fopen("x265_check_failures.txt", "a"); \
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp	Mon Jun 29 11:47:39 2015 -0500
@@ -1097,6 +1097,7 @@ void setupAssemblyPrimitives(EncoderPrim
         p.saoCuOrgE3[0] = PFX(saoCuOrgE3_sse4);
         p.saoCuOrgE3[1] = PFX(saoCuOrgE3_sse4);
         p.saoCuOrgB0 = PFX(saoCuOrgB0_sse4);
+        p.sign = PFX(calSign_sse4);
 
         LUMA_ADDAVG(sse4);
         CHROMA_420_ADDAVG(sse4);
@@ -1284,6 +1285,15 @@ void setupAssemblyPrimitives(EncoderPrim
     }
     if (cpuMask & X265_CPU_AVX2)
     {
+        p.saoCuOrgE0 = PFX(saoCuOrgE0_avx2);
+        p.saoCuOrgE1 = PFX(saoCuOrgE1_avx2);
+        p.saoCuOrgE1_2Rows = PFX(saoCuOrgE1_2Rows_avx2);
+        p.saoCuOrgE2[0] = PFX(saoCuOrgE2_avx2);
+        p.saoCuOrgE2[1] = PFX(saoCuOrgE2_32_avx2);
+        p.saoCuOrgE3[0] = PFX(saoCuOrgE3_avx2);
+        p.saoCuOrgE3[1] = PFX(saoCuOrgE3_32_avx2);
+        p.saoCuOrgB0 = PFX(saoCuOrgB0_avx2);
+
         p.cu[BLOCK_16x16].intra_pred[2]     = PFX(intra_pred_ang16_2_avx2);
         p.cu[BLOCK_16x16].intra_pred[3]     = PFX(intra_pred_ang16_3_avx2);
         p.cu[BLOCK_16x16].intra_pred[4]     = PFX(intra_pred_ang16_4_avx2);
@@ -1352,12 +1362,24 @@ void setupAssemblyPrimitives(EncoderPrim
         p.cu[BLOCK_32x32].intra_pred[33]    = PFX(intra_pred_ang32_33_avx2);
         p.cu[BLOCK_32x32].intra_pred[34]    = PFX(intra_pred_ang32_2_avx2);
 
+        p.pu[LUMA_12x16].pixelavg_pp = PFX(pixel_avg_12x16_avx2);
         p.pu[LUMA_16x4].pixelavg_pp = PFX(pixel_avg_16x4_avx2);
         p.pu[LUMA_16x8].pixelavg_pp = PFX(pixel_avg_16x8_avx2);
         p.pu[LUMA_16x12].pixelavg_pp = PFX(pixel_avg_16x12_avx2);
         p.pu[LUMA_16x16].pixelavg_pp = PFX(pixel_avg_16x16_avx2);
         p.pu[LUMA_16x32].pixelavg_pp = PFX(pixel_avg_16x32_avx2);
         p.pu[LUMA_16x64].pixelavg_pp = PFX(pixel_avg_16x64_avx2);
+        p.pu[LUMA_24x32].pixelavg_pp = PFX(pixel_avg_24x32_avx2);
+        p.pu[LUMA_32x8].pixelavg_pp = PFX(pixel_avg_32x8_avx2);
+        p.pu[LUMA_32x16].pixelavg_pp = PFX(pixel_avg_32x16_avx2);
+        p.pu[LUMA_32x24].pixelavg_pp = PFX(pixel_avg_32x24_avx2);
+        p.pu[LUMA_32x32].pixelavg_pp = PFX(pixel_avg_32x32_avx2);
+        p.pu[LUMA_32x64].pixelavg_pp = PFX(pixel_avg_32x64_avx2);
+        p.pu[LUMA_64x16].pixelavg_pp = PFX(pixel_avg_64x16_avx2);
+        p.pu[LUMA_64x32].pixelavg_pp = PFX(pixel_avg_64x32_avx2);
+        p.pu[LUMA_64x48].pixelavg_pp = PFX(pixel_avg_64x48_avx2);
+        p.pu[LUMA_64x64].pixelavg_pp = PFX(pixel_avg_64x64_avx2);
+        p.pu[LUMA_48x64].pixelavg_pp = PFX(pixel_avg_48x64_avx2);
 
         p.pu[LUMA_8x4].addAvg   = PFX(addAvg_8x4_avx2);
         p.pu[LUMA_8x8].addAvg   = PFX(addAvg_8x8_avx2);
@@ -1495,6 +1517,7 @@ void setupAssemblyPrimitives(EncoderPrim
         p.scale1D_128to64 = PFX(scale1D_128to64_avx2);
         p.scale2D_64to32 = PFX(scale2D_64to32_avx2);
         p.weight_pp = PFX(weight_pp_avx2);
+        p.sign = PFX(calSign_avx2);
 
         p.cu[BLOCK_16x16].calcresidual = PFX(getResidual16_avx2);
         p.cu[BLOCK_32x32].calcresidual = PFX(getResidual32_avx2);
@@ -2430,6 +2453,11 @@ void setupAssemblyPrimitives(EncoderPrim
         p.weight_pp = PFX(weight_pp_sse4);
         p.weight_sp = PFX(weight_sp_sse4);
 
+        p.cu[BLOCK_4x4].intra_filter = PFX(intra_filter_4x4_sse4);
+        p.cu[BLOCK_8x8].intra_filter = PFX(intra_filter_8x8_sse4);
+        p.cu[BLOCK_16x16].intra_filter = PFX(intra_filter_16x16_sse4);
+        p.cu[BLOCK_32x32].intra_filter = PFX(intra_filter_32x32_sse4);
+
         ALL_LUMA_TU_S(intra_pred[PLANAR_IDX], intra_pred_planar, sse4);
         ALL_LUMA_TU_S(intra_pred[DC_IDX], intra_pred_dc, sse4);
         ALL_LUMA_TU(intra_pred_allangs, all_angs_pred, sse4);
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/const-a.asm	Mon Jun 29 11:47:39 2015 -0500
@@ -41,7 +41,7 @@ const pb_15,                times 32 db 
 const pb_16,                times 32 db 16
 const pb_32,                times 32 db 32
 const pb_64,                times 32 db 64
-const pb_128,               times 16 db 128
+const pb_128,               times 32 db 128
 const pb_a1,                times 16 db 0xa1
 
 const pb_01,                times  8 db   0,   1
@@ -136,5 +136,3 @@ const popcnt_table
 db ((x>>0)&1)+((x>>1)&1)+((x>>2)&1)+((x>>3)&1)+((x>>4)&1)+((x>>5)&1)+((x>>6)&1)+((x>>7)&1)
 %assign x x+1
 %endrep
-
-const sw_64,       dd 64
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/intrapred.h	Mon Jun 29 11:47:39 2015 -0500
@@ -66,6 +66,7 @@
 
 #define DECL_ALL(cpu) \
     FUNCDEF_TU(void, all_angs_pred, cpu, pixel *dest, pixel *refPix, pixel *filtPix, int bLuma); \
+    FUNCDEF_TU(void, intra_filter, cpu, const pixel *samples, pixel *filtered); \
     DECL_ANGS(4, cpu); \
     DECL_ANGS(8, cpu); \
     DECL_ANGS(16, cpu); \
diff -r b1af4c36f48a -r c0fc87075c75 source/common/x86/intrapred8.asm
--- a/source/common/x86/intrapred8.asm	Wed Jun 24 10:36:15 2015 -0500
+++ b/source/common/x86/intrapred8.asm	Mon Jun 29 11:47:39 2015 -0500
@@ -30,6 +30,9 @@ SECTION_RODATA 32
 intra_pred_shuff_0_8:    times 2 db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
 intra_pred_shuff_15_0:   times 2 db 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
 
+intra_filter4_shuf0:  db 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13
+intra_filter4_shuf1:  db 14,15,0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13
+
 pb_0_8        times 8 db  0,  8
 pb_unpackbw1  times 2 db  1,  8,  2,  8,  3,  8,  4,  8
 pb_swap8:     times 2 db  7,  6,  5,  4,  3,  2,  1,  0
@@ -18276,3 +18279,414 @@ cglobal intra_pred_ang4_25, 3, 3, 1
 
     INTRA_PRED_STORE_4x4
     RET
+
+;-----------------------------------------------------------------------------------
+; void intra_filter_NxN(const pixel* references, pixel* filtered)
+;-----------------------------------------------------------------------------------
+INIT_XMM sse4
+cglobal intra_filter_4x4, 2,4,5
+    mov             r2b, byte [r0 +  8]             ; topLast
+    mov             r3b, byte [r0 + 16]             ; LeftLast
+
+    ; filtering top
+    pmovzxbw        m0, [r0 +  0]
+    pmovzxbw        m1, [r0 +  8]
+    pmovzxbw        m2, [r0 + 16]
+
+    pshufb          m4, m0, [intra_filter4_shuf0]   ; [6 5 4 3 2 1 0 1] samples[i - 1]
+    palignr         m3, m1, m0, 4
+    pshufb          m3, [intra_filter4_shuf1]       ; [8 7 6 5 4 3 2 9] samples[i + 1]
+
+    psllw           m0, 1
+    paddw           m4, m3
+    paddw           m0, m4
+    paddw           m0, [pw_2]
+    psrlw           m0, 2
+
+    ; filtering left
+    palignr         m4, m1, m1, 14                  ; [14 13 12 11 10 9 8 15] samples[i - 1]
+    pinsrb          m4, [r0], 2                     ; [14 13 12 11 10 9 0 15] samples[i + 1]
+    palignr         m3, m2, m1, 4
+    pshufb          m3, [intra_filter4_shuf1]
+
+    psllw           m1, 1
+    paddw           m4, m3
+    paddw           m1, m4
+    paddw           m1, [pw_2]
+    psrlw           m1, 2
+    packuswb        m0, m1
+
+    movu            [r1], m0
+    mov             [r1 +  8], r2b                  ; topLast
+    mov             [r1 + 16], r3b                  ; LeftLast
+    RET
+
+INIT_XMM sse4
+cglobal intra_filter_8x8, 2,4,6
+    mov             r2b, byte [r0 + 16]             ; topLast
+    mov             r3b, byte [r0 + 32]             ; LeftLast
+
+    ; filtering top
+    pmovzxbw        m0, [r0 +  0]
+    pmovzxbw        m1, [r0 +  8]
+    pmovzxbw        m2, [r0 + 16]
+
+    pshufb          m4, m0, [intra_filter4_shuf0]   ; [6 5 4 3 2 1 0 1] samples[i - 1]
+    palignr         m5, m1, m0, 2
+    pinsrb          m5, [r0 + 17], 0                ; [8 7 6 5 4 3 2 9] samples[i + 1]
+
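The shuffles in the intra_filter_4x4 routine above implement a [1 2 1] smoothing of the reference samples. The following scalar sketch is derived from the assembly comments and offsets shown in the diff (top-left at index 0, topLast at 2*N, LeftLast at 4*N); the name `intraFilterRef` is hypothetical and the actual C primitive in x265 may be organized differently.

```cpp
#include <cassert>
#include <cstdint>

// The assembly above is the 8-bit path (intrapred8.asm), so samples are bytes.
typedef uint8_t pixel;

// Hypothetical scalar reference for the 1:2:1 filter over 4*tuSize + 1 samples.
template<int tuSize>
void intraFilterRef(const pixel* samples, pixel* filtered)
{
    assert(samples && filtered && samples != filtered);
    const int n2 = 2 * tuSize;          // length of the top run and of the left run
    const int topLeft = samples[0];

    // corner: its neighbours are the first top and the first left sample
    filtered[0] = static_cast<pixel>((2 * topLeft + samples[1] + samples[n2 + 1] + 2) >> 2);

    // top run; the last top sample is copied unfiltered
    for (int i = 1; i < n2; i++)
        filtered[i] = static_cast<pixel>((samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2);
    filtered[n2] = samples[n2];

    // left run; the first left sample uses the corner as its previous neighbour
    filtered[n2 + 1] = static_cast<pixel>((topLeft + 2 * samples[n2 + 1] + samples[n2 + 2] + 2) >> 2);
    for (int i = n2 + 2; i < 2 * n2; i++)
        filtered[i] = static_cast<pixel>((samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2);
    filtered[2 * n2] = samples[2 * n2];
}
```

For a 4x4 block this touches 17 samples, which matches the assembly storing 16 filtered bytes and then overwriting offsets 8 and 16 with the saved topLast and LeftLast values.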

