[x265-commits] [x265] search: disable psyEnergy checks
Deepthi Nandakumar
deepthi at multicorewareinc.com
Wed Mar 25 18:50:07 CET 2015
details: http://hg.videolan.org/x265/rev/30e713269c6f
branches:
changeset: 9879:30e713269c6f
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Mar 25 16:26:34 2015 +0530
description:
search: disable psyEnergy checks
psyEnergy may never be calculated due to user option, or may be turned off due
to high QP.
Subject: [x265] rc: nit
details: http://hg.videolan.org/x265/rev/e81e94e4748b
branches:
changeset: 9880:e81e94e4748b
user: Steve Borho <steve at borho.org>
date: Tue Mar 24 17:15:13 2015 -0500
description:
rc: nit
Subject: [x265] slicetype: do not re-calculate AQ cost of B-frames
details: http://hg.videolan.org/x265/rev/56b3b10c3d91
branches:
changeset: 9881:56b3b10c3d91
user: Steve Borho <steve at borho.org>
date: Tue Mar 24 17:15:33 2015 -0500
description:
slicetype: do not re-calculate AQ cost of B-frames
Subject: [x265] cmake: use CMAKE_CURRENT_SOURCE_DIR as location to run hg commands to find a version
details: http://hg.videolan.org/x265/rev/ed765141045e
branches:
changeset: 9882:ed765141045e
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Wed Mar 25 13:58:23 2015 +0530
description:
cmake: use CMAKE_CURRENT_SOURCE_DIR as location to run hg commands to find a version
Subject: [x265] asm: chroma_hps[8x2, 8x4, 8x6, 8x16, 8x32] avx2 - improved 298c->228c, 397c->312c, 455c->348c, 936c->736c, 1696c->1319c
details: http://hg.videolan.org/x265/rev/8c6a2e789ee4
branches:
changeset: 9883:8c6a2e789ee4
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 25 09:42:42 2015 +0530
description:
asm: chroma_hps[8x2, 8x4, 8x6, 8x16, 8x32] avx2 - improved 298c->228c, 397c->312c, 455c->348c, 936c->736c, 1696c->1319c
Subject: [x265] asm: chroma_hps[2x4] avx2 - improved 348c->274c
details: http://hg.videolan.org/x265/rev/1df2a453e3f0
branches:
changeset: 9884:1df2a453e3f0
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 25 09:50:37 2015 +0530
description:
asm: chroma_hps[2x4] avx2 - improved 348c->274c
Subject: [x265] asm: chroma_hps[2x8] avx2 - improved 502c->378c
details: http://hg.videolan.org/x265/rev/b38b374cc95d
branches:
changeset: 9885:b38b374cc95d
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 25 09:54:04 2015 +0530
description:
asm: chroma_hps[2x8] avx2 - improved 502c->378c
Subject: [x265] asm: chroma_hpp[12x16] avx2 - improved 1483c->1176c
details: http://hg.videolan.org/x265/rev/8ee8385f8e60
branches:
changeset: 9886:8ee8385f8e60
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 25 09:58:31 2015 +0530
description:
asm: chroma_hpp[12x16] avx2 - improved 1483c->1176c
Subject: [x265] asm: chroma_hpp[24x32] avx2 - improved 4525c->4289c
details: http://hg.videolan.org/x265/rev/05a8946304c7
branches:
changeset: 9887:05a8946304c7
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 25 10:04:57 2015 +0530
description:
asm: chroma_hpp[24x32] avx2 - improved 4525c->4289c
Subject: [x265] asm: avx2 code sse_pp[32x32] and sse_pp[64x64] for 8 bpp
details: http://hg.videolan.org/x265/rev/d9659e8f148f
branches:
changeset: 9888:d9659e8f148f
user: Sumalatha Polureddy
date: Wed Mar 25 10:49:21 2015 +0530
description:
asm: avx2 code sse_pp[32x32] and sse_pp[64x64] for 8 bpp
sse3
sse_pp[32x32] 6.39x 2497.86 15957.98
sse_pp[64x64] 5.01x 12520.95 62749.02
avx2
sse_pp[32x32] 13.02x 1246.36 16225.92
sse_pp[64x64] 11.79x 5189.50 61170.29
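Note on reading the benchmark lines: the three numeric columns appear to be the speedup over the C reference, the cycle count of the optimized routine, and the cycle count of the C reference. That reading is inferred from the numbers (15957.98 / 2497.86 is about 6.39); the commit message does not state it. A minimal C++ sketch that re-derives the speedup column under that assumption, with all type and function names hypothetical:

```cpp
#include <cmath>

// One testbench line. The column meaning (speedup, optimized cycles,
// C reference cycles) is an assumption inferred from the numbers.
struct BenchLine {
    const char* name;
    double reportedSpeedup;
    double optimizedCycles;
    double referenceCycles;
};

// Values copied from the commit message above.
const BenchLine kSsePpLines[] = {
    {"sse3 sse_pp[32x32]", 6.39, 2497.86, 15957.98},
    {"sse3 sse_pp[64x64]", 5.01, 12520.95, 62749.02},
    {"avx2 sse_pp[32x32]", 13.02, 1246.36, 16225.92},
    {"avx2 sse_pp[64x64]", 11.79, 5189.50, 61170.29},
};

// Speedup is the reference cost divided by the optimized cost.
double speedup(const BenchLine& b) {
    return b.referenceCycles / b.optimizedCycles;
}

// True when the recomputed ratio agrees with the reported one
// to within rounding at two decimal places.
bool matchesReported(const BenchLine& b) {
    return std::fabs(speedup(b) - b.reportedSpeedup) < 0.01;
}

// Checks every line in the table above.
bool allMatchReported() {
    for (const BenchLine& b : kSsePpLines)
        if (!matchesReported(b))
            return false;
    return true;
}
```

Under the same reading, the AVX2 sse_pp[32x32] routine takes roughly half the cycles of the SSE3 one (1246.36 against 2497.86).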
Subject: [x265] asm: avx2 code for add_ps for chroma sizes 16x16, 32x32, reused the code from luma
details: http://hg.videolan.org/x265/rev/06623a15cbe9
branches:
changeset: 9889:06623a15cbe9
user: Sumalatha Polureddy
date: Wed Mar 25 10:50:35 2015 +0530
description:
asm: avx2 code for add_ps for chroma sizes 16x16, 32x32, reused the code from luma
sse3
[i420] add_ps[16x16] 17.39x 625.09 10867.35
[i420] add_ps[32x32] 21.70x 1978.74 42930.85
avx2
[i420] add_ps[16x16] 21.19x 482.93 10234.38
[i420] add_ps[32x32] 29.58x 1442.61 42678.27
Subject: [x265] asm: avx2 code for sub_ps for chroma sizes 16x16, 32x32, reused the code from luma
details: http://hg.videolan.org/x265/rev/c4076ce59807
branches:
changeset: 9890:c4076ce59807
user: Sumalatha Polureddy
date: Wed Mar 25 10:51:38 2015 +0530
description:
asm: avx2 code for sub_ps for chroma sizes 16x16, 32x32, reused the code from luma
sse3
[i420] sub_ps[16x16] 5.27x 719.40 3788.99
[i420] sub_ps[32x32] 5.39x 2605.93 14054.38
avx2
[i420] sub_ps[16x16] 7.88x 480.04 3785.06
[i420] sub_ps[32x32] 10.14x 1386.92 14063.74
Subject: [x265] asm: call avx code for copy_ss[32x32] and chroma copy_ss[32x32]
details: http://hg.videolan.org/x265/rev/689cb5982a47
branches:
changeset: 9891:689cb5982a47
user: Sumalatha Polureddy
date: Wed Mar 25 10:53:19 2015 +0530
description:
asm: call avx code for copy_ss[32x32] and chroma copy_ss[32x32]
sse3
copy_ss[32x32] 8.83x 1258.84 11120.56
[i420] copy_ss[32x32] 7.86x 1417.72 11147.99
avx
copy_ss[32x32] 16.77x 664.79 11150.48
[i420] copy_ss[32x32] 14.60x 748.45 10926.18
Subject: [x265] asm: avx code for satd for chroma sizes 420, reused the luma code
details: http://hg.videolan.org/x265/rev/13bb9947e984
branches:
changeset: 9892:13bb9947e984
user: Sumalatha Polureddy
date: Wed Mar 25 10:58:06 2015 +0530
description:
asm: avx code for satd for chroma sizes 420, reused the luma code
Subject: [x265] asm: avx2 code for sse_pp for chroma 16x16, 32x32, reused luma code
details: http://hg.videolan.org/x265/rev/802be4b39bbb
branches:
changeset: 9893:802be4b39bbb
user: Sumalatha Polureddy
date: Wed Mar 25 12:41:25 2015 +0530
description:
asm: avx2 code for sse_pp for chroma 16x16, 32x32, reused luma code
Subject: [x265] asm: psyCost_ss avx2 code for all sizes(4x4,8x8,16x16,32x32,64x64)
details: http://hg.videolan.org/x265/rev/90102dc64266
branches:
changeset: 9894:90102dc64266
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Wed Mar 25 12:50:38 2015 +0530
description:
asm: psyCost_ss avx2 code for all sizes(4x4,8x8,16x16,32x32,64x64)
AVX2:
psy_cost_ss[4x4] 6.53x 336.42 2195.55
psy_cost_ss[8x8] 6.10x 1422.97 8678.92
psy_cost_ss[16x16] 6.23x 5639.05 35154.69
psy_cost_ss[32x32] 6.19x 23208.20 143647.30
psy_cost_ss[64x64] 6.28x 89826.32 564206.44
SSE4:
psy_cost_ss[4x4] 4.52x 514.43 2322.86
psy_cost_ss[8x8] 3.48x 2579.79 8978.54
psy_cost_ss[16x16] 3.52x 10234.08 36056.70
psy_cost_ss[32x32] 3.46x 44220.05 152957.89
psy_cost_ss[64x64] 3.49x 159862.55 557929.25
Subject: [x265] asm: intra_pred_ang32_27 improved by ~61%, 3402.39c -> 1322.11c over SSE4
details: http://hg.videolan.org/x265/rev/74efde67a125
branches:
changeset: 9895:74efde67a125
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Mar 25 16:05:06 2015 +0530
description:
asm: intra_pred_ang32_27 improved by ~61%, 3402.39c -> 1322.11c over SSE4
Subject: [x265] asm: intra_pred_ang32_28 improved by ~56%, 3234.68c -> 1421.76c over SSE4
details: http://hg.videolan.org/x265/rev/e6ea2f01dc40
branches:
changeset: 9896:e6ea2f01dc40
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Mar 25 16:21:04 2015 +0530
description:
asm: intra_pred_ang32_28 improved by ~56%, 3234.68c -> 1421.76c over SSE4
Subject: [x265] asm: intra_pred_ang32_29 improved by ~58%, 3763.98c -> 1562.61c over SSE4
details: http://hg.videolan.org/x265/rev/5793162a744e
branches:
changeset: 9897:5793162a744e
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Mar 25 16:27:40 2015 +0530
description:
asm: intra_pred_ang32_29 improved by ~58%, 3763.98c -> 1562.61c over SSE4
Subject: [x265] asm: intra_pred_ang32_30 improved by ~57% over SSE4 asm
details: http://hg.videolan.org/x265/rev/0918c18d40ce
branches:
changeset: 9898:0918c18d40ce
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Mar 25 16:33:08 2015 +0530
description:
asm: intra_pred_ang32_30 improved by ~57% over SSE4 asm
AVX2:
intra_ang_32x32[30] 17.53x 1643.33 28814.21
SSE4:
intra_ang_32x32[30] 8.02x 3832.86 30729.35
Subject: [x265] asm: intra_pred_ang32_31 improved by ~54% over SSE4
details: http://hg.videolan.org/x265/rev/58872465c1bf
branches:
changeset: 9899:58872465c1bf
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Mar 25 16:37:45 2015 +0530
description:
asm: intra_pred_ang32_31 improved by ~54% over SSE4
AVX2:
intra_ang_32x32[31] 16.42x 1810.50 29723.12
SSE4:
intra_ang_32x32[31] 7.66x 4017.17 30782.31
Subject: [x265] asm: intra_pred_ang32_32 improved by ~46% over SSE4
details: http://hg.videolan.org/x265/rev/ddaf13664302
branches:
changeset: 9900:ddaf13664302
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Mar 25 16:43:20 2015 +0530
description:
asm: intra_pred_ang32_32 improved by ~46% over SSE4
AVX2:
intra_ang_32x32[32] 14.49x 2082.67 30185.36
SSE4:
intra_ang_32x32[32] 7.81x 3898.58 30442.13
Subject: [x265] analysis: initialize merge costs at RD levels 0..4
details: http://hg.videolan.org/x265/rev/aa548155149b
branches:
changeset: 9901:aa548155149b
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 12:08:12 2015 -0500
description:
analysis: initialize merge costs at RD levels 0..4
prevents check failures when psy-rd gets disabled
Subject: [x265] search: use pre-calculated size index for psy-energy (nit)
details: http://hg.videolan.org/x265/rev/e4ac575fc2f9
branches:
changeset: 9902:e4ac575fc2f9
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 12:08:47 2015 -0500
description:
search: use pre-calculated size index for psy-energy (nit)
Subject: [x265] Backed out changeset: 30e713269c6f
details: http://hg.videolan.org/x265/rev/24fdb661bb57
branches:
changeset: 9903:24fdb661bb57
user: Steve Borho <steve at borho.org>
date: Wed Mar 25 12:49:01 2015 -0500
description:
Backed out changeset: 30e713269c6f
the point of the check is to discover fields that are uninitialized.
aa548155149b was the correct fix for this problem
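Note on the backout rationale: a check that "discovers uninitialized fields" can be built by poisoning every field with a sentinel and later verifying that none still holds it. The sketch below illustrates that general technique only. The struct, its field names, and the sentinel constants are hypothetical and are not taken from the x265 source.

```cpp
#include <cstdint>
#include <limits>

// Sentinel values meaning "never written". Hypothetical names.
constexpr uint64_t kUnset64 = std::numeric_limits<uint64_t>::max();
constexpr uint32_t kUnset32 = std::numeric_limits<uint32_t>::max();

// Hypothetical stand-in for a per-mode cost record.
struct ModeCosts {
    uint64_t rdCost;
    uint32_t distortion;
    uint32_t psyEnergy;

    // Poison every field so a forgotten assignment is detectable later.
    void invalidate() {
        rdCost = kUnset64;
        distortion = kUnset32;
        psyEnergy = kUnset32;
    }

    // Give every field a defined starting value. Calling this on every
    // code path fixes the root cause without weakening the check.
    void initCosts() {
        rdCost = 0;
        distortion = 0;
        psyEnergy = 0;
    }

    // True only when no field still holds its poison value.
    bool ok() const {
        return rdCost != kUnset64 &&
               distortion != kUnset32 &&
               psyEnergy != kUnset32;
    }
};
```

In this sketch, dropping psyEnergy from ok() would hide a missing initialization, whereas calling initCosts() on the path that skipped it keeps the check useful.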
diffstat:
source/cmake/version.cmake | 18 +-
source/common/x86/asm-primitives.cpp | 52 +
source/common/x86/const-a.asm | 2 +-
source/common/x86/intrapred.h | 6 +
source/common/x86/intrapred8.asm | 1864 ++++++++++++++++++++++++++++++++++
source/common/x86/ipfilter8.asm | 339 ++++++
source/common/x86/pixel-a.asm | 534 +++++++++-
source/common/x86/pixel.h | 6 +
source/common/x86/ssd-a.asm | 2 +
source/encoder/analysis.cpp | 3 +
source/encoder/ratecontrol.cpp | 2 +-
source/encoder/search.cpp | 4 +-
source/encoder/slicetype.cpp | 5 +-
13 files changed, 2822 insertions(+), 15 deletions(-)
diffs (truncated from 3109 to 300 lines):
diff -r e637273e2ae6 -r 24fdb661bb57 source/cmake/version.cmake
--- a/source/cmake/version.cmake Tue Mar 24 15:31:05 2015 -0500
+++ b/source/cmake/version.cmake Wed Mar 25 12:49:01 2015 -0500
@@ -10,9 +10,9 @@ set(X265_VERSION "unknown")
set(X265_LATEST_TAG "0.0")
set(X265_TAG_DISTANCE "0")
-if(EXISTS ${CMAKE_SOURCE_DIR}/../.hg_archival.txt)
+if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/../.hg_archival.txt)
# read the lines of the archive summary file to extract the version
- file(READ ${CMAKE_SOURCE_DIR}/../.hg_archival.txt archive)
+ file(READ ${CMAKE_CURRENT_SOURCE_DIR}/../.hg_archival.txt archive)
STRING(REGEX REPLACE "\n" ";" archive "${archive}")
foreach(f ${archive})
string(FIND "${f}" ": " pos)
@@ -29,7 +29,7 @@ if(EXISTS ${CMAKE_SOURCE_DIR}/../.hg_arc
string(SUBSTRING "${hg_node}" 0 16 hg_id)
set(X265_VERSION "${hg_latesttag}+${hg_latesttagdistance}-${hg_id}")
endif()
-elseif(HG_EXECUTABLE AND EXISTS ${CMAKE_SOURCE_DIR}/../.hg)
+elseif(HG_EXECUTABLE AND EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/../.hg)
if(EXISTS "${HG_EXECUTABLE}.bat")
# mercurial source installs on Windows require .bat extension
set(HG_EXECUTABLE "${HG_EXECUTABLE}.bat")
@@ -38,14 +38,14 @@ elseif(HG_EXECUTABLE AND EXISTS ${CMAKE_
execute_process(COMMAND
${HG_EXECUTABLE} log -r. --template "{latesttag}"
- WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
+ WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE X265_LATEST_TAG
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
)
execute_process(COMMAND
${HG_EXECUTABLE} log -r. --template "{latesttagdistance}"
- WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
+ WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE X265_TAG_DISTANCE
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
@@ -53,7 +53,7 @@ elseif(HG_EXECUTABLE AND EXISTS ${CMAKE_
execute_process(
COMMAND
${HG_EXECUTABLE} log -r. --template "{node|short}"
- WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
+ WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE HG_REVISION_ID
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
@@ -67,11 +67,11 @@ elseif(HG_EXECUTABLE AND EXISTS ${CMAKE_
else()
set(X265_VERSION "${X265_LATEST_TAG}+${X265_TAG_DISTANCE}-${HG_REVISION_ID}")
endif()
-elseif(GIT_EXECUTABLE AND EXISTS ${CMAKE_SOURCE_DIR}/../.git)
+elseif(GIT_EXECUTABLE AND EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/../.git)
execute_process(
COMMAND
${GIT_EXECUTABLE} describe --tags --abbrev=0
- WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
+ WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE X265_LATEST_TAG
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
@@ -80,7 +80,7 @@ elseif(GIT_EXECUTABLE AND EXISTS ${CMAKE
execute_process(
COMMAND
${GIT_EXECUTABLE} describe --tags
- WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
+ WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE X265_VERSION
ERROR_QUIET
OUTPUT_STRIP_TRAILING_WHITESPACE
diff -r e637273e2ae6 -r 24fdb661bb57 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Tue Mar 24 15:31:05 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Wed Mar 25 12:49:01 2015 -0500
@@ -1364,8 +1364,27 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].satd = x265_pixel_satd_12x32_avx;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].satd = x265_pixel_satd_4x32_avx;
ALL_LUMA_PU(satd, pixel_satd, avx);
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].satd = x265_pixel_satd_4x4_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].satd = x265_pixel_satd_8x8_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].satd = x265_pixel_satd_16x16_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].satd = x265_pixel_satd_32x32_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].satd = x265_pixel_satd_8x4_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].satd = x265_pixel_satd_4x8_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].satd = x265_pixel_satd_16x8_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].satd = x265_pixel_satd_8x16_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].satd = x265_pixel_satd_32x16_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].satd = x265_pixel_satd_16x32_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].satd = x265_pixel_satd_16x12_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].satd = x265_pixel_satd_12x16_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].satd = x265_pixel_satd_16x4_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].satd = x265_pixel_satd_4x16_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].satd = x265_pixel_satd_32x24_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].satd = x265_pixel_satd_24x32_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].satd = x265_pixel_satd_32x8_avx;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].satd = x265_pixel_satd_8x32_avx;
ASSIGN_SA8D(avx);
ASSIGN_SSE_PP(avx);
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_8x8].sse_pp = x265_pixel_ssd_8x8_avx;
ASSIGN_SSE_SS(avx);
LUMA_VAR(avx);
@@ -1381,8 +1400,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.ssim_end_4 = x265_pixel_ssim_end4_avx;
p.cu[BLOCK_16x16].copy_ss = x265_blockcopy_ss_16x16_avx;
+ p.cu[BLOCK_32x32].copy_ss = x265_blockcopy_ss_32x32_avx;
p.cu[BLOCK_64x64].copy_ss = x265_blockcopy_ss_64x64_avx;
p.chroma[X265_CSP_I420].cu[CHROMA_420_16x16].copy_ss = x265_blockcopy_ss_16x16_avx;
+ p.chroma[X265_CSP_I420].cu[CHROMA_420_32x32].copy_ss = x265_blockcopy_ss_32x32_avx;
p.chroma[X265_CSP_I422].cu[CHROMA_422_16x32].copy_ss = x265_blockcopy_ss_16x32_avx;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].copy_pp = x265_blockcopy_pp_32x8_avx;
@@ -1426,6 +1447,12 @@ void setupAssemblyPrimitives(EncoderPrim
#if X86_64
if (cpuMask & X265_CPU_AVX2)
{
+ p.cu[BLOCK_4x4].psy_cost_ss = x265_psyCost_ss_4x4_avx2;
+ p.cu[BLOCK_8x8].psy_cost_ss = x265_psyCost_ss_8x8_avx2;
+ p.cu[BLOCK_16x16].psy_cost_ss = x265_psyCost_ss_16x16_avx2;
+ p.cu[BLOCK_32x32].psy_cost_ss = x265_psyCost_ss_32x32_avx2;
+ p.cu[BLOCK_64x64].psy_cost_ss = x265_psyCost_ss_64x64_avx2;
+
p.cu[BLOCK_4x4].psy_cost_pp = x265_psyCost_pp_4x4_avx2;
p.cu[BLOCK_8x8].psy_cost_pp = x265_psyCost_pp_8x8_avx2;
p.cu[BLOCK_16x16].psy_cost_pp = x265_psyCost_pp_16x16_avx2;
@@ -1484,10 +1511,14 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_16x16].add_ps = x265_pixel_add_ps_16x16_avx2;
p.cu[BLOCK_32x32].add_ps = x265_pixel_add_ps_32x32_avx2;
p.cu[BLOCK_64x64].add_ps = x265_pixel_add_ps_64x64_avx2;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].add_ps = x265_pixel_add_ps_16x16_avx2;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].add_ps = x265_pixel_add_ps_32x32_avx2;
p.cu[BLOCK_16x16].sub_ps = x265_pixel_sub_ps_16x16_avx2;
p.cu[BLOCK_32x32].sub_ps = x265_pixel_sub_ps_32x32_avx2;
p.cu[BLOCK_64x64].sub_ps = x265_pixel_sub_ps_64x64_avx2;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].sub_ps = x265_pixel_sub_ps_16x16_avx2;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].sub_ps = x265_pixel_sub_ps_32x32_avx2;
p.pu[LUMA_16x4].pixelavg_pp = x265_pixel_avg_16x4_avx2;
p.pu[LUMA_16x8].pixelavg_pp = x265_pixel_avg_16x8_avx2;
@@ -1534,6 +1565,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_16x32].sad_x4 = x265_pixel_sad_x4_16x32_avx2;
p.cu[BLOCK_16x16].sse_pp = x265_pixel_ssd_16x16_avx2;
+ p.cu[BLOCK_32x32].sse_pp = x265_pixel_ssd_32x32_avx2;
+ p.cu[BLOCK_64x64].sse_pp = x265_pixel_ssd_64x64_avx2;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].sse_pp = x265_pixel_ssd_16x16_avx2;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].sse_pp = x265_pixel_ssd_32x32_avx2;
p.cu[BLOCK_16x16].ssd_s = x265_pixel_ssd_s_16_avx2;
p.cu[BLOCK_32x32].ssd_s = x265_pixel_ssd_s_32_avx2;
@@ -1601,6 +1636,12 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_32x32].intra_pred[34] = x265_intra_pred_ang32_34_avx2;
p.cu[BLOCK_32x32].intra_pred[2] = x265_intra_pred_ang32_2_avx2;
p.cu[BLOCK_32x32].intra_pred[26] = x265_intra_pred_ang32_26_avx2;
+ p.cu[BLOCK_32x32].intra_pred[27] = x265_intra_pred_ang32_27_avx2;
+ p.cu[BLOCK_32x32].intra_pred[28] = x265_intra_pred_ang32_28_avx2;
+ p.cu[BLOCK_32x32].intra_pred[29] = x265_intra_pred_ang32_29_avx2;
+ p.cu[BLOCK_32x32].intra_pred[30] = x265_intra_pred_ang32_30_avx2;
+ p.cu[BLOCK_32x32].intra_pred[31] = x265_intra_pred_ang32_31_avx2;
+ p.cu[BLOCK_32x32].intra_pred[32] = x265_intra_pred_ang32_32_avx2;
// copy_sp primitives
p.cu[BLOCK_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
@@ -1714,6 +1755,8 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_hpp = x265_interp_4tap_horiz_pp_8x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_hpp = x265_interp_4tap_horiz_pp_8x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].filter_hpp = x265_interp_4tap_horiz_pp_12x16_avx2;
+
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_hps = x265_interp_4tap_horiz_ps_32x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_hps = x265_interp_4tap_horiz_ps_16x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
@@ -1723,6 +1766,12 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_hps = x265_interp_4tap_horiz_ps_4x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].filter_hps = x265_interp_4tap_horiz_ps_8x2_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_hps = x265_interp_4tap_horiz_ps_8x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_hps = x265_interp_4tap_horiz_ps_8x6_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_hps = x265_interp_4tap_horiz_ps_8x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_hps = x265_interp_4tap_horiz_ps_8x16_avx2;
+
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].filter_hps = x265_interp_4tap_horiz_ps_16x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].filter_hps = x265_interp_4tap_horiz_ps_16x12_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].filter_hps = x265_interp_4tap_horiz_ps_16x8_avx2;
@@ -1731,6 +1780,9 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_hps = x265_interp_4tap_horiz_ps_32x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].filter_hps = x265_interp_4tap_horiz_ps_32x24_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_hps = x265_interp_4tap_horiz_ps_32x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_hps = x265_interp_4tap_horiz_ps_2x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].filter_hps = x265_interp_4tap_horiz_ps_2x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_hpp = x265_interp_4tap_horiz_pp_24x32_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
diff -r e637273e2ae6 -r 24fdb661bb57 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm Tue Mar 24 15:31:05 2015 -0500
+++ b/source/common/x86/const-a.asm Wed Mar 25 12:49:01 2015 -0500
@@ -83,7 +83,7 @@ const pw_ppmmppmm, dw 1,1,-1,-1,1,1,-1,-
const pw_pmpmpmpm, dw 1,-1,1,-1,1,-1,1,-1
const pw_pmmpzzzz, dw 1,-1,-1,1,0,0,0,0
const pd_1, times 8 dd 1
-const pd_2, times 4 dd 2
+const pd_2, times 8 dd 2
const pd_4, times 4 dd 4
const pd_8, times 4 dd 8
const pd_16, times 4 dd 16
diff -r e637273e2ae6 -r 24fdb661bb57 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Tue Mar 24 15:31:05 2015 -0500
+++ b/source/common/x86/intrapred.h Wed Mar 25 12:49:01 2015 -0500
@@ -206,6 +206,12 @@ void x265_intra_pred_ang16_22_avx2(pixel
void x265_intra_pred_ang32_34_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang32_2_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang32_26_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang32_27_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang32_28_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang32_29_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang32_30_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang32_31_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang32_32_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_all_angs_pred_4x4_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
void x265_all_angs_pred_8x8_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
void x265_all_angs_pred_16x16_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
diff -r e637273e2ae6 -r 24fdb661bb57 source/common/x86/intrapred8.asm
--- a/source/common/x86/intrapred8.asm Tue Mar 24 15:31:05 2015 -0500
+++ b/source/common/x86/intrapred8.asm Wed Mar 25 12:49:01 2015 -0500
@@ -247,6 +247,135 @@ c_ang16_mode_22: db 13, 19, 13, 19,
db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
db 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+
+ALIGN 32
+c_ang32_mode_27: db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
+ db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+ db 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
+ db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
+ db 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
+ db 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
+ db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
+ db 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
+
+
+ALIGN 32
+c_ang32_mode_28: db 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
+ db 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
+ db 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
+ db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
+ db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
+ db 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+ db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
+ db 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7
+ db 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27
+ db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
+
+ALIGN 32
+c_ang32_mode_29: db 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
+ db 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17
+ db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
+ db 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
+ db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+ db 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29
+ db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
+ db 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
+ db 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
+ db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23
+ db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
+
+
+ALIGN 32
+c_ang32_mode_30: db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
+ db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14