[x265-commits] [x265] entropy: add check failure
Deepthi Nandakumar
deepthi at multicorewareinc.com
Fri Mar 13 18:38:43 CET 2015
details: http://hg.videolan.org/x265/rev/8f4fa5a24dac
branches:
changeset: 9700:8f4fa5a24dac
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Mar 13 09:58:48 2015 +0530
description:
entropy: add check failure
Subject: [x265] asm: avx2 code for filter_vpp[16x4], filter_vps[16x4]: 303c->293c, 311c->253c
details: http://hg.videolan.org/x265/rev/4029ab5bfc0f
branches:
changeset: 9701:4029ab5bfc0f
user: Divya Manivannan <divya at multicorewareinc.com>
date: Fri Mar 13 09:52:41 2015 +0530
description:
asm: avx2 code for filter_vpp[16x4], filter_vps[16x4]: 303c->293c, 311c->253c
Subject: [x265] asm: improve ~5% on AVX2 interp_8tap_horiz_ps_4xN
details: http://hg.videolan.org/x265/rev/9d680635d58c
branches:
changeset: 9702:9d680635d58c
user: Min Chen <chenm003 at 163.com>
date: Thu Mar 12 20:46:46 2015 -0700
description:
asm: improve ~5% on AVX2 interp_8tap_horiz_ps_4xN
Subject: [x265] asm-intra_pred_ang8_11: improved, 317.84c -> 230.29c over SSE4 asm code
details: http://hg.videolan.org/x265/rev/cfa8729f7d17
branches:
changeset: 9703:cfa8729f7d17
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Thu Mar 12 18:40:23 2015 +0530
description:
asm-intra_pred_ang8_11: improved, 317.84c -> 230.29c over SSE4 asm code
AVX2:
intra_ang_8x8[11] 14.15x 230.29 3259.04
SSE4:
intra_ang_8x8[11] 10.25x 317.84 3258.71
Subject: [x265] asm-intra_pred_ang16_25: improved, 781.13c -> 466.16c
details: http://hg.videolan.org/x265/rev/e99b176c0a61
branches:
changeset: 9704:e99b176c0a61
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Thu Mar 12 19:08:34 2015 +0530
description:
asm-intra_pred_ang16_25: improved, 781.13c -> 466.16c
AVX2:
intra_ang_16x16[25] 18.09x 466.16 8434.26
SSE4:
intra_ang_16x16[25] 10.90x 781.13 8511.65
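The TestBench lines quoted in these commit messages appear to list a speedup ratio, the cycle count of the optimized primitive, and the cycle count of the C reference (8434.26 / 466.16 is about 18.09). That column meaning is inferred from the numbers, not stated in the log. A minimal sketch for reading such lines, with BenchLine, parseBenchLine, speedupConsistent and cycleReduction all being hypothetical names:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical record for one TestBench result line.
struct BenchLine {
    std::string name;
    double speedup = 0.0;   // reported ratio, printed as e.g. "18.09x"
    double asmCycles = 0.0; // assumed: cycles for the optimized primitive
    double cCycles = 0.0;   // assumed: cycles for the C reference
};

// Parses "name 18.09x 466.16 8434.26"; returns false on malformed input.
bool parseBenchLine(const std::string& line, BenchLine& out)
{
    char name[128];
    if (std::sscanf(line.c_str(), "%127s %lfx %lf %lf",
                    name, &out.speedup, &out.asmCycles, &out.cCycles) != 4)
        return false;
    out.name = name;
    return true;
}

// Checks that the reported ratio matches cCycles / asmCycles within tol.
bool speedupConsistent(const BenchLine& b, double tol)
{
    return std::fabs(b.cCycles / b.asmCycles - b.speedup) < tol;
}

// Fraction of cycles saved when moving from one asm version to another.
double cycleReduction(const BenchLine& before, const BenchLine& after)
{
    return 1.0 - after.asmCycles / before.asmCycles;
}
```

Feeding the SSE4 and AVX2 lines above to cycleReduction gives the relative saving behind "781.13c -> 466.16c".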
Subject: [x265] asm: filter_vsp[4x4], filter_vss[4x4] in avx2: 407c->198c, 361c->180c
details: http://hg.videolan.org/x265/rev/9b873ad208ae
branches:
changeset: 9705:9b873ad208ae
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Mar 12 11:32:16 2015 +0530
description:
asm: filter_vsp[4x4], filter_vss[4x4] in avx2: 407c->198c, 361c->180c
Subject: [x265] asm: filter_vsp[8x8], filter_vss[8x8] in avx2: 887c->525c, 828c->524c
details: http://hg.videolan.org/x265/rev/e2279723215a
branches:
changeset: 9706:e2279723215a
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Mar 12 14:02:18 2015 +0530
description:
asm: filter_vsp[8x8], filter_vss[8x8] in avx2: 887c->525c, 828c->524c
Subject: [x265] asm: filter_vsp[16x16, 32x16], filter_vss[16x16, 32x16]: 3042c->1875c, 5844c->3724c, 2646c->1988c, 4655c->4040c
details: http://hg.videolan.org/x265/rev/eb89c6d5e259
branches:
changeset: 9707:eb89c6d5e259
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Mar 12 16:55:03 2015 +0530
description:
asm: filter_vsp[16x16, 32x16], filter_vss[16x16, 32x16]: 3042c->1875c, 5844c->3724c, 2646c->1988c, 4655c->4040c
Subject: [x265] asm: filter_vsp[16x32, 24x32, 32x32], filter_vss[16x32, 24x32, 32x32] in avx2
details: http://hg.videolan.org/x265/rev/56fcfb6e49ac
branches:
changeset: 9708:56fcfb6e49ac
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu Mar 12 17:31:45 2015 +0530
description:
asm: filter_vsp[16x32, 24x32, 32x32], filter_vss[16x32, 24x32, 32x32] in avx2
filter_vsp[16x32, 24x32, 32x32]: 6015c->3693c, 8710c->5692c, 11284c->7731c
filter_vss[16x32, 24x32, 32x32]: 4702c->4024c, 7013c->6132c, 9046c->7926c
Subject: [x265] encoder: set frame thread count correctly when no-wpp is enabled
details: http://hg.videolan.org/x265/rev/6d05b68034cd
branches:
changeset: 9709:6d05b68034cd
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Mar 13 11:52:43 2015 +0530
description:
encoder: set frame thread count correctly when no-wpp is enabled
Subject: [x265] asm : chroma_hps[4x2] for i420 avx2 - improved 329c->233c
details: http://hg.videolan.org/x265/rev/0cb840697d81
branches:
changeset: 9710:0cb840697d81
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 11 16:45:15 2015 +0530
description:
asm : chroma_hps[4x2] for i420 avx2 - improved 329c->233c
Subject: [x265] asm : chroma_hps[4x8 , 4x16] for i420 avx2 - improved 557c->377c, 873c->588c
details: http://hg.videolan.org/x265/rev/b828ac0ca6b9
branches:
changeset: 9711:b828ac0ca6b9
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Wed Mar 11 17:20:32 2015 +0530
description:
asm : chroma_hps[4x8 , 4x16] for i420 avx2 - improved 557c->377c, 873c->588c
Subject: [x265] asm: chroma_hps[16x4, 16x8, 16x12, 16x32] for i420 avx2 - improved 743c->468c, 1065c->681c, 1399c->894c, 2961c->1844c
details: http://hg.videolan.org/x265/rev/9254dc264ba3
branches:
changeset: 9712:9254dc264ba3
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Thu Mar 12 13:55:24 2015 +0530
description:
asm: chroma_hps[16x4, 16x8, 16x12, 16x32] for i420 avx2 - improved 743c->468c, 1065c->681c, 1399c->894c, 2961c->1844c
Subject: [x265] asm: chroma_hps[32x8, 32x16, 32x24] for i420 avx2 - improved 1843c->1210c, 3149c->2001c, 4440c->2906c
details: http://hg.videolan.org/x265/rev/c6fc1bf05a0d
branches:
changeset: 9713:c6fc1bf05a0d
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Thu Mar 12 14:00:13 2015 +0530
description:
asm: chroma_hps[32x8, 32x16, 32x24] for i420 avx2 - improved 1843c->1210c, 3149c->2001c, 4440c->2906c
Subject: [x265] rc: recompute planned frame size when using vbv with --qpfile
details: http://hg.videolan.org/x265/rev/62c1052a7c95
branches:
changeset: 9714:62c1052a7c95
user: Aarthi Thirumalai
date: Wed Mar 11 23:05:55 2015 +0530
description:
rc: recompute planned frame size when using vbv with --qpfile
Subject: [x265] rc: clip qp after initial vbv-lookahead estimation.
details: http://hg.videolan.org/x265/rev/e0f834c69cb2
branches:
changeset: 9715:e0f834c69cb2
user: Aarthi Thirumalai
date: Wed Mar 11 23:26:27 2015 +0530
description:
rc: clip qp after initial vbv-lookahead estimation.
Avoid drastic qp changes caused by possible mispredictions from the initial vbv predictors.
Subject: [x265] presets[OUTPUT CHANGE]: change superfast and ultrafast presets
details: http://hg.videolan.org/x265/rev/c1a8eef8be14
branches:
changeset: 9716:c1a8eef8be14
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Mar 13 16:40:06 2015 +0530
description:
presets[OUTPUT CHANGE]: change superfast and ultrafast presets
Better R-D curves for superfast and ultrafast; introduces minCUSize in ultrafast.
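The effect of this change can be sketched from the param.cpp hunk and the presets.rst table further down in this message. PresetSketch and applyPresetSketch are hypothetical stand-ins, not the x265 API; only the field names and the assigned values come from the diff, and the struct defaults are assumed to match the "medium" column of the table:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical subset of the encoder parameters touched by this change.
// Defaults are an assumption taken from the "medium" preset column.
struct PresetSketch {
    int lookaheadDepth = 20;
    int maxCUSize = 64;
    int minCUSize = 8;
    int bframes = 4;
    int searchRange = 57;
};

// Applies the two fastest presets as they read after this changeset.
// Returns 0 on success, -1 for a preset this sketch does not cover.
int applyPresetSketch(PresetSketch& p, const char* preset)
{
    if (!std::strcmp(preset, "ultrafast"))
    {
        p.lookaheadDepth = 5;  // was 10
        p.maxCUSize = 32;
        p.minCUSize = 16;      // newly set for ultrafast
        p.bframes = 3;         // was 4 (inherited default)
        // searchRange is no longer lowered to 25; it stays at the default
        return 0;
    }
    if (!std::strcmp(preset, "superfast"))
    {
        p.lookaheadDepth = 10;
        p.maxCUSize = 32;
        p.bframes = 3;         // was 4 (inherited default)
        // searchRange is no longer lowered to 44; it stays at the default
        return 0;
    }
    return -1;
}
```

With these values, both presets keep merange at 57, which agrees with the updated merange row in presets.rst.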
Subject: [x265] encoder: set frame thread count correctly when no-wpp is enabled
details: http://hg.videolan.org/x265/rev/e4634aa7fdbe
branches: stable
changeset: 9717:e4634aa7fdbe
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Mar 13 11:52:43 2015 +0530
description:
encoder: set frame thread count correctly when no-wpp is enabled
Subject: [x265] analysis: add logic for calculate qp for a given cu size
details: http://hg.videolan.org/x265/rev/8dc54fd66706
branches:
changeset: 9718:8dc54fd66706
user: Sreelakshmy V G <sreelakshmy at multicorewareinc.com>
date: Fri Mar 13 09:06:39 2015 +0530
description:
analysis: add logic for calculate qp for a given cu size
A new function calculateQpforCuSize() is added to calculate the qp for any
given cu.
Subject: [x265] rc: fix bug in CRF caused by e0f834c69cb2
details: http://hg.videolan.org/x265/rev/1bed2e325efc
branches:
changeset: 9719:1bed2e325efc
user: Aarthi Thirumalai
date: Fri Mar 13 21:57:56 2015 +0530
description:
rc: fix bug in CRF caused by e0f834c69cb2
Subject: [x265] asm: intra pred planar32 sse2
details: http://hg.videolan.org/x265/rev/5e7519f25ad5
branches:
changeset: 9720:5e7519f25ad5
user: David T Yuen <dtyx265 at gmail.com>
date: Thu Mar 12 19:19:24 2015 -0700
description:
asm: intra pred planar32 sse2
This replaces the C code on processors limited to SSE2 through SSSE3.
The code is backported from intrapred planar32 sse4.
There are essentially two versions here:
one for x86_64 and one for x86_32. It would have been too ugly
to conditionally code the differences in a single primitive.
64-bit
./test/TestBench --testbench intrapred | grep intra_planar_32x32
intra_planar_32x32 10.92x 11107.49 121282.19
32-bit
./test/TestBench --testbench intrapred | grep intra_planar_32x32
intra_planar_32x32 10.01x 9918.94 99315.12
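For readers comparing the asm against scalar code, here is a minimal sketch of 32x32 planar prediction. It is not the project's C primitive. The neighbour layout (srcPix[0] is the top-left sample, the above row starts at srcPix[1], the left column at srcPix[65]) is an assumption inferred from the offsets used in the high bit depth asm later in this message, and planar32Sketch is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch of 32x32 planar intra prediction.
// Assumed layout: srcPix[0] = top-left, srcPix[1..64] = above row,
// srcPix[65..128] = left column.
template <typename Pixel>
void planar32Sketch(Pixel* dst, std::ptrdiff_t dstStride, const Pixel* srcPix)
{
    const int N = 32;
    const Pixel* above = srcPix + 1;
    const Pixel* left = srcPix + 1 + 2 * N;
    const int topRight = above[N];   // above[32]
    const int bottomLeft = left[N];  // left[32]

    for (int y = 0; y < N; y++)
    {
        for (int x = 0; x < N; x++)
        {
            // Horizontal and vertical linear blends, rounded and averaged.
            const int sum = (N - 1 - x) * left[y] + (x + 1) * topRight +
                            (N - 1 - y) * above[x] + (y + 1) * bottomLeft + N;
            dst[y * dstStride + x] = static_cast<Pixel>(sum >> 6);
        }
    }
}
```

A flat neighbourhood must predict a flat block, which makes a quick sanity check for any faster version.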
Subject: [x265] asm: intra pred planar32 sse2 high bit
details: http://hg.videolan.org/x265/rev/22ff3e631f65
branches:
changeset: 9721:22ff3e631f65
user: David T Yuen <dtyx265 at gmail.com>
date: Thu Mar 12 19:24:11 2015 -0700
description:
asm: intra pred planar32 sse2 high bit
This replaces the C code on processors limited to SSE2 through SSSE3.
The code is backported from intrapred planar32 sse4.
Unlike the sse4 high bit version, which operates on 32-bit values,
this version uses 16 bits to calculate the prediction. 16 bits
is just enough for 10-bit luma depth, or at least it's enough for
the testbench. Anything more and this primitive will overflow.
./test/TestBench --testbench intrapred | grep intra_planar_32x32
intra_planar_32x32 11.06x 10337.52 114374.60
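The "16 bits is just enough" remark can be checked with a little arithmetic. In a 32x32 planar sum the four weights (31-x), (x+1), (31-y) and (y+1) always add up to 64, so the largest value before the final shift is 64 * maxPixel + 32. The sketch below only bounds that final sum; it assumes intermediate 16-bit wraparound in the asm is harmless, and both function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Largest planar sum before the final >> 6 for a 32x32 block,
// reached when every neighbour holds the maximum pixel value.
constexpr std::uint32_t maxPlanar32Sum(int bitDepth)
{
    return 64u * ((1u << bitDepth) - 1u) + 32u;
}

// True when that sum still fits in an unsigned 16-bit lane.
constexpr bool fitsInUint16(int bitDepth)
{
    return maxPlanar32Sum(bitDepth) <= 0xFFFFu;
}

static_assert(fitsInUint16(10), "10-bit: 64 * 1023 + 32 = 65504");
static_assert(!fitsInUint16(12), "12-bit exceeds 16 bits");
```

At 10 bits the bound is 65504, only 31 below the 16-bit limit of 65535, so any greater bit depth overflows.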
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/5ebd5d7c0a76
branches:
changeset: 9722:5ebd5d7c0a76
user: Steve Borho <steve at borho.org>
date: Fri Mar 13 12:16:47 2015 -0500
description:
Merge with stable
diffstat:
doc/reST/presets.rst | 8 +-
source/common/param.cpp | 7 +-
source/common/x86/asm-primitives.cpp | 35 +
source/common/x86/intrapred.h | 3 +
source/common/x86/intrapred16.asm | 115 +++
source/common/x86/intrapred8.asm | 319 +++++++++
source/common/x86/ipfilter8.asm | 1103 +++++++++++++++++++++++++++++++++-
source/common/x86/ipfilter8.h | 3 +
source/encoder/analysis.cpp | 35 +
source/encoder/analysis.h | 2 +
source/encoder/encoder.cpp | 2 +-
source/encoder/entropy.cpp | 1 +
source/encoder/ratecontrol.cpp | 26 +-
13 files changed, 1617 insertions(+), 42 deletions(-)
diffs (truncated from 2024 to 300 lines):
diff -r 3187844f4a7f -r 5ebd5d7c0a76 doc/reST/presets.rst
--- a/doc/reST/presets.rst Mon Mar 09 14:35:20 2015 +0530
+++ b/doc/reST/presets.rst Fri Mar 13 12:16:47 2015 -0500
@@ -24,11 +24,13 @@ The presets adjust encoder parameters to
+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| bframes | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
+| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
++--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
| b-adapt | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rc-lookahead | 10 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
+| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
@@ -36,7 +38,7 @@ The presets adjust encoder parameters to
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| merange | 25 | 44 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
+| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
diff -r 3187844f4a7f -r 5ebd5d7c0a76 source/common/param.cpp
--- a/source/common/param.cpp Mon Mar 09 14:35:20 2015 +0530
+++ b/source/common/param.cpp Fri Mar 13 12:16:47 2015 -0500
@@ -247,10 +247,11 @@ int x265_param_default_preset(x265_param
if (!strcmp(preset, "ultrafast"))
{
- param->lookaheadDepth = 10;
+ param->lookaheadDepth = 5;
param->scenecutThreshold = 0; // disable lookahead
param->maxCUSize = 32;
- param->searchRange = 25;
+ param->minCUSize = 16;
+ param->bframes = 3;
param->bFrameAdaptive = 0;
param->subpelRefine = 0;
param->searchMethod = X265_DIA_SEARCH;
@@ -269,7 +270,7 @@ int x265_param_default_preset(x265_param
{
param->lookaheadDepth = 10;
param->maxCUSize = 32;
- param->searchRange = 44;
+ param->bframes = 3;
param->bFrameAdaptive = 0;
param->subpelRefine = 1;
param->bEnableEarlySkip = 1;
diff -r 3187844f4a7f -r 5ebd5d7c0a76 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Mon Mar 09 14:35:20 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp Fri Mar 13 12:16:47 2015 -0500
@@ -880,6 +880,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = x265_intra_pred_planar4_sse2;
p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = x265_intra_pred_planar8_sse2;
p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_sse2;
+ p.cu[BLOCK_32x32].intra_pred[PLANAR_IDX] = x265_intra_pred_planar32_sse2;
p.cu[BLOCK_4x4].sse_ss = x265_pixel_ssd_ss_4x4_mmx2;
ALL_LUMA_CU(sse_ss, pixel_ssd_ss, sse2);
@@ -1193,6 +1194,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = x265_intra_pred_planar4_sse2;
p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = x265_intra_pred_planar8_sse2;
p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_sse2;
+ p.cu[BLOCK_32x32].intra_pred[PLANAR_IDX] = x265_intra_pred_planar32_sse2;
p.cu[BLOCK_4x4].calcresidual = x265_getResidual4_sse2;
p.cu[BLOCK_8x8].calcresidual = x265_getResidual8_sse2;
@@ -1503,6 +1505,8 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_8x8].intra_pred[25] = x265_intra_pred_ang8_25_avx2;
p.cu[BLOCK_8x8].intra_pred[12] = x265_intra_pred_ang8_12_avx2;
p.cu[BLOCK_8x8].intra_pred[24] = x265_intra_pred_ang8_24_avx2;
+ p.cu[BLOCK_8x8].intra_pred[11] = x265_intra_pred_ang8_11_avx2;
+ p.cu[BLOCK_16x16].intra_pred[25] = x265_intra_pred_ang16_25_avx2;
// copy_sp primitives
p.cu[BLOCK_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
@@ -1582,6 +1586,19 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_hps = x265_interp_4tap_horiz_ps_8x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_hps = x265_interp_4tap_horiz_ps_4x2_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_hps = x265_interp_4tap_horiz_ps_4x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_hps = x265_interp_4tap_horiz_ps_4x16_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].filter_hps = x265_interp_4tap_horiz_ps_16x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].filter_hps = x265_interp_4tap_horiz_ps_16x12_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].filter_hps = x265_interp_4tap_horiz_ps_16x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].filter_hps = x265_interp_4tap_horiz_ps_16x4_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_hps = x265_interp_4tap_horiz_ps_32x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].filter_hps = x265_interp_4tap_horiz_ps_32x24_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_hps = x265_interp_4tap_horiz_ps_32x8_avx2;
+
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vpp = x265_interp_4tap_vert_pp_4x4_avx2;
@@ -1597,6 +1614,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].filter_vpp = x265_interp_4tap_vert_pp_8x6_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vpp = x265_interp_4tap_vert_pp_8x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vpp = x265_interp_4tap_vert_pp_8x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].filter_vpp = x265_interp_4tap_vert_pp_16x4_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].filter_vpp = x265_interp_4tap_vert_pp_16x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].filter_vpp = x265_interp_4tap_vert_pp_16x12_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vpp = x265_interp_4tap_vert_pp_16x16_avx2;
@@ -1619,6 +1637,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vps = x265_interp_4tap_vert_ps_8x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_vps = x265_interp_4tap_vert_ps_8x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_vps = x265_interp_4tap_vert_ps_8x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].filter_vps = x265_interp_4tap_vert_ps_16x4_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].filter_vps = x265_interp_4tap_vert_ps_16x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].filter_vps = x265_interp_4tap_vert_ps_16x12_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_avx2;
@@ -1629,6 +1648,22 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].filter_vps = x265_interp_4tap_vert_ps_32x24_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_vps = x265_interp_4tap_vert_ps_32x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_vps = x265_interp_4tap_vert_ps_32x8_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vsp = x265_interp_4tap_vert_sp_16x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vsp = x265_interp_4tap_vert_sp_32x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].filter_vsp = x265_interp_4tap_vert_sp_16x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_vsp = x265_interp_4tap_vert_sp_24x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_vsp = x265_interp_4tap_vert_sp_32x16_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vss = x265_interp_4tap_vert_ss_16x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vss = x265_interp_4tap_vert_ss_32x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].filter_vss = x265_interp_4tap_vert_ss_16x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_vss = x265_interp_4tap_vert_ss_24x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_vss = x265_interp_4tap_vert_ss_32x16_avx2;
}
#endif
}
diff -r 3187844f4a7f -r 5ebd5d7c0a76 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Mon Mar 09 14:35:20 2015 +0530
+++ b/source/common/x86/intrapred.h Fri Mar 13 12:16:47 2015 -0500
@@ -38,6 +38,7 @@ void x265_intra_pred_dc32_sse4(pixel* ds
void x265_intra_pred_planar4_sse2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar8_sse2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar16_sse2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
+void x265_intra_pred_planar32_sse2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar4_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar8_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar16_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
@@ -181,6 +182,8 @@ void x265_intra_pred_ang8_27_avx2(pixel*
void x265_intra_pred_ang8_25_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_12_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_24_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_11_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_25_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_all_angs_pred_4x4_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
void x265_all_angs_pred_8x8_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
void x265_all_angs_pred_16x16_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
diff -r 3187844f4a7f -r 5ebd5d7c0a76 source/common/x86/intrapred16.asm
--- a/source/common/x86/intrapred16.asm Mon Mar 09 14:35:20 2015 +0530
+++ b/source/common/x86/intrapred16.asm Fri Mar 13 12:16:47 2015 -0500
@@ -65,6 +65,10 @@ pw_planar16_0: dw 15, 14, 13, 12,
pw_planar16_1: dw 15, 15, 15, 15, 15, 15, 15, 15
pd_planar32_1: dd 31, 31, 31, 31
+pw_planar32_1: dw 31, 31, 31, 31, 31, 31, 31, 31
+pw_planar32_L: dw 31, 30, 29, 28, 27, 26, 25, 24
+pw_planar32_H: dw 23, 22, 21, 20, 19, 18, 17, 16
+
const planar32_table
%assign x 31
%rep 8
@@ -86,12 +90,15 @@ cextern pw_2
cextern pw_4
cextern pw_8
cextern pw_16
+cextern pw_32
cextern pw_1023
cextern pd_16
cextern pd_32
cextern pw_4096
cextern multiL
cextern multiH
+cextern multiH2
+cextern multiH3
cextern multi_2Row
cextern pw_swap
cextern pb_unpackwq1
@@ -575,6 +582,114 @@ cglobal intra_pred_planar16, 3,3,8
INTRA_PRED_PLANAR_16 15
RET
+;---------------------------------------------------------------------------------------
+; void intra_pred_planar(pixel* dst, intptr_t dstStride, pixel*srcPix, int, int filter)
+;---------------------------------------------------------------------------------------
+INIT_XMM sse2
+cglobal intra_pred_planar32, 3,3,16
+ movd m3, [r2 + 66] ; topRight = above[32]
+
+ pshuflw m3, m3, 0x00
+ pshufd m3, m3, 0x44
+
+ pmullw m0, m3, [multiL] ; (x + 1) * topRight
+ pmullw m1, m3, [multiH] ; (x + 1) * topRight
+ pmullw m2, m3, [multiH2] ; (x + 1) * topRight
+ pmullw m3, [multiH3] ; (x + 1) * topRight
+
+ movd m6, [r2 + 194] ; bottomLeft = left[32]
+ pshuflw m6, m6, 0x00
+ pshufd m6, m6, 0x44
+ mova m5, m6
+ paddw m5, [pw_32]
+
+ paddw m0, m5
+ paddw m1, m5
+ paddw m2, m5
+ paddw m3, m5
+ mova m8, m6
+ mova m9, m6
+ mova m10, m6
+
+ mova m12, [pw_planar32_1]
+ movu m4, [r2 + 2]
+ psubw m8, m4
+ pmullw m4, m12
+ paddw m0, m4
+
+ movu m5, [r2 + 18]
+ psubw m9, m5
+ pmullw m5, m12
+ paddw m1, m5
+
+ movu m4, [r2 + 34]
+ psubw m10, m4
+ pmullw m4, m12
+ paddw m2, m4
+
+ movu m5, [r2 + 50]
+ psubw m6, m5
+ pmullw m5, m12
+ paddw m3, m5
+
+ mova m12, [pw_planar32_L]
+ mova m13, [pw_planar32_H]
+ mova m14, [pw_planar16_0]
+ mova m15, [pw_planar8_0]
+ add r1, r1
+
+%macro PROCESS 1
+ pmullw m5, %1, m12
+ pmullw m11, %1, m13
+ paddw m5, m0
+ paddw m11, m1
+ psrlw m5, 6
+ psrlw m11, 6
+ movu [r0], m5
+ movu [r0 + 16], m11
+
+ pmullw m5, %1, m14
+ pmullw %1, m15
+ paddw m5, m2
+ paddw %1, m3
+ psrlw m5, 6
+ psrlw %1, 6
+ movu [r0 + 32], m5
+ movu [r0 + 48], %1
+%endmacro
+
+%macro INCREMENT 0
+ paddw m2, m10
+ paddw m3, m6
+ paddw m0, m8
+ paddw m1, m9
+ add r0, r1
+%endmacro
+
+ add r2, 130 ;130 = 32*sizeof(pixel)*2 + 1*sizeof(pixel)
+%assign x 0
+%rep 4
+ movu m4, [r2]
+ add r2, 16
+%assign y 0
+%rep 8
+ %if y < 4
+ pshuflw m7, m4, 0x55 * y
+ pshufd m7, m7, 0x44
+ %else
+ pshufhw m7, m4, 0x55 * (y - 4)
+ pshufd m7, m7, 0xEE
+ %endif
+ PROCESS m7
+ %if x + y < 10
+ INCREMENT
+ %endif
+%assign y y+1
+%endrep
+%assign x x+1