[x265-commits] [x265] merge: set zero MV correctly
Deepthi Nandakumar
deepthi at multicorewareinc.com
Mon Mar 16 17:11:38 CET 2015
details: http://hg.videolan.org/x265/rev/1adf818d7f7c
branches:
changeset: 9738:1adf818d7f7c
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Mar 16 14:24:55 2015 +0530
description:
merge: set zero MV correctly
Subject: [x265] merge: check merge reference indices
details: http://hg.videolan.org/x265/rev/5c845d111d92
branches:
changeset: 9739:5c845d111d92
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Mar 16 14:29:14 2015 +0530
description:
merge: check merge reference indices
Subject: [x265] predict: check for out of bounds ref indices before weight tables are used
details: http://hg.videolan.org/x265/rev/9791a3bb74cf
branches:
changeset: 9740:9791a3bb74cf
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Mar 16 16:03:40 2015 +0530
description:
predict: check for out of bounds ref indices before weight tables are used
Subject: [x265] asm: filter_vsp[4x8], filter_vss[4x8] in avx2: 673c->339c, 608c->263c
details: http://hg.videolan.org/x265/rev/8caa5bdcaf25
branches:
changeset: 9741:8caa5bdcaf25
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Mar 16 10:10:31 2015 +0530
description:
asm: filter_vsp[4x8], filter_vss[4x8] in avx2: 673c->339c, 608c->263c
Subject: [x265] asm: filter_vsp[4x16], filter_vss[4x16] in avx2: 985c->599c, 877c->491c
details: http://hg.videolan.org/x265/rev/d69d56a2fdb3
branches:
changeset: 9742:d69d56a2fdb3
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Mar 16 10:14:19 2015 +0530
description:
asm: filter_vsp[4x16], filter_vss[4x16] in avx2: 985c->599c, 877c->491c
Subject: [x265] asm: filter_vsp[4x2], filter_vss[4x2] in avx2: 237c->137c, 206c->118c
details: http://hg.videolan.org/x265/rev/961c52b43700
branches:
changeset: 9743:961c52b43700
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Mar 16 10:17:10 2015 +0530
description:
asm: filter_vsp[4x2], filter_vss[4x2] in avx2: 237c->137c, 206c->118c
Subject: [x265] asm: filter_vsp[2x4], filter_vss[2x4] in avx2: 292c->189c, 248c->184c
details: http://hg.videolan.org/x265/rev/f5747f4a22e4
branches:
changeset: 9744:f5747f4a22e4
user: Divya Manivannan <divya at multicorewareinc.com>
date: Mon Mar 16 10:21:43 2015 +0530
description:
asm: filter_vsp[2x4], filter_vss[2x4] in avx2: 292c->189c, 248c->184c
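These filter_vsp/filter_vss commits add AVX2 versions of the 4-tap vertical chroma interpolation for small block sizes: "vsp" reads the 16-bit intermediate plane and writes 8-bit pixels, "vss" keeps 16-bit output. The subject lines give before/after cycle counts from the test bench. As a rough illustration of what the "sp" variant computes, here is a scalar sketch; the function name, tap values, shift and offset are placeholders, not x265's actual interpolation constants:

// Scalar reference sketch of a 4-tap vertical "sp" filter: 16-bit
// intermediates in, clamped 8-bit pixels out. Constants are placeholders;
// the real taps/shift/offset live in x265's interpolation tables.
#include <algorithm>
#include <cstdint>

static void interp4TapVertSPRef(const int16_t* src, intptr_t srcStride,
                                uint8_t* dst, intptr_t dstStride,
                                int width, int height, const int16_t coeff[4],
                                int shift, int offset)
{
    src -= srcStride;                        // taps cover rows -1, 0, +1, +2
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int sum = src[x]                 * coeff[0] +
                      src[x + 1 * srcStride] * coeff[1] +
                      src[x + 2 * srcStride] * coeff[2] +
                      src[x + 3 * srcStride] * coeff[3];
            int val = (sum + offset) >> shift;
            dst[x] = (uint8_t)std::min(std::max(val, 0), 255);
        }
        src += srcStride;
        dst += dstStride;
    }
}

The "ss" variant differs mainly in writing the 16-bit results back out without the final clamp to the pixel range.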
Subject: [x265] asm-intra_pred_ang16_28: improved, 865.16c -> 456.44c
details: http://hg.videolan.org/x265/rev/3c8f7b661c16
branches:
changeset: 9745:3c8f7b661c16
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 14:02:25 2015 +0530
description:
asm-intra_pred_ang16_28: improved, 865.16c -> 456.44c
AVX2:
intra_ang_16x16[28] 19.82x 456.44 9047.79
SSE4:
intra_ang_16x16[28] 10.36x 865.16 8962.21
Subject: [x265] asm-intrapred8.asm: rename macro 'INTRA_PRED_ANG16_25' to 'INTRA_PRED_ANG16_MC1'
details: http://hg.videolan.org/x265/rev/6ecedcdb3aae
branches:
changeset: 9746:6ecedcdb3aae
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 14:22:51 2015 +0530
description:
asm-intrapred8.asm: rename macro 'INTRA_PRED_ANG16_25' to 'INTRA_PRED_ANG16_MC1'
Gave the macro a generic name so it can be reused in other intra_pred16 mode asm code.
Subject: [x265] asm-intrapred8.asm: use macro 'INTRA_PRED_ANG16_MC1' to shorten asm code length
details: http://hg.videolan.org/x265/rev/34626be62712
branches:
changeset: 9747:34626be62712
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 14:28:43 2015 +0530
description:
asm-intrapred8.asm: use macro 'INTRA_PRED_ANG16_MC1' to shorten asm code length
Subject: [x265] asm-intra_pred_ang16_27: improved, 645.84c -> 415.14c over SSE4 asm code
details: http://hg.videolan.org/x265/rev/b366877f0a59
branches:
changeset: 9748:b366877f0a59
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 18:17:38 2015 +0530
description:
asm-intra_pred_ang16_27: improved, 645.84c -> 415.14c over SSE4 asm code
AVX2:
intra_ang_16x16[27] 21.17x 415.14 8789.72
SSE4:
intra_ang_16x16[27] 13.68x 645.84 8833.49
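In these test-bench lines the columns are the speedup over the C reference, cycles for the optimized code, and cycles for the C reference (415.14 x 21.17 ~= 8790). The c_ang16_mode_27 constant table added further down in intrapred8.asm stores each row's two interpolation weights as the byte pair (32 - frac, frac); for mode 27 the fractional offset grows by 2 per row, which is why the pairs run 30,2 / 28,4 / ... / 32,0. A small worked check of those pairs, assuming the standard HEVC angular-prediction parameters rather than anything copied from the x265 sources:

// Recomputes the (32 - frac, frac) weight pairs seen in c_ang16_mode_27.
// Assumes HEVC's intra angle step of +2 per row for mode 27 (an assumption,
// not taken from the x265 code).
#include <cstdio>

int main()
{
    const int angleStep = 2;                 // HEVC intra angle for mode 27
    for (int row = 0; row < 16; row++)
    {
        int pos  = (row + 1) * angleStep;    // accumulated sub-pel position
        int frac = pos & 31;                 // fractional part, in 1/32 units
        // predicted sample: ((32 - frac) * ref[i] + frac * ref[i+1] + 16) >> 5
        printf("row %2d: weights (%2d, %2d)\n", row, 32 - frac, frac);
    }
    return 0;
}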
Subject: [x265] asm-intra_pred_ang16_29: improved, 866.95c -> 493.20c over SSE4 asm code
details: http://hg.videolan.org/x265/rev/3ed5e5416686
branches:
changeset: 9749:3ed5e5416686
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 19:40:19 2015 +0530
description:
asm-intra_pred_ang16_29: improved, 866.95c -> 493.20c over SSE4 asm code
AVX2:
intra_ang_16x16[29] 18.72x 493.20 9231.12
SSE4:
intra_ang_16x16[29] 10.46x 866.95 9072.53
Subject: [x265] asm-intra_pred_ang16_25: reduce const table address size
details: http://hg.videolan.org/x265/rev/d1154c2989fa
branches:
changeset: 9750:d1154c2989fa
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 19:47:16 2015 +0530
description:
asm-intra_pred_ang16_25: reduce const table address size
Subject: [x265] asm-intra-pred8.asm: replace 'lea' instruction with faster 'add' instruction
details: http://hg.videolan.org/x265/rev/058b38bf8e85
branches:
changeset: 9751:058b38bf8e85
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Mar 13 19:51:56 2015 +0530
description:
asm-intra-pred8.asm: replace 'lea' instruction with faster 'add' instruction
Subject: [x265] asm: chroma_hpp[4x2] for i420 avx2 - improved 138c->134c
details: http://hg.videolan.org/x265/rev/c6bb6ec54973
branches:
changeset: 9752:c6bb6ec54973
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Mar 16 10:32:43 2015 +0530
description:
asm: chroma_hpp[4x2] for i420 avx2 - improved 138c->134c
Subject: [x265] asm: chroma_hpp[32x16, 32x24, 32x8] for i420 - improved 2966c->1647c, 4514c->2627c, 1494c->870c
details: http://hg.videolan.org/x265/rev/7a115f22ee36
branches:
changeset: 9753:7a115f22ee36
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Mar 16 10:36:29 2015 +0530
description:
asm: chroma_hpp[32x16, 32x24, 32x8] for i420 - improved 2966c->1647c, 4514c->2627c, 1494c->870c
Subject: [x265] asm: avx2 code for sad[32x32] for 8bpp
details: http://hg.videolan.org/x265/rev/720154e24059
branches:
changeset: 9754:720154e24059
user: Sumalatha Polureddy <sumalatha at multicorewareinc.com>
date: Mon Mar 16 10:43:13 2015 +0530
description:
asm: avx2 code for sad[32x32] for 8bpp
sad[32x32] 44.69x 494.86 22114.45
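The line above is the test-bench report for the new primitive: speedup over the C reference, AVX2 cycles, and C-reference cycles. The primitive itself is just a sum of absolute differences over a 32x32 block; a minimal scalar reference for comparison (the prototype shape below is assumed, not copied from x265 headers):

// Minimal scalar reference for sad[32x32]: the AVX2 routine computes the
// same sum, just vectorized. Prototype shape is an assumption.
#include <cstdint>
#include <cstdlib>

static int sad32x32Ref(const uint8_t* pix1, intptr_t stride1,
                       const uint8_t* pix2, intptr_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 32; y++)
    {
        for (int x = 0; x < 32; x++)
            sum += std::abs(pix1[x] - pix2[x]);
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}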
Subject: [x265] asm: chroma_hpp[8x4, 8x16, 8x32] for i420 avx2 - improved 289c->220c, 928c->618c, 1802c->1128c
details: http://hg.videolan.org/x265/rev/dd113f0bf713
branches:
changeset: 9755:dd113f0bf713
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Mar 16 10:43:23 2015 +0530
description:
asm: chroma_hpp[8x4, 8x16, 8x32] for i420 avx2 - improved 289c->220c, 928c->618c, 1802c->1128c
Subject: [x265] asm: chroma_hpp[4x8, 4x16] for i420 avx2 - improved 346c->322c, 610c->586c
details: http://hg.videolan.org/x265/rev/74496ce5d8ba
branches:
changeset: 9756:74496ce5d8ba
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Mon Mar 16 10:47:09 2015 +0530
description:
asm: chroma_hpp[4x8, 4x16] for i420 avx2 - improved 346c->322c, 610c->586c
diffstat:
source/common/cudata.cpp | 2 +-
source/common/predict.cpp | 11 +-
source/common/x86/asm-primitives.cpp | 25 +
source/common/x86/intrapred.h | 3 +
source/common/x86/intrapred8.asm | 195 +++++++++-
source/common/x86/ipfilter8.asm | 670 +++++++++++++++++++++++++++++++++++
source/common/x86/ipfilter8.h | 1 +
source/common/x86/sad-a.asm | 29 +
source/encoder/analysis.cpp | 1 +
9 files changed, 922 insertions(+), 15 deletions(-)
diffs (truncated from 1103 to 300 lines):
diff -r 6461985f33ac -r 74496ce5d8ba source/common/cudata.cpp
--- a/source/common/cudata.cpp Sun Mar 15 11:58:32 2015 -0500
+++ b/source/common/cudata.cpp Mon Mar 16 10:47:09 2015 +0530
@@ -1608,7 +1608,7 @@ uint32_t CUData::getInterMergeCandidates
while (count < maxNumMergeCand)
{
candDir[count] = 1;
- candMvField[count][0].mv = 0;
+ candMvField[count][0].mv.word = 0;
candMvField[count][0].refIdx = r;
if (isInterB)
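The one-line cudata.cpp change above zeroes the merge candidate's motion vector through the 32-bit .word member rather than assigning 0 to the MV object, clearing both components in a single store. A small sketch of that idea; the layout below is an illustrative assumption in the spirit of x265's MV type, not a copy of it (the anonymous struct inside a union is a common compiler extension):

// Illustrative MV layout (assumed): two 16-bit components aliased by one
// 32-bit word, so writing .word = 0 clears x and y together.
#include <cstdint>

struct MV
{
    union
    {
        struct { int16_t x, y; };   // MV components (anonymous struct: compiler extension)
        int32_t word;               // both components viewed as one 32-bit value
    };
};

int main()
{
    MV mv;
    mv.x = 5;
    mv.y = -3;
    mv.word = 0;                    // one store zeroes both components
    return (mv.x == 0 && mv.y == 0) ? 0 : 1;
}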
diff -r 6461985f33ac -r 74496ce5d8ba source/common/predict.cpp
--- a/source/common/predict.cpp Sun Mar 15 11:58:32 2015 -0500
+++ b/source/common/predict.cpp Mon Mar 16 10:47:09 2015 +0530
@@ -130,6 +130,9 @@ void Predict::motionCompensation(const C
WeightValues wv0[3], wv1[3];
const WeightParam *pwp0, *pwp1;
+ X265_CHECK(refIdx0 < cu.m_slice->m_numRefIdx[0], "bidir refidx0 out of range\n");
+ X265_CHECK(refIdx1 < cu.m_slice->m_numRefIdx[1], "bidir refidx1 out of range\n");
+
if (cu.m_slice->m_pps->bUseWeightedBiPred)
{
pwp0 = refIdx0 >= 0 ? cu.m_slice->m_weightPredTable[0][refIdx0] : NULL;
@@ -174,10 +177,6 @@ void Predict::motionCompensation(const C
cu.clipMv(mv0);
cu.clipMv(mv1);
- /* Biprediction */
- X265_CHECK(refIdx0 < cu.m_slice->m_numRefIdx[0], "bidir refidx0 out of range\n");
- X265_CHECK(refIdx1 < cu.m_slice->m_numRefIdx[1], "bidir refidx1 out of range\n");
-
if (bLuma)
{
predInterLumaShort(pu, m_predShortYuv[0], *cu.m_slice->m_refPicList[0][refIdx0]->m_reconPic, mv0);
@@ -199,9 +198,6 @@ void Predict::motionCompensation(const C
MV mv0 = cu.m_mv[0][pu.puAbsPartIdx];
cu.clipMv(mv0);
- /* uniprediction to L0 */
- X265_CHECK(refIdx0 < cu.m_slice->m_numRefIdx[0], "unidir refidx0 out of range\n");
-
if (pwp0 && pwp0->bPresentFlag)
{
ShortYuv& shortYuv = m_predShortYuv[0];
@@ -228,7 +224,6 @@ void Predict::motionCompensation(const C
/* uniprediction to L1 */
X265_CHECK(refIdx1 >= 0, "refidx1 was not positive\n");
- X265_CHECK(refIdx1 < cu.m_slice->m_numRefIdx[1], "unidir refidx1 out of range\n");
if (pwp1 && pwp1->bPresentFlag)
{
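The predict.cpp hunk moves the refIdx range asserts ahead of the weighted-prediction table lookups, so an out-of-range index trips the check before it is ever used to subscript m_weightPredTable. The generic shape of that fix, with illustrative names only:

// Check-before-index pattern mirroring the predict.cpp reordering: validate
// the reference index first, then use it to look up the weight entry.
// All names here are illustrative stand-ins, not x265's declarations.
#include <cassert>
#include <cstddef>

struct WeightEntry { bool present; };

static const WeightEntry* lookupWeights(const WeightEntry table[2][4],
                                        int list, int refIdx, int numRefIdx)
{
    assert(refIdx < numRefIdx && "refIdx out of range");    // check first
    return (refIdx >= 0) ? &table[list][refIdx] : nullptr;  // then index
}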
diff -r 6461985f33ac -r 74496ce5d8ba source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Sun Mar 15 11:58:32 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Mon Mar 16 10:47:09 2015 +0530
@@ -1444,6 +1444,8 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_8x16].satd = x265_pixel_satd_8x16_avx2;
p.pu[LUMA_8x8].satd = x265_pixel_satd_8x8_avx2;
+ p.pu[LUMA_32x32].sad = x265_pixel_sad_32x32_avx2;
+
p.pu[LUMA_8x4].sad_x3 = x265_pixel_sad_x3_8x4_avx2;
p.pu[LUMA_8x8].sad_x3 = x265_pixel_sad_x3_8x8_avx2;
p.pu[LUMA_8x16].sad_x3 = x265_pixel_sad_x3_8x16_avx2;
@@ -1507,6 +1509,9 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_8x8].intra_pred[24] = x265_intra_pred_ang8_24_avx2;
p.cu[BLOCK_8x8].intra_pred[11] = x265_intra_pred_ang8_11_avx2;
p.cu[BLOCK_16x16].intra_pred[25] = x265_intra_pred_ang16_25_avx2;
+ p.cu[BLOCK_16x16].intra_pred[28] = x265_intra_pred_ang16_28_avx2;
+ p.cu[BLOCK_16x16].intra_pred[27] = x265_intra_pred_ang16_27_avx2;
+ p.cu[BLOCK_16x16].intra_pred[29] = x265_intra_pred_ang16_29_avx2;
// copy_sp primitives
p.cu[BLOCK_16x16].copy_sp = x265_blockcopy_sp_16x16_avx2;
@@ -1581,6 +1586,18 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_hpp = x265_interp_4tap_horiz_pp_32x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_hpp = x265_interp_4tap_horiz_pp_16x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_hpp = x265_interp_4tap_horiz_pp_4x2_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_hpp = x265_interp_4tap_horiz_pp_4x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_hpp = x265_interp_4tap_horiz_pp_4x16_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_hpp = x265_interp_4tap_horiz_pp_32x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].filter_hpp = x265_interp_4tap_horiz_pp_32x24_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].filter_hpp = x265_interp_4tap_horiz_pp_32x8_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].filter_hpp = x265_interp_4tap_horiz_pp_8x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].filter_hpp = x265_interp_4tap_horiz_pp_8x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].filter_hpp = x265_interp_4tap_horiz_pp_8x32_avx2;
+
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_hps = x265_interp_4tap_horiz_ps_32x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_hps = x265_interp_4tap_horiz_ps_16x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].filter_hps = x265_interp_4tap_horiz_ps_4x4_avx2;
@@ -1653,6 +1670,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vsp = x265_interp_4tap_vert_sp_8x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vsp = x265_interp_4tap_vert_sp_16x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vsp = x265_interp_4tap_vert_sp_32x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_vsp = x265_interp_4tap_vert_sp_2x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vsp = x265_interp_4tap_vert_sp_4x2_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vsp = x265_interp_4tap_vert_sp_4x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vsp = x265_interp_4tap_vert_sp_4x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].filter_vsp = x265_interp_4tap_vert_sp_16x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_vsp = x265_interp_4tap_vert_sp_24x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_vsp = x265_interp_4tap_vert_sp_32x16_avx2;
@@ -1661,6 +1682,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].filter_vss = x265_interp_4tap_vert_ss_8x8_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].filter_vss = x265_interp_4tap_vert_ss_16x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].filter_vss = x265_interp_4tap_vert_ss_32x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].filter_vss = x265_interp_4tap_vert_ss_2x4_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].filter_vss = x265_interp_4tap_vert_ss_4x2_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].filter_vss = x265_interp_4tap_vert_ss_4x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].filter_vss = x265_interp_4tap_vert_ss_4x16_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].filter_vss = x265_interp_4tap_vert_ss_16x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].filter_vss = x265_interp_4tap_vert_ss_24x32_avx2;
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].filter_vss = x265_interp_4tap_vert_ss_32x16_avx2;
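The asm-primitives.cpp hunk registers the new AVX2 routines in the EncoderPrimitives function-pointer tables, so the encoder reaches them through the same indirection as the C fallbacks. A stripped-down sketch of that registration pattern, with simplified stand-in types and names:

// Registration pattern sketch: fill a primitive slot with the portable C
// routine, then overwrite it with the AVX2 version when the CPU allows it.
// Types, names and the feature bit are simplified stand-ins, not the real
// x265 declarations.
#include <cstdint>

typedef int (*sad_t)(const uint8_t* pix1, intptr_t stride1,
                     const uint8_t* pix2, intptr_t stride2);

struct PUPrimitives      { sad_t sad; };
struct EncoderPrimitives { PUPrimitives pu[1]; };    // just a LUMA_32x32 slot here

static int sad_32x32_c(const uint8_t*, intptr_t, const uint8_t*, intptr_t)    { return 0; } // stub
static int sad_32x32_avx2(const uint8_t*, intptr_t, const uint8_t*, intptr_t) { return 0; } // stub

static void setupPrimitives(EncoderPrimitives& p, uint32_t cpuMask)
{
    const uint32_t AVX2_BIT = 1u << 0;       // placeholder CPU-feature flag
    p.pu[0].sad = sad_32x32_c;               // portable fallback first
    if (cpuMask & AVX2_BIT)
        p.pu[0].sad = sad_32x32_avx2;        // overwrite with the AVX2 routine
}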
diff -r 6461985f33ac -r 74496ce5d8ba source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Sun Mar 15 11:58:32 2015 -0500
+++ b/source/common/x86/intrapred.h Mon Mar 16 10:47:09 2015 +0530
@@ -184,6 +184,9 @@ void x265_intra_pred_ang8_12_avx2(pixel*
void x265_intra_pred_ang8_24_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_11_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang16_25_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_28_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_27_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_29_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_all_angs_pred_4x4_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
void x265_all_angs_pred_8x8_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
void x265_all_angs_pred_16x16_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
diff -r 6461985f33ac -r 74496ce5d8ba source/common/x86/intrapred8.asm
--- a/source/common/x86/intrapred8.asm Sun Mar 15 11:58:32 2015 -0500
+++ b/source/common/x86/intrapred8.asm Mon Mar 16 10:47:09 2015 +0530
@@ -2,6 +2,7 @@
;* Copyright (C) 2013 x265 project
;*
;* Authors: Min Chen <chenm003 at 163.com> <min.chen at multicorewareinc.com>
+;* Praveen Kumar Tiwari <praveen at multicorewareinc.com>
;*
;* This program is free software; you can redistribute it and/or modify
;* it under the terms of the GNU General Public License as published by
@@ -123,6 +124,44 @@ c_ang16_mode_25: db 2, 30, 2, 30, 2
db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
+
+ALIGN 32
+c_ang16_mode_28: db 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
+ db 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
+ db 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
+ db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
+ db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
+ db 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+
+
+ALIGN 32
+c_ang16_mode_27: db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
+ db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+ db 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
+ db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
+ db 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
+
+ALIGN 32
+intra_pred_shuff_0_15: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 15
+
+
+ALIGN 32
+c_ang16_mode_29: db 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
+ db 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17
+ db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
+ db 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
+ db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
+ db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
+
ALIGN 32
;; (blkSize - 1 - x)
pw_planar4_0: dw 3, 2, 1, 0, 3, 2, 1, 0
@@ -10699,7 +10738,7 @@ cglobal intra_pred_ang8_24, 3, 5, 5
movu [%2], xm4
%endmacro
-%macro INTRA_PRED_ANG16_25 1
+%macro INTRA_PRED_ANG16_MC1 1
INTRA_PRED_ANG16_MC0 r0, r0 + r1, %1
INTRA_PRED_ANG16_MC0 r0 + 2 * r1, r0 + r3, (%1 + 1)
%endmacro
@@ -10716,14 +10755,158 @@ cglobal intra_pred_ang16_25, 3, 5, 5
lea r3, [3 * r1]
lea r4, [c_ang16_mode_25]
- INTRA_PRED_ANG16_25 0
+ INTRA_PRED_ANG16_MC1 0
lea r0, [r0 + 4 * r1]
- INTRA_PRED_ANG16_25 2
+ INTRA_PRED_ANG16_MC1 2
+
+ add r4, 4 * mmsize
lea r0, [r0 + 4 * r1]
- INTRA_PRED_ANG16_25 4
+ INTRA_PRED_ANG16_MC1 0
lea r0, [r0 + 4 * r1]
- INTRA_PRED_ANG16_25 6
- RET
+ INTRA_PRED_ANG16_MC1 2
+ RET
+
+INIT_YMM avx2
+cglobal intra_pred_ang16_28, 3, 5, 6
+ mova m0, [pw_1024]
+ mova m5, [intra_pred_shuff_0_8]
+ lea r3, [3 * r1]
+ lea r4, [c_ang16_mode_28]
+
+ vbroadcasti128 m1, [r2 + 1]
+ pshufb m1, m5
+ vbroadcasti128 m2, [r2 + 9]
+ pshufb m2, m5
+
+ INTRA_PRED_ANG16_MC1 0
+
+ lea r0, [r0 + 4 * r1]
+
+ INTRA_PRED_ANG16_MC0 r0, r0 + r1, 2
+
+ vbroadcasti128 m1, [r2 + 2]
+ pshufb m1, m5
+ vbroadcasti128 m2, [r2 + 10]
+ pshufb m2, m5
+
+ INTRA_PRED_ANG16_MC0 r0 + 2 * r1, r0 + r3, 3
+
+ lea r0, [r0 + 4 * r1]
+ add r4, 4 * mmsize
+
+ INTRA_PRED_ANG16_MC1 0
+
+ vbroadcasti128 m1, [r2 + 3]
+ pshufb m1, m5
+ vbroadcasti128 m2, [r2 + 11]
+ pshufb m2, m5
+
+ lea r0, [r0 + 4 * r1]
+
+ INTRA_PRED_ANG16_MC1 2
+ RET
+
+INIT_YMM avx2
+cglobal intra_pred_ang16_27, 3, 5, 5
+ mova m0, [pw_1024]
+ lea r3, [3 * r1]
+ lea r4, [c_ang16_mode_27]
+
+ vbroadcasti128 m1, [r2 + 1]
+ pshufb m1, [intra_pred_shuff_0_8]
+ vbroadcasti128 m2, [r2 + 9]
+ pshufb m2, [intra_pred_shuff_0_8]
+
+ INTRA_PRED_ANG16_MC1 0
+
+ lea r0, [r0 + 4 * r1]
+ INTRA_PRED_ANG16_MC1 2
+
+ lea r0, [r0 + 4 * r1]
+ add r4, 4 * mmsize
+ INTRA_PRED_ANG16_MC1 0
+
+ lea r0, [r0 + 4 * r1]
+ INTRA_PRED_ANG16_MC0 r0, r0 + r1, 2
+
+ vperm2i128 m1, m1, m2, 00100000b
+ pmaddubsw m3, m1, [r4 + 3 * mmsize]
+ pmulhrsw m3, m0
+ vbroadcasti128 m2, [r2 + 2]
+ pshufb m2, [intra_pred_shuff_0_15]
+ pmaddubsw m2, [r4 + 4 * mmsize]
+ pmulhrsw m2, m0
+ packuswb m3, m2
+ vpermq m3, m3, 11011000b
+ movu [r0 + 2 * r1], xm3
+ vextracti128 xm4, m3, 1
+ movu [r0 + r3], xm4
+ RET
+
+INIT_YMM avx2
+cglobal intra_pred_ang16_29, 3, 5, 5
+ mova m0, [pw_1024]
+ mova m5, [intra_pred_shuff_0_8]
+ lea r3, [3 * r1]