[x265-commits] [x265] asm: fix eoln in comment
Steve Borho
steve at borho.org
Fri Apr 10 18:35:56 CEST 2015
details: http://hg.videolan.org/x265/rev/ee76a15fa312
branches:
changeset: 10146:ee76a15fa312
user: Steve Borho <steve at borho.org>
date: Fri Apr 10 10:24:55 2015 -0500
description:
asm: fix eoln in comment
Subject: [x265] cli: annex_b format switch
details: http://hg.videolan.org/x265/rev/9f6a053a2868
branches:
changeset: 10147:9f6a053a2868
user: Xinyue Lu <i at 7086.in>
date: Thu Apr 09 18:06:44 2015 -0700
description:
cli: annex_b format switch
When bAnnexB is set to true, the NAL serializer places a start code (0x00 00 00 01) before each NAL unit.
When it is false, the serializer places a 4-byte length before each NAL unit instead.
Container formats may prefer the latter.
Also move output->setParam up so that the output can select the format before the encoder is initialized.
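A minimal sketch of the two framings described above, assuming the NAL unit bytes (header and payload, with emulation prevention already applied) are available as a vector. `appendNal` is a hypothetical helper, not x265's serializer, and the big-endian byte order of the length field is an assumption based on the usual MP4-style convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not x265's API): append one NAL unit whose bytes already
// include the NAL header and emulation prevention bytes.
void appendNal(std::vector<uint8_t>& out, const std::vector<uint8_t>& nal, bool annexB)
{
    if (annexB)
    {
        // Annex B framing: a 4-byte start code 0x00 00 00 01 precedes the NAL unit.
        static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};
        out.insert(out.end(), startCode, startCode + 4);
    }
    else
    {
        // Length-prefixed framing: the NAL size as 4 bytes (big-endian assumed).
        assert(nal.size() <= 0xFFFFFFFFu);
        const uint32_t size = static_cast<uint32_t>(nal.size());
        out.push_back(static_cast<uint8_t>(size >> 24));
        out.push_back(static_cast<uint8_t>(size >> 16));
        out.push_back(static_cast<uint8_t>(size >> 8));
        out.push_back(static_cast<uint8_t>(size));
    }
    out.insert(out.end(), nal.begin(), nal.end());
}
```

Both framings carry identical NAL bytes; only the 4-byte prefix differs.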
Subject: [x265] asm: avx2 code for planecopy_sp
details: http://hg.videolan.org/x265/rev/58386976e7b6
branches:
changeset: 10148:58386976e7b6
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Apr 10 10:56:41 2015 +0530
description:
asm: avx2 code for planecopy_sp
AVX2:
planecopy_sp 22.19x 5337.07 118407.46
SSE2:
planecopy_sp 14.83x 8106.54 120242.02
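For reference, a scalar sketch of what a down-shifting plane copy computes, as the assignment `p.planecopy_sp = x265_downShift_16_avx2` in the diff suggests: 16-bit input samples are shifted right and masked into 8-bit output. The function name `planecopySpRef` and its parameter list (`shift`, `mask`, strides in elements) are assumptions modeled on the primitive's name, not copied from x265.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference: copy a 16-bit plane into an 8-bit plane,
// shifting each sample right by `shift` and masking the result.
void planecopySpRef(const uint16_t* src, intptr_t srcStride,
                    uint8_t* dst, intptr_t dstStride,
                    int width, int height, int shift, uint16_t mask)
{
    assert(shift >= 0 && shift < 16);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = static_cast<uint8_t>((src[x] >> shift) & mask);
        src += srcStride;  // strides are counted in elements, not bytes
        dst += dstStride;
    }
}
```

For example, with 10-bit input and `shift = 2`, the sample 1023 maps to 255.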
Subject: [x265] asm: avx2 8bpp code for convert_p2s[24xN]
details: http://hg.videolan.org/x265/rev/9c46289a0957
branches:
changeset: 10149:9c46289a0957
user: Rajesh Paulraj <rajesh at multicorewareinc.com>
date: Fri Apr 10 10:49:34 2015 +0530
description:
asm: avx2 8bpp code for convert_p2s[24xN]
convert_p2s[24x32](16.21x)
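convert_p2s (implemented by the `filterPixelToShort` kernels in the diff) converts pixels to the 16-bit intermediate format used by the interpolation filters. A scalar sketch, assuming an 8-bit build with 14-bit internal precision, so each sample is shifted left by 6 and offset by 8192; `pixelToShortRef` and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed 8-bit build constants: 14-bit internal precision, so samples are
// shifted left by 14 - 8 = 6 and centred by subtracting 1 << 13 = 8192.
constexpr int kInternalPrec = 14;
constexpr int kBitDepth = 8;
constexpr int kShift = kInternalPrec - kBitDepth;
constexpr int kInternalOffs = 1 << (kInternalPrec - 1);

// Hypothetical scalar reference for a width x height pixel-to-short conversion.
void pixelToShortRef(const uint8_t* src, intptr_t srcStride,
                     int16_t* dst, intptr_t dstStride, int width, int height)
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = static_cast<int16_t>((src[x] << kShift) - kInternalOffs);
        src += srcStride;
        dst += dstStride;
    }
}
```

Under these assumptions 8-bit input spans -8192 to 8128, keeping the intermediate values centred near zero.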
Subject: [x265] asm: avx2 8bpp code for chroma_p2s[32xN],[24xN], reuse the luma code
details: http://hg.videolan.org/x265/rev/b7dd8105b91c
branches:
changeset: 10150:b7dd8105b91c
user: Rajesh Paulraj <rajesh at multicorewareinc.com>
date: Fri Apr 10 11:10:47 2015 +0530
description:
asm: avx2 8bpp code for chroma_p2s[32xN],[24xN], reuse the luma code
Subject: [x265] change costUncoded[] coordinate system from Raster to Zigzag
details: http://hg.videolan.org/x265/rev/94d7485893a3
branches:
changeset: 10151:94d7485893a3
user: Min Chen <chenm003 at 163.com>
date: Fri Apr 10 20:06:23 2015 +0800
description:
change costUncoded[] coordinate system from Raster to Zigzag
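The rdoQuant diff replaces the `scanPos` index of `costUncoded[]` with `blkPos`, the block position that the scan table associates with each scan position. The sketch shows that pattern: the loop walks coefficients in scan order but stores each cost at the coefficient's block position. The 4x4 up-right diagonal table, `uncodedCostsByBlkPos`, and the simplified distortion term (squared coefficient) are illustrative assumptions, not x265 code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative 4x4 up-right diagonal scan: kDiagScan4x4[scanPos] is the block
// position (row * 4 + col) of the scanPos-th coefficient visited.
constexpr std::array<int, 16> kDiagScan4x4 = {
    0, 4, 1, 8, 5, 2, 12, 9, 6, 3, 13, 10, 7, 14, 11, 15};

// Walk coefficients in scan order, but index the cost array by blkPos,
// mirroring the costUncoded[blkPos] pattern in the patched rdoQuant.
std::array<int64_t, 16> uncodedCostsByBlkPos(const std::array<int32_t, 16>& coef, int scaleBits)
{
    assert(scaleBits >= 0);
    std::array<int64_t, 16> costUncoded{};
    for (int scanPos = 0; scanPos < 16; scanPos++)
    {
        const int blkPos = kDiagScan4x4[scanPos];
        const int64_t c = coef[blkPos];
        // Cost of leaving this coefficient uncoded: all distortion, no bits.
        costUncoded[blkPos] = (c * c) << scaleBits;
    }
    return costUncoded;
}
```

Because the scan is a permutation, every entry is written exactly once whichever index is used.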
Subject: [x265] avoid calculating rateIncUp and rateIncDown when sigHide is disabled
details: http://hg.videolan.org/x265/rev/010a73622b59
branches:
changeset: 10152:010a73622b59
user: Min Chen <chenm003 at 163.com>
date: Fri Apr 10 20:49:25 2015 +0800
description:
avoid calculating rateIncUp and rateIncDown when sigHide is disabled
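The quant.cpp diff implements this by gating the sign-hiding bookkeeping with `(cu.m_slice->m_pps->bSignHideEnabled ? ~0 : 0) & level` instead of `level` alone. A small sketch of that test in isolation; `recordSignHideCost` is a hypothetical name, and `level` is taken as `uint32_t` here as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// The patch folds the PPS flag into a mask: all ones when sign hiding is
// enabled, zero otherwise. ANDing with level is non-zero only when sign
// hiding is enabled and level != 0, so the rate bookkeeping is skipped
// whenever the feature is off.
inline bool recordSignHideCost(bool bSignHideEnabled, uint32_t level)
{
    return ((bSignHideEnabled ? ~0u : 0u) & level) != 0;
}
```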
Subject: [x265] asm: intra_pred_ang8_20 improved by ~4% over SSE4
details: http://hg.videolan.org/x265/rev/270da1018d2e
branches:
changeset: 10153:270da1018d2e
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 10 12:01:19 2015 +0530
description:
asm: intra_pred_ang8_20 improved by ~4% over SSE4
AVX2:
intra_ang_8x8[20] 7.98x 256.94 2050.52
SSE4:
intra_ang_8x8[20] 7.59x 267.77 2031.49
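The `c_ang8_mode_20` table added in the diff holds one (32 - f, f) byte pair per prediction row, repeated across a 128-bit lane, with two rows per 32-byte line. A sketch, assuming the standard HEVC angular interpolation where row y of an 8x8 block uses fraction f = ((y + 1) * angle) & 31 and the value ((32 - f) * a + f * b + 16) >> 5, reproduces those pairs for angle -21, which is the HEVC intraPredAngle of both modes 16 and 20. That shared angle is consistent with `intra_pred_ang8_16` also loading `c_ang8_mode_20`. The second function checks the rounding step: `pmulhrsw` with `pw_1024` computes (s * 1024 + 2^14) >> 15, which equals (s + 16) >> 5. Both function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Per-row (32 - f, f) weight pairs for an 8x8 angular block, assuming HEVC's
// f = ((y + 1) * angle) & 31. Bitwise AND on a negative angle relies on
// two's complement, which C++20 guarantees and mainstream compilers use.
std::array<std::pair<int, int>, 8> angWeights8(int angle)
{
    std::array<std::pair<int, int>, 8> w{};
    for (int y = 0; y < 8; y++)
    {
        const int f = ((y + 1) * angle) & 31;
        w[y] = {32 - f, f};
    }
    return w;
}

// pmaddubsw produces s = (32 - f) * a + f * b; pmulhrsw by 1024 then computes
// (s * 1024 + (1 << 14)) >> 15, the same as (s + 16) >> 5 for s >= 0.
int16_t roundShift5ViaPmulhrs(int16_t s)
{
    assert(s >= 0);
    return static_cast<int16_t>((static_cast<int32_t>(s) * 1024 + (1 << 14)) >> 15);
}
```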
Subject: [x265] asm: intra_pred_ang8_16 improved by ~3% over SSE4
details: http://hg.videolan.org/x265/rev/17b694085f6a
branches:
changeset: 10154:17b694085f6a
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 10 12:38:48 2015 +0530
description:
asm: intra_pred_ang8_16 improved by ~3% over SSE4
AVX2:
intra_ang_8x8[16] 9.22x 360.04 3320.64
SSE4:
intra_ang_8x8[16] 8.68x 371.05 3222.21
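A quick check of the benchmark figures above, assuming the testbench columns are speedup, optimized cycles, and C cycles (2050.52 / 256.94 ≈ 7.98 matches that reading). The percentages in the subjects follow from the optimized cycle counts: 267.77 / 256.94 ≈ 1.042 and 371.05 / 360.04 ≈ 1.031. Comparing the speedup columns instead would mix in run-to-run variation of the C timings (2050.52 vs 2031.49 for the same C function).

```cpp
#include <cassert>

// Speedup as the testbench appears to report it: C cycles / optimized cycles.
double speedup(double cCycles, double optimizedCycles)
{
    assert(optimizedCycles > 0.0);
    return cCycles / optimizedCycles;
}

// How much faster the new kernel is than the old one, in percent.
double percentFaster(double oldCycles, double newCycles)
{
    assert(newCycles > 0.0);
    return (oldCycles / newCycles - 1.0) * 100.0;
}
```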
Subject: [x265] asm: avx code for chroma sa8d, reused luma code
details: http://hg.videolan.org/x265/rev/58bcc43a1333
branches:
changeset: 10155:58bcc43a1333
user: Sumalatha Polureddy
date: Fri Apr 10 13:42:12 2015 +0530
description:
asm: avx code for chroma sa8d, reused luma code
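SA8D sums the absolute values of an 8x8 Hadamard transform of the residual. A scalar sketch that returns the raw sum; x265's C reference may apply its own final normalization, which is not reproduced here, and `sa8d8x8Raw` and `fwht8` are hypothetical names. In the asm-primitives diff the 4x4 chroma entry is wired to `x265_pixel_satd_4x4_avx`, presumably because an 8x8 transform does not fit a 4x4 block.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// In-place 8-point fast Walsh-Hadamard transform on v[0], v[step], ..., v[7 * step].
static void fwht8(int* v, int step)
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += 2 * len)
            for (int j = i; j < i + len; j++)
            {
                const int u = v[j * step];
                const int w = v[(j + len) * step];
                v[j * step] = u + w;
                v[(j + len) * step] = u - w;
            }
}

// Raw 8x8 SA8D: sum of absolute 2-D Hadamard coefficients of a - b (no final scaling).
int sa8d8x8Raw(const uint8_t* a, intptr_t strideA, const uint8_t* b, intptr_t strideB)
{
    int d[64];
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            d[y * 8 + x] = a[y * strideA + x] - b[y * strideB + x];

    for (int y = 0; y < 8; y++)
        fwht8(d + y * 8, 1);  // transform each row
    for (int x = 0; x < 8; x++)
        fwht8(d + x, 8);      // then each column

    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += std::abs(d[i]);
    return sum;
}
```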
Subject: [x265] asm: saoCuOrgB0 avx2 code: 23780c->18441c
details: http://hg.videolan.org/x265/rev/a6c7cf774564
branches:
changeset: 10156:a6c7cf774564
user: Divya Manivannan <divya at multicorewareinc.com>
date: Fri Apr 10 18:35:23 2015 +0530
description:
asm: saoCuOrgB0 avx2 code: 23780c->18441c
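saoCuOrgB0 applies SAO band offset to a block. A sketch for 8-bit video, assuming the usual HEVC scheme: sample values fall into 32 equal bands (band = sample >> 3 at 8 bits), each sample receives its band's offset, and the result is clipped. In HEVC only four consecutive bands carry non-zero offsets, which a 32-entry table models with zeros elsewhere. The signature and the name `saoBandOffsetRef` are assumptions, not x265's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical SAO band-offset reference for 8-bit samples: 32 bands of 8
// values each (band = sample >> 3), one offset per band, result clipped to [0, 255].
void saoBandOffsetRef(uint8_t* rec, intptr_t stride, int width, int height,
                      const int8_t offset[32])
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            const int band = rec[x] >> 3;  // 8-bit depth minus 5 band-index bits
            const int v = rec[x] + offset[band];
            rec[x] = static_cast<uint8_t>(std::min(255, std::max(0, v)));
        }
        rec += stride;
    }
}
```

For scale, the cycle counts in the subject give 23780 / 18441 ≈ 1.29, or about 22% fewer cycles for the AVX2 version.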
diffstat:
doc/reST/cli.rst | 9 ++
source/CMakeLists.txt | 2 +-
source/common/param.cpp | 2 +
source/common/quant.cpp | 16 +-
source/common/x86/asm-primitives.cpp | 21 +++++
source/common/x86/intrapred.h | 2 +
source/common/x86/intrapred8.asm | 125 +++++++++++++++++++++++++++++++
source/common/x86/intrapred8_allangs.asm | 2 +-
source/common/x86/ipfilter8.asm | 69 +++++++++++++++++
source/common/x86/ipfilter8.h | 17 ++++
source/common/x86/loopfilter.asm | 80 +++++++++++++++++++-
source/common/x86/loopfilter.h | 1 +
source/common/x86/pixel-a.asm | 111 +++++++++++++++++++++++++++
source/common/x86/pixel.h | 1 +
source/encoder/encoder.cpp | 5 +
source/encoder/nal.cpp | 18 ++++-
source/encoder/nal.h | 1 +
source/output/output.h | 6 +-
source/output/raw.cpp | 7 +-
source/output/raw.h | 8 +-
source/test/pixelharness.cpp | 22 ++++-
source/x265.cpp | 6 +-
source/x265.h | 5 +
23 files changed, 506 insertions(+), 30 deletions(-)
diffs (truncated from 966 to 300 lines):
diff -r 984e254f93f7 -r a6c7cf774564 doc/reST/cli.rst
--- a/doc/reST/cli.rst Thu Apr 09 11:48:08 2015 -0500
+++ b/doc/reST/cli.rst Fri Apr 10 18:35:23 2015 +0530
@@ -1481,6 +1481,15 @@ VUI fields must be manually specified.
Bitstream options
=================
+.. option:: --annexb, --no-annexb
+
+ If enabled, x265 will produce Annex B bitstream format, which places
+ start codes before NAL. If disabled, x265 will produce file format,
+ which places length before NAL. x265 CLI will choose the right option
+ based on output format. Default enabled
+
+ **API ONLY**
+
.. option:: --repeat-headers, --no-repeat-headers
If enabled, x265 will emit VPS, SPS, and PPS headers with every
diff -r 984e254f93f7 -r a6c7cf774564 source/CMakeLists.txt
--- a/source/CMakeLists.txt Thu Apr 09 11:48:08 2015 -0500
+++ b/source/CMakeLists.txt Fri Apr 10 18:35:23 2015 +0530
@@ -30,7 +30,7 @@ option(STATIC_LINK_CRT "Statically link
mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
# X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 54)
+set(X265_BUILD 55)
configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r 984e254f93f7 -r a6c7cf774564 source/common/param.cpp
--- a/source/common/param.cpp Thu Apr 09 11:48:08 2015 -0500
+++ b/source/common/param.cpp Fri Apr 10 18:35:23 2015 +0530
@@ -117,6 +117,7 @@ void x265_param_default(x265_param* para
param->levelIdc = 0;
param->bHighTier = 0;
param->interlaceMode = 0;
+ param->bAnnexB = 1;
param->bRepeatHeaders = 0;
param->bEnableAccessUnitDelimiters = 0;
param->bEmitHRDSEI = 0;
@@ -580,6 +581,7 @@ int x265_param_parse(x265_param* p, cons
}
}
OPT("cu-stats") p->bLogCuStats = atobool(value);
+ OPT("annexb") p->bAnnexB = atobool(value);
OPT("repeat-headers") p->bRepeatHeaders = atobool(value);
OPT("wpp") p->bEnableWavefront = atobool(value);
OPT("ctu") p->maxCUSize = (uint32_t)atoi(value);
diff -r 984e254f93f7 -r a6c7cf774564 source/common/quant.cpp
--- a/source/common/quant.cpp Thu Apr 09 11:48:08 2015 -0500
+++ b/source/common/quant.cpp Fri Apr 10 18:35:23 2015 +0530
@@ -613,13 +613,13 @@ uint32_t Quant::rdoQuant(const CUData& c
* FIX15 nature of the CABAC cost tables minus the forward transform scale */
/* cost of not coding this coefficient (all distortion, no signal bits) */
- costUncoded[scanPos] = ((int64_t)signCoef * signCoef) << scaleBits;
+ costUncoded[blkPos] = ((int64_t)signCoef * signCoef) << scaleBits;
X265_CHECK((!!scanPos ^ !!blkPos) == 0, "failed on (blkPos=0 && scanPos!=0)\n");
if (usePsyMask & scanPos)
/* when no residual coefficient is coded, predicted coef == recon coef */
- costUncoded[scanPos] -= PSYVALUE(predictedCoef);
+ costUncoded[blkPos] -= PSYVALUE(predictedCoef);
- totalUncodedCost += costUncoded[scanPos];
+ totalUncodedCost += costUncoded[blkPos];
if (maxAbsLevel && lastScanPos < 0)
{
@@ -638,7 +638,7 @@ uint32_t Quant::rdoQuant(const CUData& c
/* No non-zero coefficient yet found, but this does not mean
* there is no uncoded-cost for this coefficient. Pre-
* quantization the coefficient may have been non-zero */
- totalRdCost += costUncoded[scanPos];
+ totalRdCost += costUncoded[blkPos];
}
else
{
@@ -668,7 +668,7 @@ uint32_t Quant::rdoQuant(const CUData& c
{
/* set default costs to uncoded costs */
costSig[scanPos] = SIGCOST(estBitsSbac.significantBits[ctxSig][0]);
- costCoeff[scanPos] = costUncoded[scanPos] + costSig[scanPos];
+ costCoeff[scanPos] = costUncoded[blkPos] + costSig[scanPos];
}
sigRateDelta[blkPos] = estBitsSbac.significantBits[ctxSig][1] - estBitsSbac.significantBits[ctxSig][0];
sigCoefBits = estBitsSbac.significantBits[ctxSig][1];
@@ -739,7 +739,7 @@ uint32_t Quant::rdoQuant(const CUData& c
totalRdCost += costCoeff[scanPos];
/* record costs for sign-hiding performed at the end */
- if (level)
+ if ((cu.m_slice->m_pps->bSignHideEnabled ? ~0 : 0) & level)
{
const int32_t diff0 = level - 1 - baseLevel;
const int32_t diff2 = level + 1 - baseLevel;
@@ -810,7 +810,7 @@ uint32_t Quant::rdoQuant(const CUData& c
{
sigCoeffGroupFlag64 |= cgBlkPosMask;
cgRdStats.codedLevelAndDist += costCoeff[scanPos] - costSig[scanPos];
- cgRdStats.uncodedDist += costUncoded[scanPos];
+ cgRdStats.uncodedDist += costUncoded[blkPos];
cgRdStats.nnzBeforePos0 += scanPosinCG;
}
} /* end for (scanPosinCG) */
@@ -965,7 +965,7 @@ uint32_t Quant::rdoQuant(const CUData& c
}
totalRdCost -= costCoeff[scanPos];
- totalRdCost += costUncoded[scanPos];
+ totalRdCost += costUncoded[blkPos];
}
else
totalRdCost -= costSig[scanPos];
diff -r 984e254f93f7 -r a6c7cf774564 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Thu Apr 09 11:48:08 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Fri Apr 10 18:35:23 2015 +0530
@@ -1488,6 +1488,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].satd = x265_pixel_satd_32x8_avx;
p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].satd = x265_pixel_satd_8x32_avx;
ASSIGN_SA8D(avx);
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].sa8d = x265_pixel_sa8d_32x32_avx;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].sa8d = x265_pixel_sa8d_16x16_avx;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_8x8].sa8d = x265_pixel_sa8d_8x8_avx;
+ p.chroma[X265_CSP_I420].cu[BLOCK_420_4x4].sa8d = x265_pixel_satd_4x4_avx;
ASSIGN_SSE_PP(avx);
p.chroma[X265_CSP_I420].cu[BLOCK_420_8x8].sse_pp = x265_pixel_ssd_8x8_avx;
ASSIGN_SSE_SS(avx);
@@ -1552,6 +1556,8 @@ void setupAssemblyPrimitives(EncoderPrim
#if X86_64
if (cpuMask & X265_CPU_AVX2)
{
+ p.planecopy_sp = x265_downShift_16_avx2;
+
p.cu[BLOCK_32x32].intra_pred[DC_IDX] = x265_intra_pred_dc32_avx2;
p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_avx2;
@@ -1563,6 +1569,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.saoCuOrgE0 = x265_saoCuOrgE0_avx2;
p.saoCuOrgE1 = x265_saoCuOrgE1_avx2;
p.saoCuOrgE1_2Rows = x265_saoCuOrgE1_2Rows_avx2;
+ p.saoCuOrgB0 = x265_saoCuOrgB0_avx2;
p.cu[BLOCK_4x4].psy_cost_ss = x265_psyCost_ss_4x4_avx2;
p.cu[BLOCK_8x8].psy_cost_ss = x265_psyCost_ss_8x8_avx2;
@@ -1769,11 +1776,13 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_8x8].intra_pred[24] = x265_intra_pred_ang8_24_avx2;
p.cu[BLOCK_8x8].intra_pred[11] = x265_intra_pred_ang8_11_avx2;
p.cu[BLOCK_8x8].intra_pred[13] = x265_intra_pred_ang8_13_avx2;
+ p.cu[BLOCK_8x8].intra_pred[20] = x265_intra_pred_ang8_20_avx2;
p.cu[BLOCK_8x8].intra_pred[21] = x265_intra_pred_ang8_21_avx2;
p.cu[BLOCK_8x8].intra_pred[22] = x265_intra_pred_ang8_22_avx2;
p.cu[BLOCK_8x8].intra_pred[23] = x265_intra_pred_ang8_23_avx2;
p.cu[BLOCK_8x8].intra_pred[14] = x265_intra_pred_ang8_14_avx2;
p.cu[BLOCK_8x8].intra_pred[15] = x265_intra_pred_ang8_15_avx2;
+ p.cu[BLOCK_8x8].intra_pred[16] = x265_intra_pred_ang8_16_avx2;
p.cu[BLOCK_16x16].intra_pred[3] = x265_intra_pred_ang16_3_avx2;
p.cu[BLOCK_16x16].intra_pred[4] = x265_intra_pred_ang16_4_avx2;
p.cu[BLOCK_16x16].intra_pred[5] = x265_intra_pred_ang16_5_avx2;
@@ -2070,6 +2079,18 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_64x48].convert_p2s = x265_filterPixelToShort_64x48_avx2;
p.pu[LUMA_64x64].convert_p2s = x265_filterPixelToShort_64x64_avx2;
p.pu[LUMA_48x64].convert_p2s = x265_filterPixelToShort_48x64_avx2;
+ p.pu[LUMA_24x32].convert_p2s = x265_filterPixelToShort_24x32_avx2;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].p2s = x265_filterPixelToShort_24x32_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].p2s = x265_filterPixelToShort_32x8_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].p2s = x265_filterPixelToShort_32x16_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].p2s = x265_filterPixelToShort_32x24_avx2;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].p2s = x265_filterPixelToShort_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].p2s = x265_filterPixelToShort_24x64_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].p2s = x265_filterPixelToShort_32x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].p2s = x265_filterPixelToShort_32x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].p2s = x265_filterPixelToShort_32x48_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].p2s = x265_filterPixelToShort_32x64_avx2;
if ((cpuMask & X265_CPU_BMI1) && (cpuMask & X265_CPU_BMI2))
p.findPosLast = x265_findPosLast_x64;
diff -r 984e254f93f7 -r a6c7cf774564 source/common/x86/intrapred.h
--- a/source/common/x86/intrapred.h Thu Apr 09 11:48:08 2015 -0500
+++ b/source/common/x86/intrapred.h Fri Apr 10 18:35:23 2015 +0530
@@ -236,6 +236,8 @@ void x265_intra_pred_ang8_11_avx2(pixel*
void x265_intra_pred_ang8_13_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_14_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_15_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_16_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_20_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_21_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_22_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_23_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
diff -r 984e254f93f7 -r a6c7cf774564 source/common/x86/intrapred8.asm
--- a/source/common/x86/intrapred8.asm Thu Apr 09 11:48:08 2015 -0500
+++ b/source/common/x86/intrapred8.asm Fri Apr 10 18:35:23 2015 +0530
@@ -690,6 +690,12 @@ c_ang8_mode_15: db 17, 15, 17, 15,
db 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
db 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
+ALIGN 32
+c_ang8_mode_20: db 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
+ db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2
+ db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
+
const ang_table
%assign x 0
%rep 32
@@ -11946,6 +11952,125 @@ cglobal intra_pred_ang8_15, 3, 6, 6
RET
INIT_YMM avx2
+cglobal intra_pred_ang8_16, 3, 6, 6
+ mova m3, [pw_1024]
+ movu xm5, [r2 + 16]
+ pinsrb xm5, [r2], 0
+ lea r5, [intra_pred_shuff_0_8]
+ mova xm0, xm5
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 2], 0
+ vinserti128 m0, m0, xm5, 1
+ pshufb m0, [r5]
+
+ lea r4, [c_ang8_mode_20]
+ pmaddubsw m1, m0, [r4]
+ pmulhrsw m1, m3
+ mova xm0, xm5
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 3], 0
+ vinserti128 m0, m0, xm5, 1
+ pshufb m0, [r5]
+ pmaddubsw m2, m0, [r4 + mmsize]
+ pmulhrsw m2, m3
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 5], 0
+ vinserti128 m0, m5, xm5, 1
+ pshufb m0, [r5]
+ pmaddubsw m4, m0, [r4 + 2 * mmsize]
+ pmulhrsw m4, m3
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 6], 0
+ mova xm0, xm5
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 8], 0
+ vinserti128 m0, m0, xm5, 1
+ pshufb m0, [r5]
+ pmaddubsw m0, [r4 + 3 * mmsize]
+ pmulhrsw m0, m3
+
+ packuswb m1, m2
+ packuswb m4, m0
+
+ vperm2i128 m2, m1, m4, 00100000b
+ vperm2i128 m1, m1, m4, 00110001b
+ punpcklbw m4, m2, m1
+ punpckhbw m2, m1
+ punpcklwd m1, m4, m2
+ punpckhwd m4, m2
+ mova m0, [trans8_shuf]
+ vpermd m1, m0, m1
+ vpermd m4, m0, m4
+
+ lea r3, [3 * r1]
+ movq [r0], xm1
+ movhps [r0 + r1], xm1
+ vextracti128 xm2, m1, 1
+ movq [r0 + 2 * r1], xm2
+ movhps [r0 + r3], xm2
+ lea r0, [r0 + 4 * r1]
+ movq [r0], xm4
+ movhps [r0 + r1], xm4
+ vextracti128 xm2, m4, 1
+ movq [r0 + 2 * r1], xm2
+ movhps [r0 + r3], xm2
+ RET
+
+INIT_YMM avx2
+cglobal intra_pred_ang8_20, 3, 6, 6
+ mova m3, [pw_1024]
+ movu xm5, [r2]
+ lea r5, [intra_pred_shuff_0_8]
+ mova xm0, xm5
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 2 + 16], 0
+ vinserti128 m0, m0, xm5, 1
+ pshufb m0, [r5]
+
+ lea r4, [c_ang8_mode_20]
+ pmaddubsw m1, m0, [r4]
+ pmulhrsw m1, m3
+ mova xm0, xm5
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 3 + 16], 0
+ vinserti128 m0, m0, xm5, 1
+ pshufb m0, [r5]
+ pmaddubsw m2, m0, [r4 + mmsize]
+ pmulhrsw m2, m3
+ pslldq xm5, 1
+ pinsrb xm5, [r2 + 5 + 16], 0
+ vinserti128 m0, m5, xm5, 1
+ pshufb m0, [r5]