[x265-commits] [x265] fix bug for testbench string buffer overflow
Min Chen
chenm003 at 163.com
Fri Nov 22 17:33:37 CET 2013
details: http://hg.videolan.org/x265/rev/ab94f6effb71
branches:
changeset: 5259:ab94f6effb71
user: Min Chen <chenm003 at 163.com>
date: Fri Nov 22 15:00:04 2013 +0800
description:
fix bug for testbench string buffer overflow
Subject: [x265] split dequant to normal and scaling path
details: http://hg.videolan.org/x265/rev/4ec80bd40603
branches:
changeset: 5260:4ec80bd40603
user: Min Chen <chenm003 at 163.com>
date: Fri Nov 22 18:49:49 2013 +0800
description:
split dequant to normal and scaling path
Subject: [x265] asm: code for sse_pp_12x16 routine
details: http://hg.videolan.org/x265/rev/f09ca4290a55
branches:
changeset: 5261:f09ca4290a55
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Nov 22 15:50:33 2013 +0550
description:
asm: code for sse_pp_12x16 routine
Subject: [x265] pixel_add_ps_12x16, asm code
details: http://hg.videolan.org/x265/rev/9f34d1d82296
branches:
changeset: 5262:9f34d1d82296
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 17:35:45 2013 +0550
description:
pixel_add_ps_12x16, asm code
Subject: [x265] pixel_add_ps_48x64, asm code
details: http://hg.videolan.org/x265/rev/3847098e9553
branches:
changeset: 5263:3847098e9553
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 18:04:59 2013 +0550
description:
pixel_add_ps_48x64, asm code
Subject: [x265] pixel_add_ps_64xN, asm code
details: http://hg.videolan.org/x265/rev/e7eeb6443303
branches:
changeset: 5264:e7eeb6443303
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 18:17:02 2013 +0550
description:
pixel_add_ps_64xN, asm code
Subject: [x265] asm-primitives.cpp, removed temporary function pointer initialization, generated through macro calls
details: http://hg.videolan.org/x265/rev/76e2c787aadb
branches:
changeset: 5265:76e2c787aadb
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 19:04:26 2013 +0550
description:
asm-primitives.cpp, removed temporary function pointer initialization, generated through macro calls
Subject: [x265] asm: code for sse_pp_24x32 routine
details: http://hg.videolan.org/x265/rev/0b9bccb2ef7f
branches:
changeset: 5266:0b9bccb2ef7f
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Nov 22 19:44:32 2013 +0550
description:
asm: code for sse_pp_24x32 routine
Subject: [x265] asm: code of sse_pp routine for 48x64 and 64x16 blocks
details: http://hg.videolan.org/x265/rev/2e0a0a5eb0c7
branches:
changeset: 5267:2e0a0a5eb0c7
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Nov 22 20:09:55 2013 +0550
description:
asm: code of sse_pp routine for 48x64 and 64x16 blocks
Subject: [x265] TComYuv::addClip, integrated luma_add_ps
details: http://hg.videolan.org/x265/rev/fd90bd911169
branches:
changeset: 5268:fd90bd911169
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 20:43:13 2013 +0550
description:
TComYuv::addClip, integrated luma_add_ps
Subject: [x265] added blockcopy_sp function pointers
details: http://hg.videolan.org/x265/rev/4b437f76280d
branches:
changeset: 5269:4b437f76280d
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 20:52:51 2013 +0550
description:
added blockcopy_sp function pointers
Subject: [x265] asm: code of sse_pp routine for 64x32, 64x48 and 64x64 blocks
details: http://hg.videolan.org/x265/rev/f082c556f337
branches:
changeset: 5270:f082c556f337
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Fri Nov 22 21:04:40 2013 +0550
description:
asm: code of sse_pp routine for 64x32, 64x48 and 64x64 blocks
Subject: [x265] TComYuv::addClipChroma, integrated pixel_add_ps function
details: http://hg.videolan.org/x265/rev/cc123a1ec253
branches:
changeset: 5271:cc123a1ec253
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Nov 22 21:25:37 2013 +0550
description:
TComYuv::addClipChroma, integrated pixel_add_ps function
Subject: [x265] pixelharness: fix the other header buffer
details: http://hg.videolan.org/x265/rev/3c827bba6cd6
branches:
changeset: 5272:3c827bba6cd6
user: Steve Borho <steve at borho.org>
date: Fri Nov 22 10:02:18 2013 -0600
description:
pixelharness: fix the other header buffer
Subject: [x265] pixel: drop intrinsic sse_pp functions, we have ASM coverage
details: http://hg.videolan.org/x265/rev/1c74d7bfd007
branches:
changeset: 5273:1c74d7bfd007
user: Steve Borho <steve at borho.org>
date: Fri Nov 22 10:18:18 2013 -0600
description:
pixel: drop intrinsic sse_pp functions, we have ASM coverage
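
Note on the pixel_add_ps / luma_add_ps changesets listed above: they replace per-pixel add-and-clip loops in TComYuv with block primitives. The following is only a minimal sketch of what such a primitive computes, not the project's implementation. It assumes an 8-bit build (pixel is uint8_t, clip range 0..255); the name add_ps_ref and the template form are hypothetical, and the argument order simply mirrors the call sites in the diff (dst, dst stride, src0, src1, src0 stride, src1 stride).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit pixels; a high-bit-depth build would differ

// Hypothetical reference routine: dst = clip(src0 + src1) over a
// width x height block, where src0 holds pixels and src1 holds
// signed 16-bit residuals. Strides are in elements, not bytes.
template<int width, int height>
void add_ps_ref(pixel* dst, intptr_t dstStride,
                const pixel* src0, const int16_t* src1,
                intptr_t src0Stride, intptr_t src1Stride)
{
    assert(dst && src0 && src1);

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Widen to int before adding so the sum cannot wrap,
            // then clamp to the assumed 8-bit pixel range.
            int sum = static_cast<int>(src0[x]) + src1[x];
            dst[x] = static_cast<pixel>(std::min(255, std::max(0, sum)));
        }

        src0 += src0Stride;
        src1 += src1Stride;
        dst += dstStride;
    }
}
```

Fixing the block size at compile time, as this sketch does with template parameters, is one way to let each partition size get its own specialized routine, which is roughly what a table indexed by partition (as in luma_add_ps[part]) provides.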
diffstat:
 source/Lib/TLibCommon/TComTrQuant.cpp |   15 +-
 source/Lib/TLibCommon/TComYuv.cpp     |   43 +--
 source/Lib/TLibCommon/TComYuv.h       |    4 +-
 source/common/dct.cpp                 |   70 ++--
 source/common/primitives.h            |    7 +-
 source/common/vec/dct-sse41.cpp       |  170 ++++++------
 source/common/vec/pixel-sse41.cpp     |  220 ----------------
 source/common/x86/asm-primitives.cpp  |   22 +-
 source/common/x86/pixel-a.asm         |  453 +++++++++++++++++++++++++++++++++-
 source/common/x86/pixel.h             |    7 +
 source/common/x86/pixeladd8.asm       |  249 ++++++++++++++++++
 source/test/mbdstharness.cpp          |   82 ++++-
 source/test/mbdstharness.h            |    3 +-
 source/test/pixelharness.cpp          |    4 +-
 14 files changed, 933 insertions(+), 416 deletions(-)
diffs (truncated from 1664 to 300 lines):
diff -r 5009254d3d3a -r 1c74d7bfd007 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp Fri Nov 22 00:17:46 2013 -0600
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp Fri Nov 22 10:18:18 2013 -0600
@@ -409,8 +409,21 @@ void TComTrQuant::invtransformNxN(bool t
int rem = m_qpParam.m_rem;
bool useScalingList = getUseScalingList();
uint32_t log2TrSize = g_convertToBit[width] + 2;
+ int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize;
+ int shift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
int32_t *dequantCoef = getDequantCoeff(scalingListType, m_qpParam.m_rem, log2TrSize - 2);
- primitives.dequant(coeff, m_tmpCoeff, width, height, per, rem, useScalingList, log2TrSize, dequantCoef);
+
+ if (!useScalingList)
+ {
+ static const int invQuantScales[6] = { 40, 45, 51, 57, 64, 72 };
+ int scale = invQuantScales[rem] << per;
+ primitives.dequant_normal(coeff, m_tmpCoeff, width * height, scale, shift);
+ }
+ else
+ {
+ // CHECK_ME: the code is not verify since this is DEAD path
+ primitives.dequant_scaling(coeff, dequantCoef, m_tmpCoeff, width * height, per, shift);
+ }
if (useTransformSkip == true)
{
diff -r 5009254d3d3a -r 1c74d7bfd007 source/Lib/TLibCommon/TComYuv.cpp
--- a/source/Lib/TLibCommon/TComYuv.cpp Fri Nov 22 00:17:46 2013 -0600
+++ b/source/Lib/TLibCommon/TComYuv.cpp Fri Nov 22 10:18:18 2013 -0600
@@ -395,14 +395,14 @@ void TComYuv::copyPartToPartChroma(TShor
void TComYuv::addClip(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize)
{
- addClipLuma(srcYuv0, srcYuv1, trUnitIdx, partSize);
- addClipChroma(srcYuv0, srcYuv1, trUnitIdx, partSize >> m_hChromaShift);
+ int part = partitionFromSizes(partSize, partSize);
+
+ addClipLuma(srcYuv0, srcYuv1, trUnitIdx, partSize, part);
+ addClipChroma(srcYuv0, srcYuv1, trUnitIdx, partSize >> m_hChromaShift, part);
}
-void TComYuv::addClipLuma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize)
+void TComYuv::addClipLuma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize, uint32_t part)
{
- int x, y;
-
Pel* src0 = srcYuv0->getLumaAddr(trUnitIdx, partSize);
int16_t* src1 = srcYuv1->getLumaAddr(trUnitIdx, partSize);
Pel* dst = getLumaAddr(trUnitIdx, partSize);
@@ -411,23 +411,11 @@ void TComYuv::addClipLuma(TComYuv* srcYu
uint32_t src1Stride = srcYuv1->m_width;
uint32_t dststride = getStride();
- for (y = partSize - 1; y >= 0; y--)
- {
- for (x = partSize - 1; x >= 0; x--)
- {
- dst[x] = ClipY(static_cast<int16_t>(src0[x]) + src1[x]);
- }
-
- src0 += src0Stride;
- src1 += src1Stride;
- dst += dststride;
- }
+ primitives.luma_add_ps[part](dst, dststride, src0, src1, src0Stride, src1Stride);
}
-void TComYuv::addClipChroma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize)
+void TComYuv::addClipChroma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize, uint32_t part)
{
- int x, y;
-
Pel* srcU0 = srcYuv0->getCbAddr(trUnitIdx, partSize);
int16_t* srcU1 = srcYuv1->getCbAddr(trUnitIdx, partSize);
Pel* srcV0 = srcYuv0->getCrAddr(trUnitIdx, partSize);
@@ -439,21 +427,8 @@ void TComYuv::addClipChroma(TComYuv* src
uint32_t src1Stride = srcYuv1->m_cwidth;
uint32_t dststride = getCStride();
- for (y = partSize - 1; y >= 0; y--)
- {
- for (x = partSize - 1; x >= 0; x--)
- {
- dstU[x] = ClipC(static_cast<int16_t>(srcU0[x]) + srcU1[x]);
- dstV[x] = ClipC(static_cast<int16_t>(srcV0[x]) + srcV1[x]);
- }
-
- srcU0 += src0Stride;
- srcU1 += src1Stride;
- srcV0 += src0Stride;
- srcV1 += src1Stride;
- dstU += dststride;
- dstV += dststride;
- }
+ primitives.chroma[m_csp].add_ps[part](dstU, dststride, srcU0, srcU1, src0Stride, src1Stride);
+ primitives.chroma[m_csp].add_ps[part](dstV, dststride, srcV0, srcV1, src0Stride, src1Stride);
}
void TComYuv::subtract(TComYuv* srcYuv0, TComYuv* srcYuv1, uint32_t trUnitIdx, uint32_t partSize)
diff -r 5009254d3d3a -r 1c74d7bfd007 source/Lib/TLibCommon/TComYuv.h
--- a/source/Lib/TLibCommon/TComYuv.h Fri Nov 22 00:17:46 2013 -0600
+++ b/source/Lib/TLibCommon/TComYuv.h Fri Nov 22 10:18:18 2013 -0600
@@ -153,8 +153,8 @@ public:
// Clip(srcYuv0 + srcYuv1) -> m_apiBuf
void addClip(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize);
- void addClipLuma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize);
- void addClipChroma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize);
+ void addClipLuma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize, uint32_t part);
+ void addClipChroma(TComYuv* srcYuv0, TShortYUV* srcYuv1, uint32_t trUnitIdx, uint32_t partSize, uint32_t part);
// srcYuv0 - srcYuv1 -> m_apiBuf
void subtract(TComYuv* srcYuv0, TComYuv* srcYuv1, uint32_t trUnitIdx, uint32_t partSize);
diff -r 5009254d3d3a -r 1c74d7bfd007 source/common/dct.cpp
--- a/source/common/dct.cpp Fri Nov 22 00:17:46 2013 -0600
+++ b/source/common/dct.cpp Fri Nov 22 10:18:18 2013 -0600
@@ -718,57 +718,52 @@ void idct32_c(int32_t *src, int16_t *dst
}
}
-void dequant_c(const int32_t* quantCoef, int32_t* coef, int width, int height, int per, int rem, bool useScalingList, unsigned int log2TrSize, int32_t *dequantCoef)
+void dequant_normal_c(const int32_t* quantCoef, int32_t* coef, int num, int scale, int shift)
{
- int invQuantScales[6] = { 40, 45, 51, 57, 64, 72 };
-
- if (width > 32)
- {
- width = 32;
- height = 32;
- }
+ static const int invQuantScales[6] = { 40, 45, 51, 57, 64, 72 };
+ assert(num <= 32 * 32);
int add, coeffQ;
- int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize;
- int shift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
int clipQCoef;
- if (useScalingList)
+ add = 1 << (shift - 1);
+
+ for (int n = 0; n < num; n++)
{
- shift += 4;
+ clipQCoef = Clip3(-32768, 32767, quantCoef[n]);
+ coeffQ = (clipQCoef * scale + add) >> shift;
+ coef[n] = Clip3(-32768, 32767, coeffQ);
+ }
+}
- if (shift > per)
+void dequant_scaling_c(const int32_t* quantCoef, const int32_t *deQuantCoef, int32_t* coef, int num, int per, int shift)
+{
+ assert(num <= 32 * 32);
+
+ int add, coeffQ;
+ int clipQCoef;
+
+ shift += 4;
+
+ if (shift > per)
+ {
+ add = 1 << (shift - per - 1);
+
+ for (int n = 0; n < num; n++)
{
- add = 1 << (shift - per - 1);
-
- for (int n = 0; n < width * height; n++)
- {
- clipQCoef = Clip3(-32768, 32767, quantCoef[n]);
- coeffQ = ((clipQCoef * dequantCoef[n]) + add) >> (shift - per);
- coef[n] = Clip3(-32768, 32767, coeffQ);
- }
- }
- else
- {
- for (int n = 0; n < width * height; n++)
- {
- clipQCoef = Clip3(-32768, 32767, quantCoef[n]);
- coeffQ = Clip3(-32768, 32767, clipQCoef * dequantCoef[n]);
- coef[n] = Clip3(-32768, 32767, coeffQ << (per - shift));
- }
+ clipQCoef = Clip3(-32768, 32767, quantCoef[n]);
+ coeffQ = ((clipQCoef * deQuantCoef[n]) + add) >> (shift - per);
+ coef[n] = Clip3(-32768, 32767, coeffQ);
}
}
else
{
- add = 1 << (shift - 1);
- int scale = invQuantScales[rem] << per;
-
- for (int n = 0; n < width * height; n++)
+ for (int n = 0; n < num; n++)
{
clipQCoef = Clip3(-32768, 32767, quantCoef[n]);
- coeffQ = (clipQCoef * scale + add) >> shift;
- coef[n] = Clip3(-32768, 32767, coeffQ);
+ coeffQ = Clip3(-32768, 32767, clipQCoef * deQuantCoef[n]);
+ coef[n] = Clip3(-32768, 32767, coeffQ << (per - shift));
}
}
}
@@ -804,7 +799,8 @@ namespace x265 {
void Setup_C_DCTPrimitives(EncoderPrimitives& p)
{
- p.dequant = dequant_c;
+ p.dequant_scaling = dequant_scaling_c;
+ p.dequant_normal = dequant_normal_c;
p.quant = quant_c;
p.dct[DST_4x4] = dst4_c;
p.dct[DCT_4x4] = dct4_c;
diff -r 5009254d3d3a -r 1c74d7bfd007 source/common/primitives.h
--- a/source/common/primitives.h Fri Nov 22 00:17:46 2013 -0600
+++ b/source/common/primitives.h Fri Nov 22 10:18:18 2013 -0600
@@ -178,8 +178,8 @@ typedef void (*calcresidual_t)(pixel *fe
typedef void (*calcrecon_t)(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
typedef void (*transpose_t)(pixel* dst, pixel* src, intptr_t stride);
typedef uint32_t (*quant_t)(int32_t *coef, int32_t *quantCoeff, int32_t *deltaU, int32_t *qCoef, int qBits, int add, int numCoeff, int32_t* lastPos);
-typedef void (*dequant_t)(const int32_t* src, int32_t* dst, int width, int height, int mcqp_miper, int mcqp_mirem, bool useScalingList,
- unsigned int trSizeLog2, int32_t *dequantCoef);
+typedef void (*dequant_scaling_t)(const int32_t* src, const int32_t *dequantCoef, int32_t* dst, int num, int mcqp_miper, int shift);
+typedef void (*dequant_normal_t)(const int32_t* quantCoef, int32_t* coef, int num, int scale, int shift);
typedef void (*weightp_pp_t)(pixel *src, pixel *dst, intptr_t srcStride, intptr_t dstStride, int width, int height, int w0, int round, int shift, int offset);
typedef void (*weightp_sp_t)(int16_t *src, pixel *dst, intptr_t srcStride, intptr_t dstStride, int width, int height, int w0, int round, int shift, int offset);
@@ -261,7 +261,8 @@ struct EncoderPrimitives
dct_t dct[NUM_DCTS];
idct_t idct[NUM_IDCTS];
quant_t quant;
- dequant_t dequant;
+ dequant_scaling_t dequant_scaling;
+ dequant_normal_t dequant_normal;
calcresidual_t calcresidual[NUM_SQUARE_BLOCKS];
calcrecon_t calcrecon[NUM_SQUARE_BLOCKS];
diff -r 5009254d3d3a -r 1c74d7bfd007 source/common/vec/dct-sse41.cpp
--- a/source/common/vec/dct-sse41.cpp Fri Nov 22 00:17:46 2013 -0600
+++ b/source/common/vec/dct-sse41.cpp Fri Nov 22 10:18:18 2013 -0600
@@ -40,114 +40,103 @@
using namespace x265;
namespace {
-void dequant(const int32_t* quantCoef, int32_t* coef, int width, int height, int per, int rem, bool useScalingList, unsigned int log2TrSize, int32_t *deQuantCoef)
+// TODO: normal and 8bpp dequant have only 16-bits dynamic rang, we can reduce 32-bits multiplication later
+void dequant_normal(const int32_t* quantCoef, int32_t* coef, int num, int scale, int shift)
{
- int invQuantScales[6] = { 40, 45, 51, 57, 64, 72 };
+ int valueToAdd = 1 << (shift - 1);
+ __m128i vScale = _mm_set1_epi32(scale);
+ __m128i vAdd = _mm_set1_epi32(valueToAdd);
- if (width > 32)
+ for (int n = 0; n < num; n = n + 8)
{
- width = 32;
- height = 32;
+ __m128i quantCoef1, quantCoef2, quantCoef12, sign;
+
+ quantCoef1 = _mm_loadu_si128((__m128i*)(quantCoef + n));
+ quantCoef2 = _mm_loadu_si128((__m128i*)(quantCoef + n + 4));
+
+ quantCoef12 = _mm_packs_epi32(quantCoef1, quantCoef2);
+ sign = _mm_srai_epi16(quantCoef12, 15);
+ quantCoef1 = _mm_unpacklo_epi16(quantCoef12, sign);
+ quantCoef2 = _mm_unpackhi_epi16(quantCoef12, sign);
+
+ quantCoef1 = _mm_sra_epi32(_mm_add_epi32(_mm_mullo_epi32(quantCoef1, vScale), vAdd), _mm_cvtsi32_si128(shift));
+ quantCoef2 = _mm_sra_epi32(_mm_add_epi32(_mm_mullo_epi32(quantCoef2, vScale), vAdd), _mm_cvtsi32_si128(shift));
+
+ quantCoef12 = _mm_packs_epi32(quantCoef1, quantCoef2);
+ sign = _mm_srai_epi16(quantCoef12, 15);
+ quantCoef1 = _mm_unpacklo_epi16(quantCoef12, sign);
+ _mm_storeu_si128((__m128i*)(coef + n), quantCoef1);
+ quantCoef2 = _mm_unpackhi_epi16(quantCoef12, sign);
+ _mm_storeu_si128((__m128i*)(coef + n + 4), quantCoef2);
}
+}
+
+void dequant_scaling(const int32_t* quantCoef, const int32_t *deQuantCoef, int32_t* coef, int num, int per, int shift)
+{
+ assert(num <= 32 * 32);
int valueToAdd;
- int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize;
- int shift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
- if (useScalingList)
+ shift += 4;
+
+ if (shift > per)
{
- shift += 4;
+ valueToAdd = 1 << (shift - per - 1);
+ __m128i IAdd = _mm_set1_epi32(valueToAdd);
- if (shift > per)
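
Note on the dequant split shown in the diff above: the non-scaling-list path now receives a precomputed scale and shift instead of deriving them from per, rem and the transform size. The sketch below restates that arithmetic in standalone form so it can be compiled and checked in isolation. It follows dequant_normal_c as it appears in the diff, but the names dequant_normal_sketch, scale_for_qp, clip16 and kInvQuantScales are hypothetical, and the shift value is taken as an input because the constants used to derive it are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Inverse quantization scale table, copied from the diff.
static const int kInvQuantScales[6] = { 40, 45, 51, 57, 64, 72 };

// Clamp to the signed 16-bit range, as Clip3(-32768, 32767, v) does in the diff.
inline int clip16(int v)
{
    return std::min(32767, std::max(-32768, v));
}

// Caller-side scale computation, as done in invtransformNxN in the diff.
// Hypothetical helper name; rem must be in 0..5.
inline int scale_for_qp(int per, int rem)
{
    assert(rem >= 0 && rem < 6);
    return kInvQuantScales[rem] << per;
}

// Scalar dequantization without a scaling list:
//   coef[n] = clip16((clip16(quantCoef[n]) * scale + add) >> shift)
// Relies on >> being an arithmetic shift for negative values, which is
// implementation-defined before C++20 but holds on common compilers.
void dequant_normal_sketch(const int32_t* quantCoef, int32_t* coef,
                           int num, int scale, int shift)
{
    assert(shift >= 1); // the rounding offset below needs shift - 1 >= 0

    const int add = 1 << (shift - 1);

    for (int n = 0; n < num; n++)
    {
        int q = clip16(quantCoef[n]);
        coef[n] = clip16((q * scale + add) >> shift);
    }
}
```

Hoisting the scale out of the primitive means the inner loop depends only on num, scale and shift, which is the same interface the SSE4.1 dequant_normal in the diff takes.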