[x265-commits] [x265] asm: the pixel value in blockcopy_ps is saturation by cal...
Min Chen
chenm003 at 163.com
Thu Nov 7 20:44:55 CET 2013
details: http://hg.videolan.org/x265/rev/0a1b379be359
branches:
changeset: 4918:0a1b379be359
user: Min Chen <chenm003 at 163.com>
date: Thu Nov 07 18:17:52 2013 +0800
description:
asm: the pixel value in blockcopy_ps is saturation by calcRecon, so asm can use packuswb
Subject: [x265] tcompicyuv: improvement for Extend the right if width is not multiple of min CU size
details: http://hg.videolan.org/x265/rev/85002898f5b4
branches:
changeset: 4919:85002898f5b4
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Thu Nov 07 14:31:05 2013 +0530
description:
tcompicyuv: improvement for Extend the right if width is not multiple of min CU size
Subject: [x265] asm: assembly code for pixel_sad_x3_48x64
details: http://hg.videolan.org/x265/rev/74682dfe5342
branches:
changeset: 4920:74682dfe5342
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Thu Nov 07 12:25:14 2013 +0530
description:
asm: assembly code for pixel_sad_x3_48x64
Subject: [x265] asm: assembly code for pixel_sad_x4_48x64
details: http://hg.videolan.org/x265/rev/96f1bb63b747
branches:
changeset: 4921:96f1bb63b747
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Thu Nov 07 13:07:18 2013 +0530
description:
asm: assembly code for pixel_sad_x4_48x64
Subject: [x265] asm: assembly code for pixel_sad_x3_64xN
details: http://hg.videolan.org/x265/rev/d6644a32e6bc
branches:
changeset: 4922:d6644a32e6bc
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Thu Nov 07 16:40:20 2013 +0530
description:
asm: assembly code for pixel_sad_x3_64xN
Subject: [x265] asm: assembly code for pixel_sad_x4_64xN
details: http://hg.videolan.org/x265/rev/dc31fc1daf42
branches:
changeset: 4923:dc31fc1daf42
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Thu Nov 07 17:29:52 2013 +0530
description:
asm: assembly code for pixel_sad_x4_64xN
Subject: [x265] pixel: remove last remaining intrinsic SAD primitives
details: http://hg.videolan.org/x265/rev/536db32fc253
branches:
changeset: 4924:536db32fc253
user: Steve Borho <steve at borho.org>
date: Thu Nov 07 12:01:09 2013 -0600
description:
pixel: remove last remaining intrinsic SAD primitives
Subject: [x265] aq: bug fix, extend right and bot of TComPic::m_origPicYuv to a multiple of 16
details: http://hg.videolan.org/x265/rev/93a4f88844f1
branches:
changeset: 4925:93a4f88844f1
user: Aarthi Thirumalai
date: Thu Nov 07 16:46:57 2013 +0530
description:
aq: bug fix, extend right and bot of TComPic::m_origPicYuv to a multiple of 16
Subject: [x265] tcompicyuv: add right boundary padding while applying bottom row padding.
details: http://hg.videolan.org/x265/rev/397a201b0ea3
branches:
changeset: 4926:397a201b0ea3
user: Aarthi Thirumalai
date: Thu Nov 07 17:22:26 2013 +0530
description:
tcompicyuv: add right boundary padding while applying bottom row padding.
Subject: [x265] asm: the pixel value in blockcopy_ps is saturation by calcRecon, so asm can use packuswb
details: http://hg.videolan.org/x265/rev/b572831429ec
branches:
changeset: 4927:b572831429ec
user: Min Chen <chenm003 at 163.com>
date: Thu Nov 07 18:17:52 2013 +0800
description:
asm: the pixel value in blockcopy_ps is saturation by calcRecon, so asm can use packuswb
Subject: [x265] cleanup: remove unused blockcpy_sc
details: http://hg.videolan.org/x265/rev/db7752a46693
branches:
changeset: 4928:db7752a46693
user: Min Chen <chenm003 at 163.com>
date: Thu Nov 07 18:18:16 2013 +0800
description:
cleanup: remove unused blockcpy_sc
Subject: [x265] Bug fix for luma vpp asm routines. Also incorporated review comment changes.
details: http://hg.videolan.org/x265/rev/9ba49b482a1e
branches:
changeset: 4929:9ba49b482a1e
user: Nabajit Deka
date: Thu Nov 07 21:10:38 2013 +0530
description:
Bug fix for luma vpp asm routines. Also incorporated review comment changes.
Subject: [x265] asm: enable luma_vpp block MC functions
details: http://hg.videolan.org/x265/rev/4d9aac4f0985
branches:
changeset: 4930:4d9aac4f0985
user: Steve Borho <steve at borho.org>
date: Thu Nov 07 12:31:34 2013 -0600
description:
asm: enable luma_vpp block MC functions
Subject: [x265] unit test code for blockfill_s_c function
details: http://hg.videolan.org/x265/rev/12ec248f7390
branches:
changeset: 4931:12ec248f7390
user: Praveen Tiwari
date: Thu Nov 07 18:16:22 2013 +0530
description:
unit test code for blockfill_s_c function
Subject: [x265] asm code for blockfill_s, 4x4
details: http://hg.videolan.org/x265/rev/29d208555299
branches:
changeset: 4932:29d208555299
user: Praveen Tiwari
date: Thu Nov 07 18:26:36 2013 +0530
description:
asm code for blockfill_s, 4x4
Subject: [x265] asm code for blockfill_s, 8x8
details: http://hg.videolan.org/x265/rev/7d3e461312a5
branches:
changeset: 4933:7d3e461312a5
user: Praveen Tiwari
date: Thu Nov 07 18:59:28 2013 +0530
description:
asm code for blockfill_s, 8x8
Subject: [x265] asm code for blockfill_s, 16x16
details: http://hg.videolan.org/x265/rev/a8df8123e9ab
branches:
changeset: 4934:a8df8123e9ab
user: Praveen Tiwari
date: Thu Nov 07 19:40:51 2013 +0530
description:
asm code for blockfill_s, 16x16
Subject: [x265] asm code for blockfill_s, 32x32
details: http://hg.videolan.org/x265/rev/b4993b1fef7c
branches:
changeset: 4935:b4993b1fef7c
user: Praveen Tiwari
date: Thu Nov 07 20:06:56 2013 +0530
description:
asm code for blockfill_s, 32x32
Subject: [x265] rename: pixelsub_sp to pixelsub_ps, because it sub two Pixel and result is Short
details: http://hg.videolan.org/x265/rev/cb24ed71905d
branches:
changeset: 4936:cb24ed71905d
user: Min Chen <chenm003 at 163.com>
date: Thu Nov 07 13:13:47 2013 +0800
description:
rename: pixelsub_sp to pixelsub_ps, because it sub two Pixel and result is Short
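Note on the blockcopy_ps changesets above: the argument is that a saturating pack (packuswb) is safe because the 16-bit values have already been clamped to the pixel range upstream. A minimal scalar sketch of that reasoning follows. It is not the x265 assembly; the names packSaturate, packTruncate and copyShortToPixel are hypothetical, and it assumes 8-bit pixels and that the primitive narrows 16-bit values to pixels, as the mention of packuswb suggests.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one PACKUSWB lane: signed 16-bit in, unsigned 8-bit out,
// clamped to [0, 255].
inline uint8_t packSaturate(int16_t v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : static_cast<uint8_t>(v));
}

// Plain narrowing that keeps only the low 8 bits (what a non-saturating
// copy would do).
inline uint8_t packTruncate(int16_t v)
{
    return static_cast<uint8_t>(v);
}

// Hypothetical short-to-pixel block copy. If every source value is already
// in [0, 255], packSaturate and packTruncate agree, so either can be used.
void copyShortToPixel(uint8_t* dst, int dstStride,
                      const int16_t* src, int srcStride,
                      int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            dst[x] = packSaturate(src[x]);
        }
        dst += dstStride;
        src += srcStride;
    }
}
```

The two narrowing functions only differ for inputs outside [0, 255], for example -1 or 300, which is exactly the case the commit message says cannot occur after calcRecon.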
diffstat:
source/Lib/TLibCommon/TComPicYuv.cpp | 239 +--
source/Lib/TLibCommon/TComPrediction.cpp | 2 +-
source/common/TShortYUV.cpp | 6 +-
source/common/lowres.cpp | 25 -
source/common/pixel.cpp | 21 +-
source/common/primitives.h | 5 +-
source/common/vec/blockcopy-sse3.cpp | 5 +-
source/common/vec/pixel-sse41.cpp | 1795 +-----------------------------
source/common/x86/asm-primitives.cpp | 26 +
source/common/x86/blockcopy8.asm | 149 ++
source/common/x86/blockcopy8.h | 5 +
source/common/x86/ipfilter8.asm | 232 +-
source/common/x86/sad-a.asm | 898 +++++++++++++++
source/encoder/frameencoder.cpp | 5 +-
source/encoder/motion.cpp | 2 +-
source/test/pixelharness.cpp | 85 +-
source/test/pixelharness.h | 4 +-
17 files changed, 1378 insertions(+), 2126 deletions(-)
diffs (truncated from 4181 to 300 lines):
diff -r ed1b1a7b0b38 -r cb24ed71905d source/Lib/TLibCommon/TComPicYuv.cpp
--- a/source/Lib/TLibCommon/TComPicYuv.cpp Thu Nov 07 13:05:53 2013 +0530
+++ b/source/Lib/TLibCommon/TComPicYuv.cpp Thu Nov 07 13:13:47 2013 +0800
@@ -323,18 +323,36 @@ void TComPicYuv::dump(char* pFileName, b
//! \}
/* Copy pixels from an input picture (C structure) into internal TComPicYuv instance
- * Upscale pixels from 8bits to 16 bits when required, but do not modify pixels.
- * This new routine is GPL
- */
+ * Upscale pixels from 8bits to 16 bits when required, but do not modify
+ * pixels. */
void TComPicYuv::copyFromPicture(const x265_picture& pic, int32_t *pad)
{
Pel *Y = getLumaAddr();
Pel *U = getCbAddr();
Pel *V = getCrAddr();
+ // m_picWidth is the width that is being encoded, padx indicates how many
+ // of those pixels are padding to reach multiple of MinCU(4) size.
+ //
+ // Internally, we need to extend rows out to a multiple of 16 for lowres
+ // downscale and other operations. But those padding pixels are never
+ // encoded.
+ //
+ // The same applies to m_picHeight and pady
+
int padx = pad[0];
int pady = pad[1];
+ /* width and height - without padsize (input picture raw width and height) */
+ int width = m_picWidth - padx;
+ int height = m_picHeight - pady;
+
+ /* internal pad to multiple of 16x16 blocks */
+ uint8_t rem = m_picWidth & 15;
+ padx = rem ? 16 - rem : padx;
+ rem = m_picHeight & 15;
+ pady = rem ? 16 - rem : pady;
+
#if HIGH_BIT_DEPTH
if (pic.bitDepth > 8)
{
@@ -342,10 +360,6 @@ void TComPicYuv::copyFromPicture(const x
uint16_t *u = (uint16_t*)pic.planes[1];
uint16_t *v = (uint16_t*)pic.planes[2];
- /* width and height - without padsize */
- int width = m_picWidth - padx;
- int height = m_picHeight - pady;
-
// Manually copy pixels to up-size them
for (int r = 0; r < height; r++)
{
@@ -354,6 +368,10 @@ void TComPicYuv::copyFromPicture(const x
Y[c] = (Pel)y[c];
}
+ for (int x = 0; x < padx; x++)
+ {
+ Y[width + x] = Y[width - 1];
+ }
Y += getStride();
y += pic.stride[0];
}
@@ -366,73 +384,42 @@ void TComPicYuv::copyFromPicture(const x
V[c] = (Pel)v[c];
}
+ for (int x = 0; x < padx >> m_hChromaShift; x++)
+ {
+ U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
+ V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
+ }
U += getCStride();
V += getCStride();
u += pic.stride[1];
v += pic.stride[2];
}
- /* Extend the right if width is not multiple of minimum CU size */
-
- if (padx)
- {
- Y = getLumaAddr();
- U = getCbAddr();
- V = getCrAddr();
-
- for (int r = 0; r < height; r++)
- {
- for (int x = 0; x < padx; x++)
- {
- Y[width + x] = Y[width - 1];
- }
-
- Y += getStride();
- }
-
- for (int r = 0; r < height >> m_vChromaShift; r++)
- {
- for (int x = 0; x < padx >> m_hChromaShift; x++)
- {
- U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
- V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
- }
-
- U += getCStride();
- V += getCStride();
- }
- }
-
/* extend the bottom if height is not multiple of the minimum CU size */
if (pady)
{
- width = m_picWidth;
Y = getLumaAddr() + (height - 1) * getStride();
U = getCbAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
V = getCrAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
for (uint32_t i = 1; i <= pady; i++)
{
- memcpy(Y + i * getStride(), Y, width * sizeof(Pel));
+ memcpy(Y + i * getStride(), Y, (width + padx) * sizeof(Pel));
}
for (uint32_t j = 1; j <= pady >> m_vChromaShift; j++)
{
- memcpy(U + j * getCStride(), U, (width >> m_hChromaShift) * sizeof(Pel));
- memcpy(V + j * getCStride(), V, (width >> m_hChromaShift) * sizeof(Pel));
+ memcpy(U + j * getCStride(), U, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
+ memcpy(V + j * getCStride(), V, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
}
}
}
- else if(pic.bitDepth == 8)
+ else
{
uint8_t *y = (uint8_t*)pic.planes[0];
uint8_t *u = (uint8_t*)pic.planes[1];
uint8_t *v = (uint8_t*)pic.planes[2];
- /* width and height - without padsize */
- int width = m_picWidth - padx;
- int height = m_picHeight - pady;
-
// Manually copy pixels to up-size them
for (int r = 0; r < height; r++)
{
@@ -441,6 +428,10 @@ void TComPicYuv::copyFromPicture(const x
Y[c] = (Pel)y[c];
}
+ for (int x = 0; x < padx; x++)
+ {
+ Y[width + x] = Y[width - 1];
+ }
Y += getStride();
y += pic.stride[0];
}
@@ -453,97 +444,10 @@ void TComPicYuv::copyFromPicture(const x
V[c] = (Pel)v[c];
}
- U += getCStride();
- V += getCStride();
- u += pic.stride[1];
- v += pic.stride[2];
- }
-
- /* Extend the right if width is not multiple of minimum CU size */
-
- if (padx)
- {
- Y = getLumaAddr();
- U = getCbAddr();
- V = getCrAddr();
-
- for (int r = 0; r < height; r++)
+ for (int x = 0; x < padx >> m_hChromaShift; x++)
{
- for (int x = 0; x < padx; x++)
- {
- Y[width + x] = Y[width - 1];
- }
-
- Y += getStride();
- }
-
- for (int r = 0; r < height >> m_vChromaShift; r++)
- {
- for (int x = 0; x < padx >> m_hChromaShift; x++)
- {
- U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
- V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
- }
-
- U += getCStride();
- V += getCStride();
- }
- }
-
- /* extend the bottom if height is not multiple of the minimum CU size */
- if (pady)
- {
- width = m_picWidth;
- Y = getLumaAddr() + (height - 1) * getStride();
- U = getCbAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
- V = getCrAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
-
- for (uint32_t i = 1; i <= pady; i++)
- {
- memcpy(Y + i * getStride(), Y, width * sizeof(Pel));
- }
-
- for (uint32_t j = 1; j <= pady >> m_vChromaShift; j++)
- {
- memcpy(U + j * getCStride(), U, (width >> m_hChromaShift) * sizeof(Pel));
- memcpy(V + j * getCStride(), V, (width >> m_hChromaShift) * sizeof(Pel));
- }
- }
- }
- else
-#endif // if HIGH_BIT_DEPTH
- {
- uint8_t *y = (uint8_t*)pic.planes[0];
- uint8_t *u = (uint8_t*)pic.planes[1];
- uint8_t *v = (uint8_t*)pic.planes[2];
-
- /* width and height - without padsize */
- int width = (m_picWidth * (pic.bitDepth > 8 ? 2 : 1)) - padx;
- int height = m_picHeight - pady;
-
- // copy pixels by row into encoder's buffer
- for (int r = 0; r < height; r++)
- {
- memcpy(Y, y, width);
-
- /* extend the right if width is not multiple of the minimum CU size */
- if (padx)
- ::memset(Y + width, Y[width - 1], padx);
-
- Y += getStride();
- y += pic.stride[0];
- }
-
- for (int r = 0; r < height >> m_vChromaShift; r++)
- {
- memcpy(U, u, width >> m_hChromaShift);
- memcpy(V, v, width >> m_hChromaShift);
-
- /* extend the right if width is not multiple of the minimum CU size */
- if (padx)
- {
- ::memset(U + (width >> m_hChromaShift), U[(width >> m_hChromaShift) - 1], padx >> m_hChromaShift);
- ::memset(V + (width >> m_hChromaShift), V[(width >> m_hChromaShift) - 1], padx >> m_hChromaShift);
+ U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
+ V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
}
U += getCStride();
@@ -555,21 +459,74 @@ void TComPicYuv::copyFromPicture(const x
/* extend the bottom if height is not multiple of the minimum CU size */
if (pady)
{
- width = m_picWidth;
Y = getLumaAddr() + (height - 1) * getStride();
U = getCbAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
V = getCrAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
for (uint32_t i = 1; i <= pady; i++)
{
- memcpy(Y + i * getStride(), Y, width * sizeof(pixel));
+ memcpy(Y + i * getStride(), Y, (width + padx) * sizeof(Pel));
}
for (uint32_t j = 1; j <= pady >> m_vChromaShift; j++)
{
- memcpy(U + j * getCStride(), U, (width >> m_hChromaShift) * sizeof(pixel));
- memcpy(V + j * getCStride(), V, (width >> m_hChromaShift) * sizeof(pixel));
+ memcpy(U + j * getCStride(), U, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
+ memcpy(V + j * getCStride(), V, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
}
}
}
+#else // if HIGH_BIT_DEPTH
+ uint8_t *y = (uint8_t*)pic.planes[0];
+ uint8_t *u = (uint8_t*)pic.planes[1];
+ uint8_t *v = (uint8_t*)pic.planes[2];
+
+ for (int r = 0; r < height; r++)
+ {
+ memcpy(Y, y, width);
+
+ /* extend the right if width is not multiple of the minimum CU size */
+ if (padx)
+ ::memset(Y + width, Y[width - 1], padx);
+
+ Y += getStride();
+ y += pic.stride[0];
+ }
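Note on the TComPicYuv.cpp hunks above: the padding scheme (replicate the last pixel of each row to the right, then copy the last padded row downward) can be sketched as a standalone routine. This is a simplified illustration, not the patched function: Pel is assumed to be 8 bits, only one plane is handled, the names padTo16 and copyAndPad are hypothetical, and the pad amounts are computed directly from the raw width and height rather than from m_picWidth and m_picHeight.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint8_t Pel; // assumption: 8-bit build

// Pixels needed to round `size` up to the next multiple of 16 (0 if aligned).
inline int padTo16(int size)
{
    int rem = size & 15;
    return rem ? 16 - rem : 0;
}

// Copy a width x height plane into dst, extending it to multiples of 16.
// dst must hold (height + padTo16(height)) rows of dstStride pixels.
void copyAndPad(const Pel* src, int srcStride, int width, int height,
                Pel* dst, int dstStride)
{
    assert(width > 0 && height > 0);
    const int padx = padTo16(width);
    const int pady = padTo16(height);
    assert(dstStride >= width + padx);

    for (int r = 0; r < height; r++)
    {
        Pel* row = dst + r * dstStride;
        std::memcpy(row, src + r * srcStride, width * sizeof(Pel));

        // extend the right edge in the same pass as the row copy
        for (int x = 0; x < padx; x++)
        {
            row[width + x] = row[width - 1];
        }
    }

    // extend the bottom, including the right-hand padding columns
    const Pel* last = dst + (height - 1) * dstStride;
    for (int i = 1; i <= pady; i++)
    {
        std::memcpy(dst + (height - 1 + i) * dstStride, last,
                    (width + padx) * sizeof(Pel));
    }
}
```

Copying width + padx pixels in the bottom loop mirrors the memcpy change in the diff: the padded columns of the last row are carried into the padded rows as well, so the bottom-right corner is filled.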