[x265-commits] [x265] asm: remove duplicate constant pw_256 and alignment nits
Dnyaneshwar G
dnyaneshwar at multicorewareinc.com
Fri Apr 3 21:26:29 CEST 2015
details: http://hg.videolan.org/x265/rev/dd62c4e924ba
branches:
changeset: 10019:dd62c4e924ba
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Apr 03 10:45:54 2015 +0530
description:
asm: remove duplicate constant pw_256 and alignment nits
Subject: [x265] asm: avx2 code for intrapred_planar16x16
details: http://hg.videolan.org/x265/rev/b95bbc82cc58
branches:
changeset: 10020:b95bbc82cc58
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Apr 03 11:11:47 2015 +0530
description:
asm: avx2 code for intrapred_planar16x16
AVX2:
intra_planar_16x16 16.24x 583.48 9475.36
SSE4:
intra_planar_16x16 11.54x 820.01 9466.91
Subject: [x265] asm: avx2 code for intra_planar_32x32
details: http://hg.videolan.org/x265/rev/d23e5e9d6dd0
branches:
changeset: 10021:d23e5e9d6dd0
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Apr 03 11:35:53 2015 +0530
description:
asm: avx2 code for intra_planar_32x32
AVX2:
intra_planar_32x32 19.93x 1813.34 36132.20
SSE4:
intra_planar_32x32 12.25x 2951.42 36140.76
Subject: [x265] asm: avx2 code for intra_dc_32x32
details: http://hg.videolan.org/x265/rev/aa565f72955c
branches:
changeset: 10022:aa565f72955c
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Fri Apr 03 15:41:49 2015 +0530
description:
asm: avx2 code for intra_dc_32x32
AVX2:
intra_dc_32x32[f=0] 23.17x 435.66 10093.78
SSE4:
intra_dc_32x32[f=0] 14.36x 703.46 10100.78
Subject: [x265] asm: intra_pred_ang4_17 improved by ~57% over SSE4
details: http://hg.videolan.org/x265/rev/38884a963301
branches:
changeset: 10023:38884a963301
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 12:03:59 2015 +0530
description:
asm: intra_pred_ang4_17 improved by ~57% over SSE4
AVX2:
intra_ang_4x4[17] 11.06x 104.22 1152.57
SSE4:
intra_ang_4x4[17] 4.70x 244.43 1148.92
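The "~57%" in the description appears to be the drop in the middle column between the SSE4 and AVX2 rows, and the first column matches the last column divided by the middle one. A minimal sketch of that arithmetic, assuming the three columns are speedup, optimized time, and C reference time (the message does not state the units); the function names are illustrative and not part of x265:

```cpp
#include <cassert>

// Percentage reduction in measured time when going from `before` to `after`.
// Assumption: both values are timings in the same (unspecified) unit.
double improvementPercent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Speedup factor of an optimized routine relative to a reference timing.
double speedup(double reference, double optimized)
{
    assert(optimized > 0.0);
    return reference / optimized;
}
```

With the figures above, improvementPercent(244.43, 104.22) comes out at roughly 57.4 and speedup(1152.57, 104.22) at roughly 11.06, which agrees with the reported "~57%" and "11.06x".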
Subject: [x265] asm: intra_pred_ang4_16 improved by ~49% over SSE4
details: http://hg.videolan.org/x265/rev/cd2577b482ae
branches:
changeset: 10024:cd2577b482ae
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 12:19:15 2015 +0530
description:
asm: intra_pred_ang4_16 improved by ~49% over SSE4
AVX2:
intra_ang_4x4[16] 10.86x 104.30 1133.09
SSE4:
intra_ang_4x4[16] 5.51x 206.89 1139.52
Subject: [x265] asm: intra_pred_ang4_15 improved by ~53% over SSE4
details: http://hg.videolan.org/x265/rev/8119b549ca9e
branches:
changeset: 10025:8119b549ca9e
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 12:37:40 2015 +0530
description:
asm: intra_pred_ang4_15 improved by ~53% over SSE4
AVX2:
intra_ang_4x4[15] 10.93x 104.25 1140.00
SSE4:
intra_ang_4x4[15] 4.98x 225.91 1125.26
Subject: [x265] asm: intra_pred_ang4_14 improved by ~43% over SSE4
details: http://hg.videolan.org/x265/rev/d240ff7beda2
branches:
changeset: 10026:d240ff7beda2
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 12:51:29 2015 +0530
description:
asm: intra_pred_ang4_14 improved by ~43% over SSE4
AVX2:
intra_ang_4x4[14] 10.94x 102.94 1126.27
SSE4:
intra_ang_4x4[14] 6.14x 182.91 1122.57
Subject: [x265] asm: intra_pred_ang4_13 improved by ~43% over SSE4
details: http://hg.videolan.org/x265/rev/ba4e530b68a2
branches:
changeset: 10027:ba4e530b68a2
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 13:52:26 2015 +0530
description:
asm: intra_pred_ang4_13 improved by ~43% over SSE4
AVX2:
intra_ang_4x4[13] 10.73x 104.23 1118.51
SSE4:
intra_ang_4x4[13] 6.06x 184.99 1121.24
Subject: [x265] asm: intra_pred_ang4_12 improved by ~35% over SSE4
details: http://hg.videolan.org/x265/rev/e68a5442024e
branches:
changeset: 10028:e68a5442024e
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 14:03:40 2015 +0530
description:
asm: intra_pred_ang4_12 improved by ~35% over SSE4
AVX2:
intra_ang_4x4[12] 10.62x 104.55 1110.68
SSE4:
intra_ang_4x4[12] 6.84x 162.34 1110.04
Subject: [x265] asm: intra_pred_ang4_11 improved by ~31% over SSE4
details: http://hg.videolan.org/x265/rev/9ec24afd357f
branches:
changeset: 10029:9ec24afd357f
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 14:41:25 2015 +0530
description:
asm: intra_pred_ang4_11 improved by ~31% over SSE4
AVX2:
intra_ang_4x4[11] 10.58x 104.21 1102.93
SSE4:
intra_ang_4x4[11] 7.23x 152.13 1100.52
Subject: [x265] asm: intra_pred_ang4_9 improved by ~35% over SSE4
details: http://hg.videolan.org/x265/rev/31ce52f6cc0e
branches:
changeset: 10030:31ce52f6cc0e
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 14:55:56 2015 +0530
description:
asm: intra_pred_ang4_9 improved by ~35% over SSE4
AVX2:
intra_ang_4x4[ 9] 10.27x 104.54 1073.82
SSE4:
intra_ang_4x4[ 9] 6.48x 162.27 1051.73
Subject: [x265] asm: reduce code size with macro 'INTRA_PRED_TRANS_STORE_4x4'
details: http://hg.videolan.org/x265/rev/942267525eb6
branches:
changeset: 10031:942267525eb6
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Apr 03 15:08:07 2015 +0530
description:
asm: reduce code size with macro 'INTRA_PRED_TRANS_STORE_4x4'
Subject: [x265] asm: general calSign to accelerate sao
details: http://hg.videolan.org/x265/rev/4f3dfbfa5abd
branches:
changeset: 10032:4f3dfbfa5abd
user: Min Chen <chenm003 at 163.com>
date: Fri Apr 03 19:10:07 2015 +0800
description:
asm: general calSign to accelerate sao
---
source/common/x86/const-a.asm | 3 ++
source/common/x86/loopfilter.asm | 69 ++++++++++++++++++++++++++-----------
source/encoder/sao.cpp | 14 ++------
source/test/pixelharness.cpp | 8 ++--
4 files changed, 58 insertions(+), 36 deletions(-)
Subject: [x265] asm: reduce 1 register in quant_avx2
details: http://hg.videolan.org/x265/rev/bb526a6863d9
branches:
changeset: 10033:bb526a6863d9
user: Min Chen <chenm003 at 163.com>
date: Fri Apr 03 19:10:12 2015 +0800
description:
asm: reduce 1 register in quant_avx2
Subject: [x265] improve fillReferenceSamples by merge pixel fill
details: http://hg.videolan.org/x265/rev/6c759724db1e
branches:
changeset: 10034:6c759724db1e
user: Min Chen <chenm003 at 163.com>
date: Fri Apr 03 19:10:15 2015 +0800
description:
improve fillReferenceSamples by merge pixel fill
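The idea of merging the per-unit fill loops into one contiguous fill can be sketched as follows. This is an illustration only, assuming an 8-bit build where a pixel is one byte (the memset form is not valid for wider samples, which is presumably why the diff keeps the loop under HIGH_BIT_DEPTH); the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstring>
#include <stdint.h>

typedef uint8_t pixel; // assumption: 8-bit build, one byte per sample

// Per-unit form: fill `units` consecutive units of `unitSize` samples each.
pixel* fillPerUnit(pixel* adi, pixel refSample, int unitSize, int units)
{
    for (int u = 0; u < units; u++)
    {
        for (int i = 0; i < unitSize; i++)
            adi[i] = refSample;
        adi += unitSize;
    }
    return adi;
}

// Merged form: the units are contiguous, so one memset covers the whole run.
// Only correct while sizeof(pixel) == 1, because memset writes bytes.
pixel* fillMerged(pixel* adi, pixel refSample, int unitSize, int units)
{
    assert(unitSize >= 0 && units >= 0);
    const int fillSize = unitSize * units;
    std::memset(adi, refSample, fillSize * sizeof(pixel));
    return adi + fillSize;
}
```

Both helpers return the advanced pointer, so either can be dropped into the same calling code.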
Subject: [x265] primitives: rename luma_p2s to convert_p2s and move into PU
details: http://hg.videolan.org/x265/rev/ac4af23cbdea
branches:
changeset: 10035:ac4af23cbdea
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 03 18:18:48 2015 +0530
description:
primitives: rename luma_p2s to convert_p2s and move into PU
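A self-contained sketch of the per-PU form of this primitive, modelled on the filterPixelToShort_c template in the diff: width and height become template parameters and the destination stride becomes a runtime argument. The constant values (8-bit depth, 14-bit internal precision, offset of half the internal range) are assumptions standing in for X265_DEPTH, IF_INTERNAL_PREC and IF_INTERNAL_OFFS, and the names ending in "Sketch" are hypothetical:

```cpp
#include <cassert>
#include <stdint.h>

typedef uint8_t pixel;                               // assumption: 8-bit build
const int kDepth = 8;                                // stand-in for X265_DEPTH
const int kInternalPrec = 14;                        // stand-in for IF_INTERNAL_PREC (assumed value)
const int kInternalOffs = 1 << (kInternalPrec - 1);  // stand-in for IF_INTERNAL_OFFS (assumed value)

// Same shape as filter_p2s_t after the change: no width/height arguments.
typedef void (*p2sSketch_t)(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);

// Convert a width x height block of pixels to offset 16-bit intermediates.
template<int width, int height>
void pixelToShortSketch(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride)
{
    assert(src && dst);
    const int shift = kInternalPrec - kDepth;
    for (int row = 0; row < height; row++)
    {
        for (int col = 0; col < width; col++)
            dst[col] = (int16_t)((src[col] << shift) - kInternalOffs);

        src += srcStride;
        dst += dstStride;
    }
}
```

Because the block size is fixed at compile time, one instantiation per partition can be stored in a function-pointer table indexed by the partition enum, which is how the call sites in predict.cpp use the renamed entry.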
Subject: [x265] asm: sse4 8bpp code for convert_p2s[4xN]
details: http://hg.videolan.org/x265/rev/d866ce0b50ad
branches:
changeset: 10036:d866ce0b50ad
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 03 18:32:44 2015 +0530
description:
asm: sse4 8bpp code for convert_p2s[4xN]
convert_p2s[4x4](2.95x), convert_p2s[4x8](3.22x), convert_p2s[4x16](3.59x)
Subject: [x265] asm: ssse3 8bpp code for convert_p2s[8xN],convert_p2s[16xN]
details: http://hg.videolan.org/x265/rev/04ea107e7f41
branches:
changeset: 10037:04ea107e7f41
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 03 18:38:23 2015 +0530
description:
asm: ssse3 8bpp code for convert_p2s[8xN],convert_p2s[16xN]
convert_p2s[8x4](4.15x), convert_p2s[8x8](4.87x), convert_p2s[8x16](5.57x),
convert_p2s[8x32](5.71x), convert_p2s[16x4](9.48x),convert_p2s[16x8](11.68x),
convert_p2s[16x12](12.47x), convert_p2s[16x16](12.77x),
convert_p2s[16x32](13.26x), convert_p2s[16x64](12.68x)
Subject: [x265] asm: ssse3 8bpp code for convert_p2s[32xN],[64xN]
details: http://hg.videolan.org/x265/rev/02c97d95802d
branches:
changeset: 10038:02c97d95802d
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 03 18:41:41 2015 +0530
description:
asm: ssse3 8bpp code for convert_p2s[32xN],[64xN]
convert_p2s[32x8](10.45x), convert_p2s[32x16](10.22x),
convert_p2s[32x24](10.98x), convert_p2s[32x32](10.17x),
convert_p2s[32x64](12.31x), convert_p2s[64x16](10.29x),
convert_p2s[64x32](10.17x), convert_p2s[64x48](10.05x),
convert_p2s[64x64](10.04x)
Subject: [x265] asm: ssse3 code for chroma_p2s for i420, i422, i444, reuse the luma code
details: http://hg.videolan.org/x265/rev/a77cb2b78a12
branches:
changeset: 10039:a77cb2b78a12
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 03 18:50:00 2015 +0530
description:
asm: ssse3 code for chroma_p2s for i420, i422, i444, reuse the luma code
Subject: [x265] asm: sse4 chroma_p2s[4x2](2.29x), ssse3 chroma_p2s[8x2](3.60x) for i420
details: http://hg.videolan.org/x265/rev/3473c9fec18c
branches:
changeset: 10040:3473c9fec18c
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Fri Apr 03 19:02:06 2015 +0530
description:
asm: sse4 chroma_p2s[4x2](2.29x), ssse3 chroma_p2s[8x2](3.60x) for i420
Subject: [x265] asm: only 4:4:4 chroma 4-tap filters are configured in asm-primitives.cpp
details: http://hg.videolan.org/x265/rev/57c3306a7773
branches:
changeset: 10041:57c3306a7773
user: Steve Borho <steve at borho.org>
date: Fri Apr 03 11:16:04 2015 -0500
description:
asm: only 4:4:4 chroma 4-tap filters are configured in asm-primitives.cpp
all the other 4:4:4 chroma primitives are configured by setupAliasPrimitives()
(aliased to luma CU and PU primitives)
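The aliasing described here amounts to copying function pointers from the luma table into the 4:4:4 chroma table while leaving the chroma-specific 4-tap filter entries alone. A minimal sketch of that pattern, with hypothetical types, field names, and table size that are not the real EncoderPrimitives layout:

```cpp
#include <cassert>

typedef int (*primSketch_t)(int); // hypothetical primitive signature

const int kNumPuSketch = 3;       // hypothetical table size

struct PuPrimsSketch
{
    primSketch_t copy;
    primSketch_t satd;
    primSketch_t filter4tap;      // chroma-only entry, never aliased from luma
};

struct PrimTableSketch
{
    PuPrimsSketch luma[kNumPuSketch];
    PuPrimsSketch chroma444[kNumPuSketch];
};

// Alias the shared entries of the 4:4:4 chroma table to the luma table.
// The 4-tap filter entry is left as configured by the caller.
void aliasChroma444Sketch(PrimTableSketch& p)
{
    for (int i = 0; i < kNumPuSketch; i++)
    {
        assert(p.luma[i].copy && p.luma[i].satd);
        p.chroma444[i].copy = p.luma[i].copy;
        p.chroma444[i].satd = p.luma[i].satd;
    }
}

// Trivial stand-in primitives for exercising the table.
int lumaCopySketch(int x) { return x; }
int lumaSatdSketch(int x) { return x + 1; }
int chromaFilterSketch(int x) { return x * 2; }
```

With this arrangement only the entries that genuinely differ for chroma need explicit assignments in the assembly setup code.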
Subject: [x265] cmake: avoid strict-overflow warnings in slicetype.cpp from GCC 4.9
details: http://hg.videolan.org/x265/rev/1e47a8d8c226
branches:
changeset: 10042:1e47a8d8c226
user: Steve Borho <steve at borho.org>
date: Fri Apr 03 12:08:21 2015 -0500
description:
cmake: avoid strict-overflow warnings in slicetype.cpp from GCC 4.9
C:\mcw\x265\source\encoder\slicetype.cpp: In member function 'void x265::Lookahead::slicetypeAnalyse(x265::Lowres**, bool)':
C:\mcw\x265\source\encoder\slicetype.cpp:1919:31: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
bDoSearch[0] = p0 < b && fenc->lowresMvs[0][b - p0 - 1][0].x == 0x7FFF;
^
and one other in an X265_CHECK statement. In this case, p0 and b are known to
have small positive values and so the logic is ok.
Subject: [x265] api: make x265_cleanup() a NOP if an encoder is still open
details: http://hg.videolan.org/x265/rev/96fef6b58853
branches:
changeset: 10043:96fef6b58853
user: Steve Borho <steve at borho.org>
date: Fri Apr 03 13:27:08 2015 -0500
description:
api: make x265_cleanup() a NOP if an encoder is still open
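The api.cpp hunk is not visible in the truncated diff, so the following is only a sketch of the general idea: count open encoders and have the cleanup routine return without doing anything while the count is non-zero. All names here are hypothetical and the lifecycle is simplified for illustration:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical global state; not the variables used by x265.
static std::atomic<int> g_openEncodersSketch(0);
static bool g_tablesAllocatedSketch = false;

// Opening an encoder bumps the counter and makes sure shared tables exist.
void encoderOpenSketch()
{
    g_openEncodersSketch.fetch_add(1);
    g_tablesAllocatedSketch = true;
}

// Closing an encoder only drops the counter; shared tables stay alive.
void encoderCloseSketch()
{
    assert(g_openEncodersSketch.load() > 0);
    g_openEncodersSketch.fetch_sub(1);
}

// Cleanup is a no-op (returns false) while any encoder is still open.
bool cleanupSketch()
{
    if (g_openEncodersSketch.load() > 0)
        return false;

    g_tablesAllocatedSketch = false; // stands in for freeing shared tables
    return true;
}
```

The guard protects a still-running encoder from having shared state torn down underneath it if an application calls cleanup too early.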
diffstat:
source/CMakeLists.txt | 1 +
source/common/ipfilter.cpp | 36 +-
source/common/param.cpp | 2 +-
source/common/predict.cpp | 31 +-
source/common/primitives.cpp | 3 +-
source/common/primitives.h | 9 +-
source/common/x86/asm-primitives.cpp | 99 +++-
source/common/x86/const-a.asm | 155 +++---
source/common/x86/intrapred.h | 11 +
source/common/x86/intrapred8.asm | 319 ++++++++++++++
source/common/x86/ipfilter8.asm | 747 +++++++++++++++++++++-------------
source/common/x86/ipfilter8.h | 58 +-
source/common/x86/loopfilter.asm | 63 ++-
source/common/x86/pixel-util8.asm | 6 +-
source/encoder/CMakeLists.txt | 6 +-
source/encoder/api.cpp | 9 +-
source/encoder/sao.cpp | 14 +-
source/test/ipfilterharness.cpp | 122 +----
source/test/ipfilterharness.h | 1 -
source/test/pixelharness.cpp | 8 +-
20 files changed, 1103 insertions(+), 597 deletions(-)
diffs (truncated from 2379 to 300 lines):
diff -r 9a5fa67583fe -r 96fef6b58853 source/CMakeLists.txt
--- a/source/CMakeLists.txt Thu Apr 02 13:21:32 2015 -0500
+++ b/source/CMakeLists.txt Fri Apr 03 13:27:08 2015 -0500
@@ -196,6 +196,7 @@ if(GCC)
add_definitions(-static)
list(APPEND LINKER_OPTIONS "-static")
endif(STATIC_LINK_CRT)
+ check_cxx_compiler_flag(-Wno-strict-overflow CC_HAS_NO_STRICT_OVERFLOW)
check_cxx_compiler_flag(-Wno-narrowing CC_HAS_NO_NARROWING)
check_cxx_compiler_flag(-Wno-array-bounds CC_HAS_NO_ARRAY_BOUNDS)
if (CC_HAS_NO_ARRAY_BOUNDS)
diff -r 9a5fa67583fe -r 96fef6b58853 source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp Thu Apr 02 13:21:32 2015 -0500
+++ b/source/common/ipfilter.cpp Fri Apr 03 13:27:08 2015 -0500
@@ -34,27 +34,8 @@ using namespace x265;
#endif
namespace {
-template<int dstStride, int width, int height>
-void pixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst)
-{
- int shift = IF_INTERNAL_PREC - X265_DEPTH;
- int row, col;
-
- for (row = 0; row < height; row++)
- {
- for (col = 0; col < width; col++)
- {
- int16_t val = src[col] << shift;
- dst[col] = val - (int16_t)IF_INTERNAL_OFFS;
- }
-
- src += srcStride;
- dst += dstStride;
- }
-}
-
-template<int dstStride>
-void filterPixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height)
+template<int width, int height>
+void filterPixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride)
{
int shift = IF_INTERNAL_PREC - X265_DEPTH;
int row, col;
@@ -398,7 +379,7 @@ namespace x265 {
p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
- p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE / 2, W, H>;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;
#define CHROMA_422(W, H) \
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
@@ -407,7 +388,7 @@ namespace x265 {
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
- p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE / 2, W, H>;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;
#define CHROMA_444(W, H) \
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \
@@ -416,7 +397,7 @@ namespace x265 {
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
- p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE, W, H>;
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;
#define LUMA(W, H) \
p.pu[LUMA_ ## W ## x ## H].luma_hpp = interp_horiz_pp_c<8, W, H>; \
@@ -426,7 +407,7 @@ namespace x265 {
p.pu[LUMA_ ## W ## x ## H].luma_vsp = interp_vert_sp_c<8, W, H>; \
p.pu[LUMA_ ## W ## x ## H].luma_vss = interp_vert_ss_c<8, W, H>; \
p.pu[LUMA_ ## W ## x ## H].luma_hvpp = interp_hv_pp_c<8, W, H>; \
- p.pu[LUMA_ ## W ## x ## H].filter_p2s = pixelToShort_c<MAX_CU_SIZE, W, H>
+ p.pu[LUMA_ ## W ## x ## H].convert_p2s = filterPixelToShort_c<W, H>;
void setupFilterPrimitives_c(EncoderPrimitives& p)
{
@@ -530,11 +511,6 @@ void setupFilterPrimitives_c(EncoderPrim
CHROMA_444(48, 64);
CHROMA_444(64, 16);
CHROMA_444(16, 64);
- p.luma_p2s = filterPixelToShort_c<MAX_CU_SIZE>;
-
- p.chroma[X265_CSP_I444].p2s = filterPixelToShort_c<MAX_CU_SIZE>;
- p.chroma[X265_CSP_I420].p2s = filterPixelToShort_c<MAX_CU_SIZE / 2>;
- p.chroma[X265_CSP_I422].p2s = filterPixelToShort_c<MAX_CU_SIZE / 2>;
p.extendRowBorder = extendCURowColBorder;
}
diff -r 9a5fa67583fe -r 96fef6b58853 source/common/param.cpp
--- a/source/common/param.cpp Thu Apr 02 13:21:32 2015 -0500
+++ b/source/common/param.cpp Fri Apr 03 13:27:08 2015 -0500
@@ -1183,7 +1183,7 @@ int x265_set_globals(x265_param* param)
uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];
- if (g_ctuSizeConfigured || ATOMIC_INC(&g_ctuSizeConfigured) > 1)
+ if (ATOMIC_INC(&g_ctuSizeConfigured) > 1)
{
if (g_maxCUSize != param->maxCUSize)
{
diff -r 9a5fa67583fe -r 96fef6b58853 source/common/predict.cpp
--- a/source/common/predict.cpp Thu Apr 02 13:21:32 2015 -0500
+++ b/source/common/predict.cpp Fri Apr 03 13:27:08 2015 -0500
@@ -273,7 +273,7 @@ void Predict::predInterLumaPixel(const P
void Predict::predInterLumaShort(const PredictionUnit& pu, ShortYuv& dstSYuv, const PicYuv& refPic, const MV& mv) const
{
int16_t* dst = dstSYuv.getLumaAddr(pu.puAbsPartIdx);
- int dstStride = dstSYuv.m_size;
+ intptr_t dstStride = dstSYuv.m_size;
intptr_t srcStride = refPic.m_stride;
intptr_t srcOffset = (mv.x >> 2) + (mv.y >> 2) * srcStride;
@@ -288,7 +288,7 @@ void Predict::predInterLumaShort(const P
X265_CHECK(dstStride == MAX_CU_SIZE, "stride expected to be max cu size\n");
if (!(yFrac | xFrac))
- primitives.luma_p2s(src, srcStride, dst, pu.width, pu.height);
+ primitives.pu[partEnum].convert_p2s(src, srcStride, dst, dstStride);
else if (!yFrac)
primitives.pu[partEnum].luma_hps(src, srcStride, dst, dstStride, xFrac, 0);
else if (!xFrac)
@@ -375,14 +375,13 @@ void Predict::predInterChromaShort(const
int partEnum = partitionFromSizes(pu.width, pu.height);
uint32_t cxWidth = pu.width >> m_hChromaShift;
- uint32_t cxHeight = pu.height >> m_vChromaShift;
- X265_CHECK(((cxWidth | cxHeight) % 2) == 0, "chroma block size expected to be multiple of 2\n");
+ X265_CHECK(((cxWidth | (pu.height >> m_vChromaShift)) % 2) == 0, "chroma block size expected to be multiple of 2\n");
if (!(yFrac | xFrac))
{
- primitives.chroma[m_csp].p2s(refCb, refStride, dstCb, cxWidth, cxHeight);
- primitives.chroma[m_csp].p2s(refCr, refStride, dstCr, cxWidth, cxHeight);
+ primitives.chroma[m_csp].pu[partEnum].p2s(refCb, refStride, dstCb, dstStride);
+ primitives.chroma[m_csp].pu[partEnum].p2s(refCr, refStride, dstCr, dstStride);
}
else if (!yFrac)
{
@@ -817,7 +816,9 @@ void Predict::fillReferenceSamples(const
const pixel refSample = *pAdiLineNext;
// Pad unavailable samples with new value
int nextOrTop = X265_MIN(next, leftUnits);
+
// fill left column
+#if HIGH_BIT_DEPTH
while (curr < nextOrTop)
{
for (int i = 0; i < unitHeight; i++)
@@ -836,6 +837,24 @@ void Predict::fillReferenceSamples(const
adi += unitWidth;
curr++;
}
+#else
+ X265_CHECK(curr <= nextOrTop, "curr must be less than or equal to nextOrTop\n");
+ if (curr < nextOrTop)
+ {
+ const int fillSize = unitHeight * (nextOrTop - curr);
+ memset(adi, refSample, fillSize * sizeof(pixel));
+ curr = nextOrTop;
+ adi += fillSize;
+ }
+
+ if (curr < next)
+ {
+ const int fillSize = unitWidth * (next - curr);
+ memset(adi, refSample, fillSize * sizeof(pixel));
+ curr = next;
+ adi += fillSize;
+ }
+#endif
}
// pad all other reference samples.
diff -r 9a5fa67583fe -r 96fef6b58853 source/common/primitives.cpp
--- a/source/common/primitives.cpp Thu Apr 02 13:21:32 2015 -0500
+++ b/source/common/primitives.cpp Fri Apr 03 13:27:08 2015 -0500
@@ -90,7 +90,6 @@ void setupAliasPrimitives(EncoderPrimiti
/* alias chroma 4:4:4 from luma primitives (all but chroma filters) */
- p.chroma[X265_CSP_I444].p2s = p.luma_p2s;
p.chroma[X265_CSP_I444].cu[BLOCK_4x4].sa8d = NULL;
for (int i = 0; i < NUM_PU_SIZES; i++)
@@ -98,7 +97,7 @@ void setupAliasPrimitives(EncoderPrimiti
p.chroma[X265_CSP_I444].pu[i].copy_pp = p.pu[i].copy_pp;
p.chroma[X265_CSP_I444].pu[i].addAvg = p.pu[i].addAvg;
p.chroma[X265_CSP_I444].pu[i].satd = p.pu[i].satd;
- p.chroma[X265_CSP_I444].pu[i].chroma_p2s = p.pu[i].filter_p2s;
+ p.chroma[X265_CSP_I444].pu[i].p2s = p.pu[i].convert_p2s;
}
for (int i = 0; i < NUM_CU_SIZES; i++)
diff -r 9a5fa67583fe -r 96fef6b58853 source/common/primitives.h
--- a/source/common/primitives.h Thu Apr 02 13:21:32 2015 -0500
+++ b/source/common/primitives.h Fri Apr 03 13:27:08 2015 -0500
@@ -156,8 +156,7 @@ typedef void (*filter_ps_t) (const pixel
typedef void (*filter_sp_t) (const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx);
typedef void (*filter_ss_t) (const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx);
typedef void (*filter_hv_pp_t) (const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int idxX, int idxY);
-typedef void (*filter_p2s_wxh_t)(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height);
-typedef void (*filter_p2s_t)(const pixel* src, intptr_t srcStride, int16_t* dst);
+typedef void (*filter_p2s_t)(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
typedef void (*copy_pp_t)(pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride); // dst is aligned
typedef void (*copy_sp_t)(pixel* dst, intptr_t dstStride, const int16_t* src, intptr_t srcStride);
@@ -211,7 +210,7 @@ struct EncoderPrimitives
addAvg_t addAvg; // bidir motion compensation, uses 16bit values
copy_pp_t copy_pp;
- filter_p2s_t filter_p2s;
+ filter_p2s_t convert_p2s;
}
pu[NUM_PU_SIZES];
@@ -290,7 +289,6 @@ struct EncoderPrimitives
weightp_sp_t weight_sp;
weightp_pp_t weight_pp;
- filter_p2s_wxh_t luma_p2s;
findPosLast_t findPosLast;
@@ -317,7 +315,7 @@ struct EncoderPrimitives
filter_hps_t filter_hps;
addAvg_t addAvg;
copy_pp_t copy_pp;
- filter_p2s_t chroma_p2s;
+ filter_p2s_t p2s;
}
pu[NUM_PU_SIZES];
@@ -337,7 +335,6 @@ struct EncoderPrimitives
}
cu[NUM_CU_SIZES];
- filter_p2s_wxh_t p2s; // takes width/height as arguments
}
chroma[X265_CSP_COUNT];
};
diff -r 9a5fa67583fe -r 96fef6b58853 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Thu Apr 02 13:21:32 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Fri Apr 03 13:27:08 2015 -0500
@@ -859,9 +859,6 @@ void setupAssemblyPrimitives(EncoderPrim
PIXEL_AVG_W4(mmx2);
LUMA_VAR(sse2);
- p.luma_p2s = x265_luma_p2s_sse2;
- p.chroma[X265_CSP_I420].p2s = x265_chroma_p2s_sse2;
- p.chroma[X265_CSP_I422].p2s = x265_chroma_p2s_sse2;
ALL_LUMA_TU(blockfill_s, blockfill_s, sse2);
ALL_LUMA_TU_S(cpy1Dto2D_shr, cpy1Dto2D_shr_, sse2);
@@ -1273,31 +1270,6 @@ void setupAssemblyPrimitives(EncoderPrim
ASSIGN_SSE_PP(ssse3);
p.cu[BLOCK_4x4].sse_pp = x265_pixel_ssd_4x4_ssse3;
p.chroma[X265_CSP_I422].cu[BLOCK_422_4x8].sse_pp = x265_pixel_ssd_4x8_ssse3;
- p.pu[LUMA_4x4].filter_p2s = x265_pixelToShort_4x4_ssse3;
- p.pu[LUMA_4x8].filter_p2s = x265_pixelToShort_4x8_ssse3;
- p.pu[LUMA_4x16].filter_p2s = x265_pixelToShort_4x16_ssse3;
- p.pu[LUMA_8x4].filter_p2s = x265_pixelToShort_8x4_ssse3;
- p.pu[LUMA_8x8].filter_p2s = x265_pixelToShort_8x8_ssse3;
- p.pu[LUMA_8x16].filter_p2s = x265_pixelToShort_8x16_ssse3;
- p.pu[LUMA_8x32].filter_p2s = x265_pixelToShort_8x32_ssse3;
- p.pu[LUMA_16x4].filter_p2s = x265_pixelToShort_16x4_ssse3;
- p.pu[LUMA_16x8].filter_p2s = x265_pixelToShort_16x8_ssse3;
- p.pu[LUMA_16x12].filter_p2s = x265_pixelToShort_16x12_ssse3;
- p.pu[LUMA_16x16].filter_p2s = x265_pixelToShort_16x16_ssse3;
- p.pu[LUMA_16x32].filter_p2s = x265_pixelToShort_16x32_ssse3;
- p.pu[LUMA_16x64].filter_p2s = x265_pixelToShort_16x64_ssse3;
- p.pu[LUMA_32x8].filter_p2s = x265_pixelToShort_32x8_ssse3;
- p.pu[LUMA_32x16].filter_p2s = x265_pixelToShort_32x16_ssse3;
- p.pu[LUMA_32x24].filter_p2s = x265_pixelToShort_32x24_ssse3;
- p.pu[LUMA_32x32].filter_p2s = x265_pixelToShort_32x32_ssse3;
- p.pu[LUMA_32x64].filter_p2s = x265_pixelToShort_32x64_ssse3;
- p.pu[LUMA_64x16].filter_p2s = x265_pixelToShort_64x16_ssse3;
- p.pu[LUMA_64x32].filter_p2s = x265_pixelToShort_64x32_ssse3;
- p.pu[LUMA_64x48].filter_p2s = x265_pixelToShort_64x48_ssse3;
- p.pu[LUMA_64x64].filter_p2s = x265_pixelToShort_64x64_ssse3;
-
- p.chroma[X265_CSP_I420].p2s = x265_chroma_p2s_ssse3;
- p.chroma[X265_CSP_I422].p2s = x265_chroma_p2s_ssse3;
p.dst4x4 = x265_dst4_ssse3;
p.cu[BLOCK_8x8].idct = x265_idct8_ssse3;
@@ -1307,6 +1279,52 @@ void setupAssemblyPrimitives(EncoderPrim
p.frameInitLowres = x265_frame_init_lowres_core_ssse3;
p.scale1D_128to64 = x265_scale1D_128to64_ssse3;
p.scale2D_64to32 = x265_scale2D_64to32_ssse3;
+
+ p.pu[LUMA_8x4].convert_p2s = x265_filterPixelToShort_8x4_ssse3;
+ p.pu[LUMA_8x8].convert_p2s = x265_filterPixelToShort_8x8_ssse3;
+ p.pu[LUMA_8x16].convert_p2s = x265_filterPixelToShort_8x16_ssse3;
+ p.pu[LUMA_8x32].convert_p2s = x265_filterPixelToShort_8x32_ssse3;
+ p.pu[LUMA_16x4].convert_p2s = x265_filterPixelToShort_16x4_ssse3;