[x265-commits] [x265] fix compile warning in pixel.cpp for 422 primitive setup
Ashok Kumar Mishra
ashok at multicorewareinc.com
Wed Apr 16 23:49:04 CEST 2014
details: http://hg.videolan.org/x265/rev/24e8bac645a3
branches:
changeset: 6717:24e8bac645a3
user: Ashok Kumar Mishra <ashok at multicorewareinc.com>
date: Wed Apr 16 17:36:13 2014 +0530
description:
fix compile warning in pixel.cpp for 422 primitive setup
Subject: [x265] motion: always include the mvcost returned by motionEstimate [CHANGES OUTPUTS]
details: http://hg.videolan.org/x265/rev/bf40ab3af59a
branches: stable
changeset: 6718:bf40ab3af59a
user: Steve Borho <steve at borho.org>
date: Wed Apr 16 16:29:19 2014 -0500
description:
motion: always include the mvcost returned by motionEstimate [CHANGES OUTPUTS]
This was a rather subtle bug that has been in the code base for some time. The
caller of motionEstimate() will often want to remove the mvcost from the
returned cost value, and in this circumstance it would go negative, and since
the returned value is unsigned it became very large, causing the encoder to
actually discard a zero-residual match.
If the stars were perfectly aligned and all of the reference ME costs became
exactly -1, *all* possible ME candidates were discarded which could lead to
crashes.
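
The failure mode described above can be reproduced in isolation. The sketch below is not x265 code: removeMvCostNaive and removeMvCostClamped are hypothetical helpers that only illustrate how subtracting a larger mvcost from an unsigned total wraps around to a huge value. The clamp is shown for contrast; the changeset itself takes a different route and makes motionEstimate() always include the mvcost, so the caller's subtraction cannot underflow.

```cpp
#include <cstdint>

// Hypothetical helper (not from x265): strip the MV cost from a total ME cost.
// With unsigned arithmetic, total < mvcost wraps modulo 2^32 instead of
// going negative, so a "too small" cost turns into a very large one.
inline uint32_t removeMvCostNaive(uint32_t total, uint32_t mvcost)
{
    return static_cast<uint32_t>(total - mvcost);
}

// Hypothetical guarded variant: clamp at zero rather than wrapping.
inline uint32_t removeMvCostClamped(uint32_t total, uint32_t mvcost)
{
    return total >= mvcost ? total - mvcost : 0u;
}
```

With a total of 0 and an mvcost of 1, the naive helper yields UINT32_MAX, which is how a zero-residual match could end up looking like the worst candidate.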
Subject: [x265] cmake: use HAVE_ALIGNED_STACK=0 for x86_32 builds, even for GCC
details: http://hg.videolan.org/x265/rev/cfb1bb58d4fe
branches: stable
changeset: 6719:cfb1bb58d4fe
user: Steve Borho <steve at borho.org>
date: Wed Apr 16 13:29:39 2014 -0500
description:
cmake: use HAVE_ALIGNED_STACK=0 for x86_32 builds, even for GCC
In order to enable HAVE_ALIGNED_STACK for 32bit builds, we would need to align
our stack internally at all thread entry points and all API entry points that
might use primitives. 32bit performance is not a high priority for us at the
moment.
This fixes a number of reported crashes on 32bit builds.
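
For readers unfamiliar with what "aligning the stack internally" involves, here is a minimal sketch of the address arithmetic, assuming a downward-growing stack and a power-of-two alignment. alignDown and reserveAligned are hypothetical names, not x265 functions; the idct8 change in the diff below performs the same "subtract, then mask" step directly on rsp.

```cpp
#include <cassert>
#include <cstdint>

// Round addr down to a multiple of alignment.
// Assumption: alignment is a non-zero power of two.
inline uintptr_t alignDown(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

// Reserve `bytes` below stack pointer `sp`, then align the result.
// Mirrors "sub rsp, N" followed by "and rsp, ~(A-1)".
inline uintptr_t reserveAligned(uintptr_t sp, uintptr_t bytes, uintptr_t alignment)
{
    return alignDown(sp - bytes, alignment);
}
```

Because the mask can move the pointer by an unknown amount, the original stack pointer has to be saved somewhere and restored on exit, which is why doing this at every thread and API entry point is a non-trivial amount of work.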
Subject: [x265] vbv: clear row diagonal and cu SATD costs after vbv row reset was triggered
details: http://hg.videolan.org/x265/rev/03525a77d640
branches: stable
changeset: 6720:03525a77d640
user: Aarthi Thirumalai
date: Tue Apr 15 22:04:21 2014 +0530
description:
vbv: clear row diagonal and cu SATD costs after vbv row reset was triggered
refs #45
Subject: [x265] frameencoder: use m_isReferenced when configuring SAO in compressFrame()
details: http://hg.videolan.org/x265/rev/5746582ff4a6
branches: stable
changeset: 6721:5746582ff4a6
user: Steve Borho <steve at borho.org>
date: Thu Apr 03 14:49:57 2014 -0500
description:
frameencoder: use m_isReferenced when configuring SAO in compressFrame()
In some pessimal situations, the slice's reference state could even be changed
by the time compressFrame() starts. This prevents any race hazard.
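
The idea of reading a private copy instead of shared slice state can be sketched as follows. Slice and FrameWorker are hypothetical stand-ins, not the x265 classes, and the example is single-threaded; it only shows why a snapshot taken at hand-off time stays stable when the shared flag changes afterwards.

```cpp
// Hypothetical types for illustration only; not the x265 classes.
struct Slice
{
    bool referenced;
};

struct FrameWorker
{
    const Slice* slice = nullptr;
    bool isReferenced = false; // private snapshot taken at hand-off time

    // Record the slice and copy its reference state once.
    void init(const Slice& s)
    {
        slice = &s;
        isReferenced = s.referenced;
    }

    // Reads shared state, which may have changed since init().
    bool readShared() const { return slice->referenced; }

    // Reads the snapshot, which cannot change behind the worker's back.
    bool readSnapshot() const { return isReferenced; }
};
```

If another part of the program flips Slice::referenced after init(), readShared() observes the new value while readSnapshot() still returns the value the worker was configured with.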
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/436c63dd2d24
branches:
changeset: 6722:436c63dd2d24
user: Steve Borho <steve at borho.org>
date: Wed Apr 16 16:32:21 2014 -0500
description:
Merge with stable
Subject: [x265] cmake: nit
details: http://hg.videolan.org/x265/rev/41ef5053e04c
branches:
changeset: 6723:41ef5053e04c
user: Steve Borho <steve at borho.org>
date: Wed Apr 16 13:29:48 2014 -0500
description:
cmake: nit
Subject: [x265] encoder: singleton m_vps nits
details: http://hg.videolan.org/x265/rev/7fd1df6f4db8
branches:
changeset: 6724:7fd1df6f4db8
user: Steve Borho <steve at borho.org>
date: Wed Apr 16 13:30:19 2014 -0500
description:
encoder: singleton m_vps nits
Subject: [x265] TComSlice: initialize m_vps pointer
details: http://hg.videolan.org/x265/rev/818a591c3a6e
branches:
changeset: 6725:818a591c3a6e
user: Steve Borho <steve at borho.org>
date: Wed Apr 16 13:30:35 2014 -0500
description:
TComSlice: initialize m_vps pointer
Subject: [x265] align DCT8's stack to 64-bytes to avoid crash and improve cache performance
details: http://hg.videolan.org/x265/rev/024ca523052f
branches:
changeset: 6726:024ca523052f
user: Min Chen <chenm003 at 163.com>
date: Wed Apr 16 10:49:40 2014 +0800
description:
align DCT8's stack to 64-bytes to avoid crash and improve cache performance
diffstat:
source/Lib/TLibCommon/TComSlice.cpp | 1 +
source/Lib/TLibCommon/TComYuv.cpp | 2 +-
source/cmake/CMakeASM_YASMInformation.cmake | 5 +-
source/common/ipfilter.cpp | 55 ++++++++++--------------
source/common/pixel.cpp | 62 ++++++++++++++--------------
source/common/primitives.h | 15 ++++++-
source/common/shortyuv.cpp | 2 +-
source/common/x86/dct8.asm | 11 ++++-
source/encoder/frameencoder.cpp | 4 +-
source/encoder/motion.cpp | 4 +-
10 files changed, 89 insertions(+), 72 deletions(-)
diffs (truncated from 316 to 300 lines):
diff -r 0b696c7f46f2 -r 024ca523052f source/Lib/TLibCommon/TComSlice.cpp
--- a/source/Lib/TLibCommon/TComSlice.cpp Tue Apr 15 14:07:33 2014 -0500
+++ b/source/Lib/TLibCommon/TComSlice.cpp Wed Apr 16 10:49:40 2014 +0800
@@ -65,6 +65,7 @@ TComSlice::TComSlice()
, m_bReferenced(false)
, m_sps(NULL)
, m_pps(NULL)
+ , m_vps(NULL)
, m_pic(NULL)
, m_colFromL0Flag(1)
, m_colRefIdx(0)
diff -r 0b696c7f46f2 -r 024ca523052f source/Lib/TLibCommon/TComYuv.cpp
--- a/source/Lib/TLibCommon/TComYuv.cpp Tue Apr 15 14:07:33 2014 -0500
+++ b/source/Lib/TLibCommon/TComYuv.cpp Wed Apr 16 10:49:40 2014 +0800
@@ -201,7 +201,7 @@ void TComYuv::copyPartToPartLuma(ShortYu
void TComYuv::copyPartToPartChroma(ShortYuv* dstPicYuv, uint32_t partIdx, uint32_t lumaSize, uint32_t chromaId, const bool splitIntoSubTUs)
{
- int part = splitIntoSubTUs ? NUM_CHROMA_PARTITIONS : partitionFromSizes(lumaSize, lumaSize);
+ int part = splitIntoSubTUs ? NUM_CHROMA_PARTITIONS422 : partitionFromSizes(lumaSize, lumaSize);
assert(lumaSize != 4);
diff -r 0b696c7f46f2 -r 024ca523052f source/cmake/CMakeASM_YASMInformation.cmake
--- a/source/cmake/CMakeASM_YASMInformation.cmake Tue Apr 15 14:07:33 2014 -0500
+++ b/source/cmake/CMakeASM_YASMInformation.cmake Wed Apr 16 10:49:40 2014 +0800
@@ -21,13 +21,14 @@ else()
endif()
endif()
-if (GCC)
+# we cannot assume 16-byte stack alignment on x86_32 even with GCC
+if(GCC AND X64)
set(ASM_FLAGS "${ASM_FLAGS} -DHAVE_ALIGNED_STACK=1")
else()
set(ASM_FLAGS "${ASM_FLAGS} -DHAVE_ALIGNED_STACK=0")
endif()
-if (HIGH_BIT_DEPTH)
+if(HIGH_BIT_DEPTH)
set(ASM_FLAGS "${ASM_FLAGS} -DHIGH_BIT_DEPTH=1 -DBIT_DEPTH=10")
else()
set(ASM_FLAGS "${ASM_FLAGS} -DHIGH_BIT_DEPTH=0 -DBIT_DEPTH=8")
diff -r 0b696c7f46f2 -r 024ca523052f source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp Tue Apr 15 14:07:33 2014 -0500
+++ b/source/common/ipfilter.cpp Wed Apr 16 10:49:40 2014 +0800
@@ -381,20 +381,12 @@ namespace x265 {
p.chroma[X265_CSP_I420].filter_vss[CHROMA_ ## W ## x ## H] = interp_vert_ss_c<4, W, H>;
#define CHROMA_422(W, H) \
- p.chroma[X265_CSP_I422].filter_hpp[CHROMA_ ## W ## x ## H] = interp_horiz_pp_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_hps[CHROMA_ ## W ## x ## H] = interp_horiz_ps_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vpp[CHROMA_ ## W ## x ## H] = interp_vert_pp_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vps[CHROMA_ ## W ## x ## H] = interp_vert_ps_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vsp[CHROMA_ ## W ## x ## H] = interp_vert_sp_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vss[CHROMA_ ## W ## x ## H] = interp_vert_ss_c<4, W, H * 2>;
-
-#define CHROMA_422_X(W, H) \
- p.chroma[X265_CSP_I422].filter_hpp[0] = interp_horiz_pp_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_hps[0] = interp_horiz_ps_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vpp[0] = interp_vert_pp_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vps[0] = interp_vert_ps_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vsp[0] = interp_vert_sp_c<4, W, H * 2>; \
- p.chroma[X265_CSP_I422].filter_vss[0] = interp_vert_ss_c<4, W, H * 2>;
+ p.chroma[X265_CSP_I422].filter_hpp[CHROMA422_ ## W ## x ## H] = interp_horiz_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].filter_hps[CHROMA422_ ## W ## x ## H] = interp_horiz_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].filter_vpp[CHROMA422_ ## W ## x ## H] = interp_vert_pp_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].filter_vps[CHROMA422_ ## W ## x ## H] = interp_vert_ps_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].filter_vsp[CHROMA422_ ## W ## x ## H] = interp_vert_sp_c<4, W, H>; \
+ p.chroma[X265_CSP_I422].filter_vss[CHROMA422_ ## W ## x ## H] = interp_vert_ss_c<4, W, H>;
#define CHROMA_444(W, H) \
p.chroma[X265_CSP_I444].filter_hpp[LUMA_ ## W ## x ## H] = interp_horiz_pp_c<4, W, H>; \
@@ -465,31 +457,30 @@ void Setup_C_IPFilterPrimitives(EncoderP
LUMA(16, 64);
CHROMA_420(8, 32);
- CHROMA_422_X(4, 4);
+ CHROMA_422(4, 8);
CHROMA_422(4, 4);
- CHROMA_422(2, 4);
- CHROMA_422(4, 2);
+ CHROMA_422(2, 8);
+ CHROMA_422(8, 16);
CHROMA_422(8, 8);
+ CHROMA_422(4, 16);
+ CHROMA_422(8, 12);
+ CHROMA_422(6, 16);
CHROMA_422(8, 4);
- CHROMA_422(4, 8);
- CHROMA_422(8, 6);
- CHROMA_422(6, 8);
- CHROMA_422(8, 2);
- CHROMA_422(2, 8);
+ CHROMA_422(2, 16);
+ CHROMA_422(16, 32);
CHROMA_422(16, 16);
+ CHROMA_422(8, 32);
+ CHROMA_422(16, 24);
+ CHROMA_422(12, 32);
CHROMA_422(16, 8);
- CHROMA_422(8, 16);
- CHROMA_422(16, 12);
- CHROMA_422(12, 16);
- CHROMA_422(16, 4);
- CHROMA_422(4, 16);
+ CHROMA_422(4, 32);
+ CHROMA_422(32, 64);
CHROMA_422(32, 32);
+ CHROMA_422(16, 64);
+ CHROMA_422(32, 48);
+ CHROMA_422(24, 64);
CHROMA_422(32, 16);
- CHROMA_422(16, 32);
- CHROMA_422(32, 24);
- CHROMA_422(24, 32);
- CHROMA_422(32, 8);
- CHROMA_422(8, 32);
+ CHROMA_422(8, 64);
CHROMA_444(4, 4);
CHROMA_444(8, 8);
diff -r 0b696c7f46f2 -r 024ca523052f source/common/pixel.cpp
--- a/source/common/pixel.cpp Tue Apr 15 14:07:33 2014 -0500
+++ b/source/common/pixel.cpp Wed Apr 16 10:49:40 2014 +0800
@@ -938,20 +938,22 @@ void Setup_C_PixelPrimitives(EncoderPrim
p.chroma[X265_CSP_I420].add_ps[CHROMA_ ## W ## x ## H] = pixel_add_ps_c<W, H>;
#define CHROMA_422(W, H) \
- p.chroma[X265_CSP_I422].addAvg[CHROMA_ ## W ## x ## H] = addAvg<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_pp[CHROMA_ ## W ## x ## H] = blockcopy_pp_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_sp[CHROMA_ ## W ## x ## H] = blockcopy_sp_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_ps[CHROMA_ ## W ## x ## H] = blockcopy_ps_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_ss[CHROMA_ ## W ## x ## H] = blockcopy_ss_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].sub_ps[CHROMA_ ## W ## x ## H] = pixel_sub_ps_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].add_ps[CHROMA_ ## W ## x ## H] = pixel_add_ps_c<W, H * 2>;
+ p.chroma[X265_CSP_I422].addAvg [CHROMA422_ ## W ## x ## H] = addAvg<W, H>; \
+ p.chroma[X265_CSP_I422].copy_pp[CHROMA422_ ## W ## x ## H] = blockcopy_pp_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_sp[CHROMA422_ ## W ## x ## H] = blockcopy_sp_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_ps[CHROMA422_ ## W ## x ## H] = blockcopy_ps_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_ss[CHROMA422_ ## W ## x ## H] = blockcopy_ss_c<W, H>; \
+ p.chroma[X265_CSP_I422].sub_ps [CHROMA422_ ## W ## x ## H] = pixel_sub_ps_c<W, H>; \
+ p.chroma[X265_CSP_I422].add_ps [CHROMA422_ ## W ## x ## H] = pixel_add_ps_c<W, H>;
#define CHROMA_422_X(W, H) \
- p.chroma[X265_CSP_I422].addAvg[0] = addAvg<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_pp[0] = blockcopy_pp_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_sp[0] = blockcopy_sp_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_ps[0] = blockcopy_ps_c<W, H * 2>; \
- p.chroma[X265_CSP_I422].copy_ss[0] = blockcopy_ss_c<W, H * 2>;
+ p.chroma[X265_CSP_I422].addAvg [CHROMA422X_ ## W ## x ## H] = addAvg<W, H>; \
+ p.chroma[X265_CSP_I422].copy_pp[CHROMA422X_ ## W ## x ## H] = blockcopy_pp_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_sp[CHROMA422X_ ## W ## x ## H] = blockcopy_sp_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_ps[CHROMA422X_ ## W ## x ## H] = blockcopy_ps_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_ss[CHROMA422X_ ## W ## x ## H] = blockcopy_ss_c<W, H>; \
+ p.chroma[X265_CSP_I422].copy_sp[NUM_CHROMA_PARTITIONS422] = blockcopy_sp_c<W, (H >> 1)>; \
+ p.chroma[X265_CSP_I422].copy_ps[NUM_CHROMA_PARTITIONS422] = blockcopy_ps_c<W, (H >> 1)>;
#define CHROMA_444(W, H) \
p.chroma[X265_CSP_I444].addAvg[LUMA_ ## W ## x ## H] = addAvg<W, H>; \
@@ -1021,31 +1023,31 @@ void Setup_C_PixelPrimitives(EncoderPrim
LUMA(16, 64);
CHROMA_420(8, 32);
- CHROMA_422_X(4, 4);
+ CHROMA_422_X(4, 8);
+ CHROMA_422(4, 8);
CHROMA_422(4, 4);
- CHROMA_422(4, 2);
- CHROMA_422(2, 4);
+ CHROMA_422(2, 8);
+ CHROMA_422(8, 16);
CHROMA_422(8, 8);
+ CHROMA_422(4, 16);
+ CHROMA_422(8, 12);
+ CHROMA_422(6, 16);
CHROMA_422(8, 4);
- CHROMA_422(4, 8);
- CHROMA_422(8, 6);
- CHROMA_422(6, 8);
- CHROMA_422(8, 2);
- CHROMA_422(2, 8);
+ CHROMA_422(2, 16);
+ CHROMA_422(16, 32);
CHROMA_422(16, 16);
+ CHROMA_422(8, 32);
+ CHROMA_422(16, 24);
+ CHROMA_422(12, 32);
CHROMA_422(16, 8);
- CHROMA_422(8, 16);
- CHROMA_422(16, 12);
- CHROMA_422(12, 16);
- CHROMA_422(16, 4);
- CHROMA_422(4, 16);
+ CHROMA_422(4, 32);
+ CHROMA_422(32, 64);
CHROMA_422(32, 32);
+ CHROMA_422(16, 64);
+ CHROMA_422(32, 48);
+ CHROMA_422(24, 64);
CHROMA_422(32, 16);
- CHROMA_422(16, 32);
- CHROMA_422(32, 24);
- CHROMA_422(24, 32);
- CHROMA_422(32, 8);
- CHROMA_422(8, 32);
+ CHROMA_422(8, 64);
CHROMA_444(4, 4);
CHROMA_444(8, 8);
diff -r 0b696c7f46f2 -r 024ca523052f source/common/primitives.h
--- a/source/common/primitives.h Tue Apr 15 14:07:33 2014 -0500
+++ b/source/common/primitives.h Wed Apr 16 10:49:40 2014 +0800
@@ -58,6 +58,17 @@ enum Chroma420Partitions
NUM_CHROMA_PARTITIONS
};
+enum Chroma422Partitions
+{
+ CHROMA422X_4x8,
+ CHROMA422_4x8, CHROMA422_4x4, CHROMA422_2x8,
+ CHROMA422_8x16, CHROMA422_8x8, CHROMA422_4x16, CHROMA422_8x12, CHROMA422_6x16, CHROMA422_8x4, CHROMA422_2x16,
+ CHROMA422_16x32, CHROMA422_16x16, CHROMA422_8x32, CHROMA422_16x24, CHROMA422_12x32, CHROMA422_16x8, CHROMA422_4x32,
+ CHROMA422_32x64, CHROMA422_32x32, CHROMA422_16x64, CHROMA422_32x48, CHROMA422_24x64, CHROMA422_32x16, CHROMA422_8x64,
+ NUM_CHROMA_PARTITIONS422
+};
+
+
enum SquareBlocks // Routines can be indexed using log2n(width)-2
{
BLOCK_4x4,
@@ -245,8 +256,8 @@ struct EncoderPrimitives
filter_pp_t filter_hpp[NUM_LUMA_PARTITIONS];
filter_hps_t filter_hps[NUM_LUMA_PARTITIONS];
copy_pp_t copy_pp[NUM_LUMA_PARTITIONS];
- copy_sp_t copy_sp[NUM_LUMA_PARTITIONS];
- copy_ps_t copy_ps[NUM_LUMA_PARTITIONS];
+ copy_sp_t copy_sp[NUM_LUMA_PARTITIONS + 1];
+ copy_ps_t copy_ps[NUM_LUMA_PARTITIONS + 1];
copy_ss_t copy_ss[NUM_LUMA_PARTITIONS];
pixel_sub_ps_t sub_ps[NUM_LUMA_PARTITIONS];
pixel_add_ps_t add_ps[NUM_LUMA_PARTITIONS];
diff -r 0b696c7f46f2 -r 024ca523052f source/common/shortyuv.cpp
--- a/source/common/shortyuv.cpp Tue Apr 15 14:07:33 2014 -0500
+++ b/source/common/shortyuv.cpp Wed Apr 16 10:49:40 2014 +0800
@@ -211,7 +211,7 @@ void ShortYuv::copyPartToPartShortChroma
void ShortYuv::copyPartToPartYuvChroma(TComYuv* dstPicYuv, uint32_t partIdx, uint32_t lumaSize, uint32_t chromaId, const bool splitIntoSubTUs)
{
- int part = splitIntoSubTUs ? NUM_CHROMA_PARTITIONS : partitionFromSizes(lumaSize, lumaSize);
+ int part = splitIntoSubTUs ? NUM_CHROMA_PARTITIONS422 : partitionFromSizes(lumaSize, lumaSize);
if (chromaId == 1)
{
diff -r 0b696c7f46f2 -r 024ca523052f source/common/x86/dct8.asm
--- a/source/common/x86/dct8.asm Tue Apr 15 14:07:33 2014 -0500
+++ b/source/common/x86/dct8.asm Wed Apr 16 10:49:40 2014 +0800
@@ -834,8 +834,14 @@ cglobal patial_butterfly_inverse_interna
ret
-cglobal idct8, 3,7,8,0-16*mmsize
+cglobal idct8, 3,7,8 ;,0-16*mmsize
+ ; alignment stack to 64-bytes
mov r5, rsp
+ sub rsp, 16*mmsize + gprsize
+ and rsp, ~(64-1)
+ mov [rsp + 16*mmsize], r5
+ mov r5, rsp
+
lea r4, [tab_idct8_3]
lea r6, [tab_dct4]
@@ -866,4 +872,7 @@ cglobal idct8, 3,7,8,0-16*mmsize
call patial_butterfly_inverse_internal_pass2
+ ; restore origin stack pointer
+ mov rsp, [rsp + 16*mmsize]
+
RET
diff -r 0b696c7f46f2 -r 024ca523052f source/encoder/frameencoder.cpp
--- a/source/encoder/frameencoder.cpp Tue Apr 15 14:07:33 2014 -0500
+++ b/source/encoder/frameencoder.cpp Wed Apr 16 10:49:40 2014 +0800
@@ -192,7 +192,7 @@ int FrameEncoder::getStreamHeaders(NALUn
/* headers for start of bitstream */
OutputNALUnit nalu(NAL_UNIT_VPS);
entropyCoder->setBitstream(&nalu.m_bitstream);
- entropyCoder->encodeVPS(&m_cfg->m_vps);
+ entropyCoder->encodeVPS(&m_top->m_vps);
writeRBSPTrailingBits(nalu.m_bitstream);
CHECKED_MALLOC(nalunits[count], NALUnitEBSP, 1);
nalunits[count]->init(nalu);
@@ -217,7 +217,7 @@ int FrameEncoder::getStreamHeaders(NALUn
if (m_cfg->m_activeParameterSetsSEIEnabled)
{
SEIActiveParameterSets sei;
- sei.activeVPSId = m_cfg->m_vps.getVPSId();
+ sei.activeVPSId = m_top->m_vps.getVPSId();
sei.m_fullRandomAccessFlag = false;