[x265-commits] [x265] refactor: Check need for signed/unsigned int16_t
Murugan Vairavel
murugan at multicorewareinc.com
Wed Oct 30 08:21:12 CET 2013
details: http://hg.videolan.org/x265/rev/c946d617fd9f
branches:
changeset: 4738:c946d617fd9f
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Tue Oct 29 15:16:28 2013 +0530
description:
refactor: Check need for signed/unsigned int16_t
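In source/common/ipfilter.cpp this refactor changes `maxVal` from `int16_t` to `uint16_t`. The commit message does not give a reason. One plausible reading, which is our assumption, is that `maxVal` is never negative, and `(1 << X265_DEPTH) - 1` fits in an `int16_t` only for depths up to 15 bits, while a `uint16_t` can hold it at 16 bits. The sketch below shows how such a `maxVal` clamps a filtered value into the pixel range. `clipToDepth` is a hypothetical helper, not an x265 function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of x265): clamp a filter result into
// the valid pixel range [0, maxVal] for the given bit depth.
inline uint16_t clipToDepth(int value, int bitDepth)
{
    assert(bitDepth >= 1 && bitDepth <= 16);
    // At 16 bits, (1 << 16) - 1 == 65535 fits in uint16_t but exceeds
    // the int16_t maximum of 32767.
    const uint16_t maxVal = static_cast<uint16_t>((1 << bitDepth) - 1);
    if (value < 0)
        return 0;
    if (value > maxVal) // maxVal is promoted to int: a signed comparison
        return maxVal;
    return static_cast<uint16_t>(value);
}
```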
Subject: [x265] pixel: remove sad_16, sad_x3_16 and sad_x4_16
details: http://hg.videolan.org/x265/rev/42ae4dc90005
branches:
changeset: 4739:42ae4dc90005
user: Steve Borho <steve at borho.org>
date: Tue Oct 29 15:43:18 2013 -0500
description:
pixel: remove sad_16, sad_x3_16 and sad_x4_16
We have assembly coverage for everything but sad_16x12, which I've put at the
top of our TODO list.
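The removed sad_16, sad_x3_16 and sad_x4_16 intrinsics, and the assembly that replaces them, all compute a sum of absolute differences (SAD) between a block of the frame being encoded (`fenc`) and a reference block (`fref`), each addressed with its own stride. Below is a minimal scalar sketch of the kind of reference an assembly routine can be validated against. It assumes 8-bit pixels (the `!HIGH_BIT_DEPTH` path), and `sadRef` is a hypothetical name, not the x265 source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

typedef uint8_t pixel; // assumption: 8-bit build (!HIGH_BIT_DEPTH)

// Scalar sum of absolute differences over an lx-by-ly block.
// fenc/fref point at the top-left sample; strides are in pixels.
template<int lx, int ly>
int sadRef(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride)
{
    assert(fencstride >= lx && frefstride >= lx);
    int sum = 0;
    for (int y = 0; y < ly; y++)
    {
        for (int x = 0; x < lx; x++)
            sum += std::abs(fenc[x] - fref[x]); // uint8_t promotes to int
        fenc += fencstride;
        fref += frefstride;
    }
    return sum;
}
```

A 16x12 instance is simply `sadRef<16, 12>`. The template fixes the block size at compile time, much as the asm provides one entry point per size.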
Subject: [x265] asm: created common asm macro for pixel_sad_32xN functions
details: http://hg.videolan.org/x265/rev/f69c0f13c7b0
branches:
changeset: 4740:f69c0f13c7b0
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 18:15:40 2013 +0530
description:
asm: created common asm macro for pixel_sad_32xN functions
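A common macro makes sense here because 32x8, 32x16, 32x24, 32x32 and 32x64 share one loop body and differ only in row count. The sketch below expresses the same pattern in C++ with the preprocessor. It is an analogy, not the x265 assembly macro, and `sad32Rows` and `SAD_32xN` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

typedef uint8_t pixel; // assumption: 8-bit pixels

// Shared body: SAD of a 32-pixel-wide block with `rows` rows.
static int sad32Rows(const pixel* fenc, intptr_t fencstride,
                     const pixel* fref, intptr_t frefstride, int rows)
{
    assert(rows > 0);
    int sum = 0;
    for (int y = 0; y < rows; y++, fenc += fencstride, fref += frefstride)
        for (int x = 0; x < 32; x++)
            sum += std::abs(fenc[x] - fref[x]);
    return sum;
}

// One macro stamps out a named entry point per block height.
#define SAD_32xN(N) \
    int sad_32x##N(const pixel* fenc, intptr_t fencstride, \
                   const pixel* fref, intptr_t frefstride) \
    { return sad32Rows(fenc, fencstride, fref, frefstride, N); }

SAD_32xN(8)
SAD_32xN(16)
SAD_32xN(24)
SAD_32xN(32)
SAD_32xN(64)
```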
Subject: [x265] asm: assembly code for pixel_sad_32x8
details: http://hg.videolan.org/x265/rev/1aec8ddad7a3
branches:
changeset: 4741:1aec8ddad7a3
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 18:19:19 2013 +0530
description:
asm: assembly code for pixel_sad_32x8
Subject: [x265] asm: assembly code for pixel_sad_32x24
details: http://hg.videolan.org/x265/rev/840a638609b0
branches:
changeset: 4742:840a638609b0
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 18:21:28 2013 +0530
description:
asm: assembly code for pixel_sad_32x24
Subject: [x265] asm: assembly code for pixel_sad_32x32
details: http://hg.videolan.org/x265/rev/77aa24f08e76
branches:
changeset: 4743:77aa24f08e76
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 18:23:15 2013 +0530
description:
asm: assembly code for pixel_sad_32x32
Subject: [x265] asm: assembly code for pixel_sad_32x16
details: http://hg.videolan.org/x265/rev/def3d61bc4b0
branches:
changeset: 4744:def3d61bc4b0
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 18:25:54 2013 +0530
description:
asm: assembly code for pixel_sad_32x16
Subject: [x265] asm: assembly code for pixel_sad_32x64
details: http://hg.videolan.org/x265/rev/d3e510bb67cf
branches:
changeset: 4745:d3e510bb67cf
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 18:28:50 2013 +0530
description:
asm: assembly code for pixel_sad_32x64
Subject: [x265] asm: assembly code for pixel_sad_8x32
details: http://hg.videolan.org/x265/rev/c048ef93ea55
branches:
changeset: 4746:c048ef93ea55
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Oct 29 20:04:06 2013 +0530
description:
asm: assembly code for pixel_sad_8x32
Subject: [x265] testbench: upgrade for check_IPFilter_primitive, don't pass wrong (width, height, stride) to asm
details: http://hg.videolan.org/x265/rev/20aa88626c52
branches:
changeset: 4747:20aa88626c52
user: Min Chen <chenm003 at 163.com>
date: Wed Oct 30 11:58:41 2013 +0800
description:
testbench: upgrade for check_IPFilter_primitive, don't pass wrong (width, height, stride) to asm
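The point of this testbench change is that an optimized primitive should be compared with its C reference only on inputs it is specified to handle. Below is a hedged sketch of such a check. `Primitive`, `checkPrimitive`, `copyRef` and `copyOpt` are hypothetical, and a block copy stands in for an interpolation filter. The assumed legal ranges are also ours: widths and heights are multiples of 4 up to 64, and the stride is never smaller than the width.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <random>

typedef uint8_t pixel; // assumption: 8-bit pixels
typedef void (*Primitive)(const pixel* src, intptr_t srcStride,
                          pixel* dst, intptr_t dstStride, int width, int height);

// Reference and "optimized" stand-ins: both copy a width x height block.
static void copyRef(const pixel* src, intptr_t srcStride,
                    pixel* dst, intptr_t dstStride, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * dstStride + x] = src[y * srcStride + x];
}

static void copyOpt(const pixel* src, intptr_t srcStride,
                    pixel* dst, intptr_t dstStride, int width, int height)
{
    for (int y = 0; y < height; y++)
        std::memcpy(dst + y * dstStride, src + y * srcStride, width);
}

// Compare two implementations on random blocks with legal dimensions only.
static bool checkPrimitive(Primitive ref, Primitive opt, int iterations)
{
    const int maxDim = 64, maxStride = 128;
    static pixel src[maxStride * maxDim];
    static pixel outRef[maxStride * maxDim], outOpt[maxStride * maxDim];
    std::mt19937 rng(1234); // fixed seed keeps failures reproducible
    for (auto& p : src)
        p = static_cast<pixel>(rng() & 0xFF);

    for (int i = 0; i < iterations; i++)
    {
        int width = 4 * (1 + static_cast<int>(rng() % (maxDim / 4)));
        int height = 4 * (1 + static_cast<int>(rng() % (maxDim / 4)));
        intptr_t stride = width + static_cast<intptr_t>(rng() % (maxStride - width + 1));
        assert(stride >= width && stride <= maxStride);

        std::memset(outRef, 0, sizeof(outRef));
        std::memset(outOpt, 0, sizeof(outOpt));
        ref(src, stride, outRef, stride, width, height);
        opt(src, stride, outOpt, stride, width, height);
        if (std::memcmp(outRef, outOpt, sizeof(outRef)) != 0)
            return false;
    }
    return true;
}
```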
Subject: [x265] pixel: remove sad_8, sad_x3_8, sad_x4_8 intrinsic functions
details: http://hg.videolan.org/x265/rev/abf8286f3fa9
branches:
changeset: 4748:abf8286f3fa9
user: Steve Borho <steve at borho.org>
date: Wed Oct 30 00:31:46 2013 -0500
description:
pixel: remove sad_8, sad_x3_8, sad_x4_8 intrinsic functions
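The `_x3` and `_x4` variants score one source block against three or four candidate reference blocks in a single call. This lets an implementation read each source row once and reuse it for every candidate. Below is a scalar sketch of the x3 form. It assumes 8-bit pixels and a single stride shared by all candidates. The name `sadX3Ref` and its argument order are our assumptions, not necessarily x265's exact primitive signature.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

typedef uint8_t pixel; // assumption: 8-bit pixels

// SAD of one lx-by-ly source block against three reference candidates
// that share a stride. Results are written to res[0..2].
template<int lx, int ly>
void sadX3Ref(const pixel* fenc, intptr_t fencstride,
              const pixel* ref0, const pixel* ref1, const pixel* ref2,
              intptr_t frefstride, int32_t* res)
{
    assert(fencstride >= lx && frefstride >= lx);
    res[0] = res[1] = res[2] = 0;
    for (int y = 0; y < ly; y++)
    {
        for (int x = 0; x < lx; x++)
        {
            const int s = fenc[x]; // source sample read once, used three times
            res[0] += std::abs(s - ref0[x]);
            res[1] += std::abs(s - ref1[x]);
            res[2] += std::abs(s - ref2[x]);
        }
        fenc += fencstride;
        ref0 += frefstride;
        ref1 += frefstride;
        ref2 += frefstride;
    }
}
```

The x4 form adds a fourth candidate pointer and a `res[3]` accumulator in the same way.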
Subject: [x265] asm: assembly code for pixel_sad_16x12
details: http://hg.videolan.org/x265/rev/40e38dfa5cdd
branches:
changeset: 4749:40e38dfa5cdd
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Wed Oct 30 11:06:13 2013 +0530
description:
asm: assembly code for pixel_sad_16x12
Subject: [x265] writing hash SEI messages in frameencoder
details: http://hg.videolan.org/x265/rev/4c047e5ff69b
branches:
changeset: 4750:4c047e5ff69b
user: Santhoshini Sekar <santhoshini at multicorewareinc.com>
date: Wed Oct 30 08:50:49 2013 +0530
description:
writing hash SEI messages in frameencoder
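The HEVC decoded picture hash SEI carries one hash per colour component, using one of three methods: MD5, CRC or a 32-bit checksum. The checksum is the simplest of these. The sketch below follows our reading of the checksum definition in the HEVC specification and is not the frameencoder code from this changeset, which is not shown in the truncated diff. It assumes one plane stored row by row with samples in `uint16_t`, and `planeChecksum` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Checksum-style picture hash for one plane. Each sample is XORed with a
// position-dependent mask before it is added; for bit depths above 8 the
// high byte is added as well. sum is uint32_t, so arithmetic wraps mod 2^32.
uint32_t planeChecksum(const uint16_t* plane, intptr_t stride,
                       int width, int height, int bitDepth)
{
    assert(bitDepth >= 8 && bitDepth <= 16 && stride >= width);
    uint32_t sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            const uint32_t xorMask = (x & 0xFF) ^ (y & 0xFF) ^ (x >> 8) ^ (y >> 8);
            const uint32_t sample = plane[y * stride + x];
            sum += (sample & 0xFF) ^ xorMask;
            if (bitDepth > 8)
                sum += (sample >> 8) ^ xorMask;
        }
    }
    return sum;
}
```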
Subject: [x265] rename variable name m_Bitstream to m_bitstream
details: http://hg.videolan.org/x265/rev/e2a1dcca4518
branches:
changeset: 4751:e2a1dcca4518
user: Santhoshini Sekar <santhoshini at multicorewareinc.com>
date: Wed Oct 30 09:17:09 2013 +0530
description:
rename variable name m_Bitstream to m_bitstream
Subject: [x265] assembly code for pixel_sad_x3_4x16 and pixel_sad_x4_4x16
details: http://hg.videolan.org/x265/rev/50c2c41ac0ea
branches:
changeset: 4752:50c2c41ac0ea
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Wed Oct 30 11:33:49 2013 +0530
description:
assembly code for pixel_sad_x3_4x16 and pixel_sad_x4_4x16
Subject: [x265] pixel: remove sad_x3_4x16 and sad_x4_4x16, no longer HAVE_MMX
details: http://hg.videolan.org/x265/rev/65462024832b
branches:
changeset: 4753:65462024832b
user: Steve Borho <steve at borho.org>
date: Wed Oct 30 01:54:16 2013 -0500
description:
pixel: remove sad_x3_4x16 and sad_x4_4x16, no longer HAVE_MMX
diffstat:
source/Lib/TLibEncoder/NALwrite.cpp | 4 +-
source/Lib/TLibEncoder/NALwrite.h | 6 +-
source/common/ipfilter.cpp | 12 +-
source/common/vec/pixel-sse41.cpp | 1794 +---------------------------------
source/common/x86/asm-primitives.cpp | 12 +-
source/common/x86/sad-a.asm | 373 +++++++
source/encoder/encoder.cpp | 40 +-
source/encoder/frameencoder.cpp | 85 +-
source/encoder/frameencoder.h | 1 +
source/encoder/motion.cpp | 2 +-
source/test/intrapredharness.cpp | 2 +-
source/test/ipfilterharness.cpp | 21 +-
12 files changed, 482 insertions(+), 1870 deletions(-)
diffs (truncated from 2722 to 300 lines):
diff -r b02df3ebdf39 -r 65462024832b source/Lib/TLibEncoder/NALwrite.cpp
--- a/source/Lib/TLibEncoder/NALwrite.cpp Tue Oct 29 13:36:37 2013 -0500
+++ b/source/Lib/TLibEncoder/NALwrite.cpp Wed Oct 30 01:54:16 2013 -0500
@@ -82,8 +82,8 @@ void write(uint8_t*& out, OutputNALUnit&
* - 0x00000302
* - 0x00000303
*/
- uint32_t fsize = nalu.m_Bitstream.getByteStreamLength();
- uint8_t* fifo = nalu.m_Bitstream.getFIFO();
+ uint32_t fsize = nalu.m_bitstream.getByteStreamLength();
+ uint8_t* fifo = nalu.m_bitstream.getFIFO();
uint8_t* emulation = (uint8_t*)X265_MALLOC(uint8_t, fsize + EMULATION_SIZE);
uint32_t nalsize = 0;
diff -r b02df3ebdf39 -r 65462024832b source/Lib/TLibEncoder/NALwrite.h
--- a/source/Lib/TLibEncoder/NALwrite.h Tue Oct 29 13:36:37 2013 -0500
+++ b/source/Lib/TLibEncoder/NALwrite.h Wed Oct 30 01:54:16 2013 -0500
@@ -61,17 +61,17 @@ struct OutputNALUnit : public NALUnit
uint32_t temporalID = 0,
uint32_t reserved_zero_6bits = 0)
: NALUnit(nalUnitType, temporalID, reserved_zero_6bits)
- , m_Bitstream()
+ , m_bitstream()
{}
OutputNALUnit& operator =(const NALUnit& src)
{
- m_Bitstream.clear();
+ m_bitstream.clear();
static_cast<NALUnit*>(this)->operator =(src);
return *this;
}
- TComOutputBitstream m_Bitstream;
+ TComOutputBitstream m_bitstream;
};
void write(uint8_t*& out, OutputNALUnit& nalu, uint32_t& packetSize);
diff -r b02df3ebdf39 -r 65462024832b source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp Tue Oct 29 13:36:37 2013 -0500
+++ b/source/common/ipfilter.cpp Wed Oct 30 01:54:16 2013 -0500
@@ -42,7 +42,7 @@ void filterVertical_sp_c(int16_t *src, i
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int shift = IF_FILTER_PREC + headRoom;
int offset = (1 << (shift - 1)) + (IF_INTERNAL_OFFS << IF_FILTER_PREC);
- int16_t maxVal = (1 << X265_DEPTH) - 1;
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
const int16_t *coeff = (N == 8 ? g_lumaFilter[coeffIdx] : g_chromaFilter[coeffIdx]);
src -= (N / 2 - 1) * srcStride;
@@ -84,7 +84,7 @@ void filterHorizontal_pp_c(pixel *src, i
{
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int offset = (1 << (headRoom - 1));
- int16_t maxVal = (1 << X265_DEPTH) - 1;
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
const int cStride = 1;
src -= (N / 2 - 1) * cStride;
@@ -228,7 +228,7 @@ void filterConvertShortToPel_c(int16_t *
{
int shift = IF_INTERNAL_PREC - X265_DEPTH;
int16_t offset = IF_INTERNAL_OFFS + (shift ? (1 << (shift - 1)) : 0);
- int16_t maxVal = (1 << X265_DEPTH) - 1;
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
int row, col;
for (row = 0; row < height; row++)
{
@@ -269,7 +269,7 @@ void filterVertical_pp_c(pixel *src, int
{
int shift = IF_FILTER_PREC;
int offset = 1 << (shift - 1);
- int16_t maxVal = (1 << X265_DEPTH) - 1;
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
src -= (N / 2 - 1) * srcStride;
int row, col;
@@ -330,7 +330,7 @@ void interp_horiz_pp_c(pixel *src, intpt
int16_t const * coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int offset = (1 << (headRoom - 1));
- int16_t maxVal = (1 << X265_DEPTH) - 1;
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
int cStride = 1;
src -= (N / 2 - 1) * cStride;
@@ -370,7 +370,7 @@ void interp_vert_pp_c(pixel *src, intptr
int16_t const * c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
int shift = IF_FILTER_PREC;
int offset = 1 << (shift - 1);
- int16_t maxVal = (1 << X265_DEPTH) - 1;
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
src -= (N / 2 - 1) * srcStride;
int row, col;
diff -r b02df3ebdf39 -r 65462024832b source/common/vec/pixel-sse41.cpp
--- a/source/common/vec/pixel-sse41.cpp Tue Oct 29 13:36:37 2013 -0500
+++ b/source/common/vec/pixel-sse41.cpp Wed Oct 30 01:54:16 2013 -0500
@@ -31,180 +31,8 @@
using namespace x265;
-#if defined(_MSC_VER)
-#pragma warning(disable: 4799) // MMX warning EMMS
-#endif
-
-#if defined(__INTEL_COMPILER) || defined(__GCC__)
-#define HAVE_MMX 1
-#elif defined(_MSC_VER) && defined(X86_64)
-#define HAVE_MMX 0
-#else
-#define HAVE_MMX 1
-#endif
-
namespace {
#if !HIGH_BIT_DEPTH
-#if HAVE_MMX
-template<int ly>
-// ly will always be 32
-int sad_8(pixel * fenc, intptr_t fencstride, pixel * fref, intptr_t frefstride)
-{
- __m64 sum0 = _mm_setzero_si64();
-
- __m64 T00, T01, T02, T03;
- __m64 T10, T11, T12, T13;
- __m64 T20, T21, T22, T23;
-
- for (int i = 0; i < ly; i += 16)
- {
- T00 = (*(__m64*)(fenc + (i + 0) * fencstride));
- T01 = (*(__m64*)(fenc + (i + 1) * fencstride));
- T02 = (*(__m64*)(fenc + (i + 2) * fencstride));
- T03 = (*(__m64*)(fenc + (i + 3) * fencstride));
-
- T10 = (*(__m64*)(fref + (i + 0) * frefstride));
- T11 = (*(__m64*)(fref + (i + 1) * frefstride));
- T12 = (*(__m64*)(fref + (i + 2) * frefstride));
- T13 = (*(__m64*)(fref + (i + 3) * frefstride));
-
- T20 = _mm_sad_pu8(T00, T10);
- T21 = _mm_sad_pu8(T01, T11);
- T22 = _mm_sad_pu8(T02, T12);
- T23 = _mm_sad_pu8(T03, T13);
-
- sum0 = _mm_add_pi16(sum0, T20);
- sum0 = _mm_add_pi16(sum0, T21);
- sum0 = _mm_add_pi16(sum0, T22);
- sum0 = _mm_add_pi16(sum0, T23);
-
- T00 = (*(__m64*)(fenc + (i + 4) * fencstride));
- T01 = (*(__m64*)(fenc + (i + 5) * fencstride));
- T02 = (*(__m64*)(fenc + (i + 6) * fencstride));
- T03 = (*(__m64*)(fenc + (i + 7) * fencstride));
-
- T10 = (*(__m64*)(fref + (i + 4) * frefstride));
- T11 = (*(__m64*)(fref + (i + 5) * frefstride));
- T12 = (*(__m64*)(fref + (i + 6) * frefstride));
- T13 = (*(__m64*)(fref + (i + 7) * frefstride));
-
- T20 = _mm_sad_pu8(T00, T10);
- T21 = _mm_sad_pu8(T01, T11);
- T22 = _mm_sad_pu8(T02, T12);
- T23 = _mm_sad_pu8(T03, T13);
-
- sum0 = _mm_add_pi16(sum0, T20);
- sum0 = _mm_add_pi16(sum0, T21);
- sum0 = _mm_add_pi16(sum0, T22);
- sum0 = _mm_add_pi16(sum0, T23);
-
- T00 = (*(__m64*)(fenc + (i + 8) * fencstride));
- T01 = (*(__m64*)(fenc + (i + 9) * fencstride));
- T02 = (*(__m64*)(fenc + (i + 10) * fencstride));
- T03 = (*(__m64*)(fenc + (i + 11) * fencstride));
-
- T10 = (*(__m64*)(fref + (i + 8) * frefstride));
- T11 = (*(__m64*)(fref + (i + 9) * frefstride));
- T12 = (*(__m64*)(fref + (i + 10) * frefstride));
- T13 = (*(__m64*)(fref + (i + 11) * frefstride));
-
- T20 = _mm_sad_pu8(T00, T10);
- T21 = _mm_sad_pu8(T01, T11);
- T22 = _mm_sad_pu8(T02, T12);
- T23 = _mm_sad_pu8(T03, T13);
-
- sum0 = _mm_add_pi16(sum0, T20);
- sum0 = _mm_add_pi16(sum0, T21);
- sum0 = _mm_add_pi16(sum0, T22);
- sum0 = _mm_add_pi16(sum0, T23);
-
- T00 = (*(__m64*)(fenc + (i + 12) * fencstride));
- T01 = (*(__m64*)(fenc + (i + 13) * fencstride));
- T02 = (*(__m64*)(fenc + (i + 14) * fencstride));
- T03 = (*(__m64*)(fenc + (i + 15) * fencstride));
-
- T10 = (*(__m64*)(fref + (i + 12) * frefstride));
- T11 = (*(__m64*)(fref + (i + 13) * frefstride));
- T12 = (*(__m64*)(fref + (i + 14) * frefstride));
- T13 = (*(__m64*)(fref + (i + 15) * frefstride));
-
- T20 = _mm_sad_pu8(T00, T10);
- T21 = _mm_sad_pu8(T01, T11);
- T22 = _mm_sad_pu8(T02, T12);
- T23 = _mm_sad_pu8(T03, T13);
-
- sum0 = _mm_add_pi16(sum0, T20);
- sum0 = _mm_add_pi16(sum0, T21);
- sum0 = _mm_add_pi16(sum0, T22);
- sum0 = _mm_add_pi16(sum0, T23);
- }
-
- // 8 * 255 -> 11 bits x 8 -> 14 bits
- return _m_to_int(sum0);
-}
-
-#else /* if HAVE_MMX */
-
-template<int ly>
-// ly will always be 32
-int sad_8(pixel * fenc, intptr_t fencstride, pixel * fref, intptr_t frefstride)
-{
- __m128i sum0 = _mm_setzero_si128();
- __m128i sum1 = _mm_setzero_si128();
- __m128i T00, T01, T02, T03;
- __m128i T10, T11, T12, T13;
- __m128i T20, T21;
-
- for (int i = 0; i < ly; i += 8)
- {
- T00 = _mm_loadl_epi64((__m128i*)(fenc + (i + 0) * fencstride));
- T01 = _mm_loadl_epi64((__m128i*)(fenc + (i + 1) * fencstride));
- T01 = _mm_unpacklo_epi64(T00, T01);
- T02 = _mm_loadl_epi64((__m128i*)(fenc + (i + 2) * fencstride));
- T03 = _mm_loadl_epi64((__m128i*)(fenc + (i + 3) * fencstride));
- T03 = _mm_unpacklo_epi64(T02, T03);
-
- T10 = _mm_loadl_epi64((__m128i*)(fref + (i + 0) * frefstride));
- T11 = _mm_loadl_epi64((__m128i*)(fref + (i + 1) * frefstride));
- T11 = _mm_unpacklo_epi64(T10, T11);
- T12 = _mm_loadl_epi64((__m128i*)(fref + (i + 2) * frefstride));
- T13 = _mm_loadl_epi64((__m128i*)(fref + (i + 3) * frefstride));
- T13 = _mm_unpacklo_epi64(T12, T13);
- T20 = _mm_sad_epu8(T01, T11);
- T21 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi32(sum0, T20);
- sum1 = _mm_add_epi32(sum1, T21);
-
- T00 = _mm_loadl_epi64((__m128i*)(fenc + (i + 4) * fencstride));
- T01 = _mm_loadl_epi64((__m128i*)(fenc + (i + 5) * fencstride));
- T01 = _mm_unpacklo_epi64(T00, T01);
- T02 = _mm_loadl_epi64((__m128i*)(fenc + (i + 6) * fencstride));
- T03 = _mm_loadl_epi64((__m128i*)(fenc + (i + 7) * fencstride));
- T03 = _mm_unpacklo_epi64(T02, T03);
-
- T10 = _mm_loadl_epi64((__m128i*)(fref + (i + 4) * frefstride));
- T11 = _mm_loadl_epi64((__m128i*)(fref + (i + 5) * frefstride));
- T11 = _mm_unpacklo_epi64(T10, T11);
- T12 = _mm_loadl_epi64((__m128i*)(fref + (i + 6) * frefstride));
- T13 = _mm_loadl_epi64((__m128i*)(fref + (i + 7) * frefstride));
- T13 = _mm_unpacklo_epi64(T12, T13);
- T20 = _mm_sad_epu8(T01, T11);
- T21 = _mm_sad_epu8(T03, T13);
-
- sum0 = _mm_add_epi32(sum0, T20);
- sum1 = _mm_add_epi32(sum1, T21);
- }
-
- // [0 x 0 x]
- sum0 = _mm_add_epi32(sum0, sum1);
- sum1 = _mm_shuffle_epi32(sum0, 2);
- sum0 = _mm_add_epi32(sum0, sum1);
- return _mm_cvtsi128_si32(sum0);
-}
-
-#endif /* if HAVE_MMX */
-
template<int ly>
// will only be instanced with ly == 16
int sad_12(pixel *fenc, intptr_t fencstride, pixel *fref, intptr_t frefstride)
@@ -256,61 +84,6 @@ int sad_12(pixel *fenc, intptr_t fencstr
}
template<int ly>
-int sad_16(pixel * fenc, intptr_t fencstride, pixel * fref, intptr_t frefstride)
-{
- __m128i sum0 = _mm_setzero_si128();
- __m128i sum1 = _mm_setzero_si128();
- __m128i T00, T01, T02, T03;
- __m128i T10, T11, T12, T13;
- __m128i T20, T21, T22, T23;
-
-#define PROCESS_16x4(BASE) \
- T00 = _mm_load_si128((__m128i*)(fenc + (BASE + 0) * fencstride)); \
- T01 = _mm_load_si128((__m128i*)(fenc + (BASE + 1) * fencstride)); \
- T02 = _mm_load_si128((__m128i*)(fenc + (BASE + 2) * fencstride)); \
- T03 = _mm_load_si128((__m128i*)(fenc + (BASE + 3) * fencstride)); \
- T10 = _mm_loadu_si128((__m128i*)(fref + (BASE + 0) * frefstride)); \
- T11 = _mm_loadu_si128((__m128i*)(fref + (BASE + 1) * frefstride)); \
- T12 = _mm_loadu_si128((__m128i*)(fref + (BASE + 2) * frefstride)); \
- T13 = _mm_loadu_si128((__m128i*)(fref + (BASE + 3) * frefstride)); \