[x265-commits] [x265] pixel: fix 16bpp warnings that were previously hidden by ...
Steve Borho
steve at borho.org
Mon Dec 2 23:53:19 CET 2013
details: http://hg.videolan.org/x265/rev/0a85121531fc
branches:
changeset: 5415:0a85121531fc
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 01:04:33 2013 -0600
description:
pixel: fix 16bpp warnings that were previously hidden by cmake rules
Subject: [x265] asm: removed unused code from pixel-a.asm
details: http://hg.videolan.org/x265/rev/df0b4f81609e
branches:
changeset: 5416:df0b4f81609e
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Mon Dec 02 12:19:34 2013 +0530
description:
asm: removed unused code from pixel-a.asm
Subject: [x265] slicetype: fix for gcc warnings
details: http://hg.videolan.org/x265/rev/0a8023666206
branches:
changeset: 5417:0a8023666206
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Mon Dec 02 12:53:59 2013 +0530
description:
slicetype: fix for gcc warnings
Subject: [x265] fix for the number of weighted references exceeding 8 in HM weight analysis
details: http://hg.videolan.org/x265/rev/bf778de26451
branches: stable
changeset: 5418:bf778de26451
user: Shazeb Nawaz Khan <shazeb at multicorewareinc.com>
date: Mon Dec 02 12:51:57 2013 +0530
description:
fix for the number of weighted references exceeding 8 in HM weight analysis
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/d8d716eb11b8
branches:
changeset: 5419:d8d716eb11b8
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 01:39:44 2013 -0600
description:
Merge with stable
Subject: [x265] cmake: fix Win64 vector primitive compile flags
details: http://hg.videolan.org/x265/rev/ccf65888fc2c
branches:
changeset: 5420:ccf65888fc2c
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 12:34:12 2013 -0600
description:
cmake: fix Win64 vector primitive compile flags
Subject: [x265] picel: fix compile error from older gcc
details: http://hg.videolan.org/x265/rev/4508b8c923e6
branches:
changeset: 5421:4508b8c923e6
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:18:37 2013 -0600
description:
picel: fix compile error from older gcc
Subject: [x265] cleanup: removed unused code from sad-a.asm
details: http://hg.videolan.org/x265/rev/a615a46d4631
branches:
changeset: 5422:a615a46d4631
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Mon Dec 02 15:29:22 2013 +0530
description:
cleanup: removed unused code from sad-a.asm
Subject: [x265] asm: removed unused function defnitions from pixel.h
details: http://hg.videolan.org/x265/rev/47ddbf9b5866
branches:
changeset: 5423:47ddbf9b5866
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Mon Dec 02 13:06:09 2013 +0530
description:
asm: removed unused function defnitions from pixel.h
Subject: [x265] rc: fixups for cutree changes
details: http://hg.videolan.org/x265/rev/dab34fa63c0c
branches:
changeset: 5424:dab34fa63c0c
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 13:06:59 2013 -0600
description:
rc: fixups for cutree changes
Subject: [x265] asm: move cvt* functions to blockcopy8.asm
details: http://hg.videolan.org/x265/rev/b6766dc86e2a
branches:
changeset: 5425:b6766dc86e2a
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:37:53 2013 -0600
description:
asm: move cvt* functions to blockcopy8.asm
Subject: [x265] asm: remove more unused funcdefs from pixel.h
details: http://hg.videolan.org/x265/rev/41c6dc5b35e8
branches:
changeset: 5426:41c6dc5b35e8
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:38:12 2013 -0600
description:
asm: remove more unused funcdefs from pixel.h
Subject: [x265] asm: move transpose from pixel-a.asm to pixel-util8.asm, add pixel-util.h
details: http://hg.videolan.org/x265/rev/a182faf23ead
branches:
changeset: 5427:a182faf23ead
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:47:13 2013 -0600
description:
asm: move transpose from pixel-a.asm to pixel-util8.asm, add pixel-util.h
Subject: [x265] asm: move SSIM functions to pixel-util
details: http://hg.videolan.org/x265/rev/b091438d1446
branches:
changeset: 5428:b091438d1446
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:51:29 2013 -0600
description:
asm: move SSIM functions to pixel-util
Subject: [x265] asm: move scale functions to pixel-util
details: http://hg.videolan.org/x265/rev/a439c19ee304
branches:
changeset: 5429:a439c19ee304
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:54:13 2013 -0600
description:
asm: move scale functions to pixel-util
Subject: [x265] pixel: remove an unused macro
details: http://hg.videolan.org/x265/rev/2ed3b664c370
branches:
changeset: 5430:2ed3b664c370
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 14:59:22 2013 -0600
description:
pixel: remove an unused macro
Subject: [x265] asm: move pixel_sub to pixel-util8.asm, move pixel_avg funcdef to mc.h
details: http://hg.videolan.org/x265/rev/2de04bb5da1d
branches:
changeset: 5431:2de04bb5da1d
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 15:07:26 2013 -0600
description:
asm: move pixel_sub to pixel-util8.asm, move pixel_avg funcdef to mc.h
Subject: [x265] asm: move variance functions to pixel-util8.asm
details: http://hg.videolan.org/x265/rev/eea094a84b9c
branches:
changeset: 5432:eea094a84b9c
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 15:14:09 2013 -0600
description:
asm: move variance functions to pixel-util8.asm
Subject: [x265] asm: move ssd functions into their own ssd-a.asm file, similar to sad-a.asm
details: http://hg.videolan.org/x265/rev/a9f629fac91e
branches:
changeset: 5433:a9f629fac91e
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 15:26:14 2013 -0600
description:
asm: move ssd functions into their own ssd-a.asm file, similar to sad-a.asm
Subject: [x265] asm: make it more clear that pixel-a.asm has only satd and sa8d now
details: http://hg.videolan.org/x265/rev/70e127d735a5
branches:
changeset: 5434:70e127d735a5
user: Steve Borho <steve at borho.org>
date: Mon Dec 02 15:37:57 2013 -0600
description:
asm: make it more clear that pixel-a.asm has only satd and sa8d now
diffstat:
source/Lib/TLibEncoder/WeightPredAnalysis.cpp | 7 +-
source/common/CMakeLists.txt | 8 +-
source/common/pixel.cpp | 20 +-
source/common/x86/asm-primitives.cpp | 1 +
source/common/x86/blockcopy8.asm | 114 +
source/common/x86/mc.h | 32 +
source/common/x86/pixel-a.asm | 5178 +------------------------
source/common/x86/pixel-util.h | 143 +
source/common/x86/pixel-util8.asm | 2344 ++++++++++-
source/common/x86/pixel.h | 316 +-
source/common/x86/sad-a.asm | 496 --
source/common/x86/ssd-a.asm | 2177 ++++++++++
source/encoder/frameencoder.cpp | 9 +-
source/encoder/ratecontrol.cpp | 8 -
source/encoder/slicetype.cpp | 78 +-
source/encoder/slicetype.h | 6 +-
16 files changed, 4803 insertions(+), 6134 deletions(-)
diffs (truncated from 11310 to 300 lines):
diff -r c75c3431b108 -r 70e127d735a5 source/Lib/TLibEncoder/WeightPredAnalysis.cpp
--- a/source/Lib/TLibEncoder/WeightPredAnalysis.cpp Mon Dec 02 11:48:10 2013 +0530
+++ b/source/Lib/TLibEncoder/WeightPredAnalysis.cpp Mon Dec 02 15:37:57 2013 -0600
@@ -281,6 +281,7 @@ bool WeightPredAnalysis::xSelectWP(TComS
int height = pic->getHeight();
int defaultWeight = ((int)1 << denom);
int numPredDir = slice->isInterP() ? 1 : 2;
+ int numWeighted = 0;
for (int list = 0; list < numPredDir; list++)
{
@@ -313,7 +314,7 @@ bool WeightPredAnalysis::xSelectWP(TComS
SADnoWP += this->xCalcSADvalueWP(X265_DEPTH, fenc, fref, width >> 1, height >> 1, orgStride, refStride, denom, defaultWeight, 0);
double dRatio = ((double)SADWP / (double)SADnoWP);
- if (dRatio >= (double)DTHRESH)
+ if (dRatio >= (double)DTHRESH || numWeighted >= 8)
{
for (int comp = 0; comp < 3; comp++)
{
@@ -323,6 +324,10 @@ bool WeightPredAnalysis::xSelectWP(TComS
weightPredTable[list][refIdxTmp][comp].log2WeightDenom = (int)denom;
}
}
+ else
+ {
+ numWeighted++;
+ }
}
}
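
Note on the WeightPredAnalysis.cpp hunks above: the change appears to count how many references have received explicit weights and to fall back to the default weight once eight have been selected. The following is a minimal sketch of that capping logic, not the encoder's code. The function name selectWeightedRefs, the flat vector of SAD ratios, and the constants kMaxWeightedRefs and kThreshold are hypothetical; in particular, the value used here for the threshold is an assumption and is not shown in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants for illustration only.
static const int kMaxWeightedRefs = 8;   // cap named in the commit message
static const double kThreshold = 0.99;   // assumed stand-in for DTHRESH

// Given one SAD ratio (weighted SAD / unweighted SAD) per reference,
// decide which references keep explicit weights. A reference keeps its
// weights only if the ratio is below the threshold AND fewer than
// kMaxWeightedRefs references have already been selected.
std::vector<bool> selectWeightedRefs(const std::vector<double>& sadRatios)
{
    std::vector<bool> useWeight(sadRatios.size(), false);
    int numWeighted = 0;

    for (std::size_t i = 0; i < sadRatios.size(); i++)
    {
        assert(sadRatios[i] >= 0.0); // a ratio of two SADs cannot be negative

        if (sadRatios[i] >= kThreshold || numWeighted >= kMaxWeightedRefs)
        {
            useWeight[i] = false;    // fall back to the default weight
        }
        else
        {
            useWeight[i] = true;     // keep the explicit weight
            numWeighted++;
        }
    }
    return useWeight;
}
```

As in the diff, the counter is shared by every reference examined, so the cap applies to the total rather than per list.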
diff -r c75c3431b108 -r 70e127d735a5 source/common/CMakeLists.txt
--- a/source/common/CMakeLists.txt Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/CMakeLists.txt Mon Dec 02 15:37:57 2013 -0600
@@ -91,10 +91,10 @@ if(ENABLE_PRIMITIVES_VEC)
add_definitions(/Qwd280) # conditional expression is constant
endif()
if (X64)
+ set_source_files_properties(${SSE3} ${SSSE3} ${SSE41} PROPERTIES COMPILE_FLAGS "${WARNDISABLE}")
+ else()
# x64 implies SSE4, so this flag would have no effect (and it issues a warning)
set_source_files_properties(${SSE3} ${SSSE3} ${SSE41} PROPERTIES COMPILE_FLAGS "${WARNDISABLE} /arch:SSE2")
- else()
- set_source_files_properties(${SSE3} ${SSSE3} ${SSE41} PROPERTIES COMPILE_FLAGS "${WARNDISABLE}")
endif()
endif()
if(GCC)
@@ -119,8 +119,8 @@ endif(ENABLE_PRIMITIVES_VEC)
if(ENABLE_PRIMITIVES_ASM)
set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h)
- set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm mc-a.asm mc-a2.asm
- ipfilter8.asm pixel-util8.asm blockcopy8.asm intrapred8.asm
+ set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm ssd-a.asm mc-a.asm
+ mc-a2.asm ipfilter8.asm pixel-util8.asm blockcopy8.asm intrapred8.asm
pixeladd8.asm dct8.asm)
if (NOT X64)
set(A_SRCS ${A_SRCS} pixel-32.asm)
diff -r c75c3431b108 -r 70e127d735a5 source/common/pixel.cpp
--- a/source/common/pixel.cpp Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/pixel.cpp Mon Dec 02 15:37:57 2013 -0600
@@ -661,12 +661,12 @@ float ssim_end_1(int s1, int s2, int ss,
static const int ssim_c1 = (int)(.01 * .01 * PIXEL_MAX * PIXEL_MAX * 64 + .5);
static const int ssim_c2 = (int)(.03 * .03 * PIXEL_MAX * PIXEL_MAX * 64 * 63 + .5);
#endif
- type fs1 = s1;
- type fs2 = s2;
- type fss = ss;
- type fs12 = s12;
- type vars = fss * 64 - fs1 * fs1 - fs2 * fs2;
- type covar = fs12 * 64 - fs1 * fs2;
+ type fs1 = (type)s1;
+ type fs2 = (type)s2;
+ type fss = (type)ss;
+ type fs12 = (type)s12;
+ type vars = (type)(fss * 64 - fs1 * fs1 - fs2 * fs2);
+ type covar = (type)(fs12 * 64 - fs1 * fs2);
return (float)(2 * fs1 * fs2 + ssim_c1) * (float)(2 * covar + ssim_c2)
/ ((float)(fs1 * fs1 + fs2 * fs2 + ssim_c1) * (float)(vars + ssim_c2));
#undef type
@@ -901,16 +901,10 @@ void Setup_C_PixelPrimitives(EncoderPrim
LUMA(16, 64);
CHROMA(8, 32);
- //sse
-#if HIGH_BIT_DEPTH
- SET_FUNC_PRIMITIVE_TABLE_C(sse_pp, sse, pixelcmp_t, int16_t, int16_t)
- SET_FUNC_PRIMITIVE_TABLE_C(sse_sp, sse, pixelcmp_sp_t, int16_t, int16_t)
- SET_FUNC_PRIMITIVE_TABLE_C(sse_ss, sse, pixelcmp_ss_t, int16_t, int16_t)
-#else
SET_FUNC_PRIMITIVE_TABLE_C(sse_pp, sse, pixelcmp_t, pixel, pixel)
SET_FUNC_PRIMITIVE_TABLE_C(sse_sp, sse, pixelcmp_sp_t, int16_t, pixel)
SET_FUNC_PRIMITIVE_TABLE_C(sse_ss, sse, pixelcmp_ss_t, int16_t, int16_t)
-#endif
+
p.blockcpy_pp = blockcopy_p_p;
p.blockcpy_ps = blockcopy_p_s;
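
Note on the ssim_end_1 hunk above: the explicit casts only silence conversion warnings; the arithmetic is unchanged. A minimal sketch of the 8-bit path follows, assuming the accumulator type is int and PIXEL_MAX is 255 (neither is visible in the truncated diff). The name ssimEnd1Sketch is hypothetical.

```cpp
#include <cassert>

// Sketch of the final SSIM step for 8-bit input, assuming the four
// arguments are sums over 64 pixels: s1 and s2 are pixel sums of the
// two blocks, ss is the combined sum of squares, s12 the sum of products.
float ssimEnd1Sketch(int s1, int s2, int ss, int s12)
{
    const int pixelMax = 255; // assumed 8-bit PIXEL_MAX
    static const int c1 = (int)(.01 * .01 * pixelMax * pixelMax * 64 + .5);
    static const int c2 = (int)(.03 * .03 * pixelMax * pixelMax * 64 * 63 + .5);

    assert(s1 >= 0 && s2 >= 0); // sums of unsigned pixels

    int fs1 = s1;
    int fs2 = s2;
    int fss = ss;
    int fs12 = s12;
    int vars = fss * 64 - fs1 * fs1 - fs2 * fs2;
    int covar = fs12 * 64 - fs1 * fs2;

    return (float)(2 * fs1 * fs2 + c1) * (float)(2 * covar + c2)
           / ((float)(fs1 * fs1 + fs2 * fs2 + c1) * (float)(vars + c2));
}
```

For two identical blocks the numerator and denominator coincide, so the result is 1.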
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/asm-primitives.cpp Mon Dec 02 15:37:57 2013 -0600
@@ -29,6 +29,7 @@
extern "C" {
#include "pixel.h"
+#include "pixel-util.h"
#include "mc.h"
#include "ipfilter8.h"
#include "blockcopy8.h"
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/blockcopy8.asm Mon Dec 02 15:37:57 2013 -0600
@@ -2360,3 +2360,117 @@ BLOCKCOPY_PS_W64_H2 64, 16
BLOCKCOPY_PS_W64_H2 64, 32
BLOCKCOPY_PS_W64_H2 64, 48
BLOCKCOPY_PS_W64_H2 64, 64
+
+;-----------------------------------------------------------------------------
+; void cvt32to16_shr(short *dst, int *src, intptr_t stride, int shift, int size)
+;-----------------------------------------------------------------------------
+INIT_XMM sse2
+cglobal cvt32to16_shr, 5, 7, 1, dst, src, stride
+%define rnd m7
+%define shift m6
+
+ ; make shift
+ mov r5d, r3m
+ movd shift, r5d
+
+ ; make round
+ dec r5
+ xor r6, r6
+ bts r6, r5
+
+ movd rnd, r6d
+ pshufd rnd, rnd, 0
+
+ ; register alloc
+ ; r0 - dst
+ ; r1 - src
+ ; r2 - stride * 2 (short*)
+ ; r3 - lx
+ ; r4 - size
+ ; r5 - ly
+ ; r6 - diff
+ lea r2, [r2 * 2]
+
+ mov r4d, r4m
+ mov r5, r4
+ mov r6, r2
+ sub r6, r4
+ lea r6, [r6 * 2]
+
+ shr r5, 1
+.loop_row:
+
+ mov r3, r4
+ shr r3, 2
+.loop_col:
+ ; row 0
+ movu m0, [r1]
+ paddd m0, rnd
+ psrad m0, shift
+ packssdw m0, m0
+ movh [r0], m0
+
+ ; row 1
+ movu m0, [r1 + r4 * 4]
+ paddd m0, rnd
+ psrad m0, shift
+ packssdw m0, m0
+ movh [r0 + r2], m0
+
+ ; move col pointer
+ add r1, 16
+ add r0, 8
+
+ dec r3
+ jg .loop_col
+
+ ; update pointer
+ lea r1, [r1 + r4 * 4]
+ add r0, r6
+
+ ; end of loop_row
+ dec r5
+ jg .loop_row
+
+ RET
+
+
+;--------------------------------------------------------------------------------------
+; void cvt16to32_shl(int32_t *dst, int16_t *src, intptr_t stride, int shift, int size);
+;--------------------------------------------------------------------------------------
+INIT_XMM sse2
+cglobal cvt16to32_shl, 5, 7, 2, dst, src, stride, shift, size
+%define shift m6
+
+ ; make shift
+ mov r5d, r3m
+ movd shift, r5d
+
+ ; register alloc
+ ; r0 - dst
+ ; r1 - src
+ ; r2 - stride
+ ; r3 - shift
+ ; r4 - size
+
+ mov r5d, r4d
+ shr r4d, 2
+.loop_row
+ mov r6d, r4d
+
+.loop_col
+ pmovsxwd m0, [r1]
+ pslld m0, shift
+ movu [r0], m0
+
+ add r1, 8
+ add r0, 16
+
+ dec r6d
+ jnz .loop_col
+
+ dec r5d
+ jnz .loop_row
+
+ RET
+
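
Note on cvt32to16_shr above: as far as the assembly shows, the routine reads a densely packed size x size block of 32-bit values, adds a rounding term of 1 << (shift - 1), shifts right arithmetically, and stores 16-bit results with a destination stride given in elements. The scalar sketch below mirrors that reading. It is an illustration under those assumptions, not the project's C reference; the name cvt32to16ShrSketch is hypothetical, and the clamping models the signed saturation of packssdw.

```cpp
#include <cassert>
#include <cstdint>

// Scalar sketch of the behavior read from the assembly above.
// src: size x size block, rows packed back to back (row pitch = size).
// dst: rows separated by `stride` int16_t elements.
void cvt32to16ShrSketch(int16_t* dst, const int32_t* src,
                        intptr_t stride, int shift, int size)
{
    assert(shift >= 1); // the rounding term 1 << (shift - 1) needs shift >= 1

    const int32_t round = 1 << (shift - 1);
    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            // Right shift of a negative value is assumed to be arithmetic,
            // matching psrad; this holds on mainstream compilers.
            int32_t v = (src[x] + round) >> shift;

            // Clamp to the int16_t range, as packssdw saturates.
            if (v > INT16_MAX) v = INT16_MAX;
            if (v < INT16_MIN) v = INT16_MIN;
            dst[x] = (int16_t)v;
        }
        src += size;   // next packed source row
        dst += stride; // next destination row
    }
}
```

The assembly handles two rows and four columns per iteration, so it presumably expects size to be a multiple of four; the scalar sketch has no such restriction.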
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/mc.h
--- a/source/common/x86/mc.h Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/mc.h Mon Dec 02 15:37:57 2013 -0600
@@ -33,4 +33,36 @@ LOWRES(ssse3)
LOWRES(avx)
LOWRES(xop)
+#define DECL_SUF(func, args) \
+ void func ## _mmx2 args; \
+ void func ## _sse2 args; \
+ void func ## _ssse3 args;
+DECL_SUF(x265_pixel_avg_64x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_64x48, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_64x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_48x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x24, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x8, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_24x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x12, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x8, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x4, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_12x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x8, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x4, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_4x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_4x8, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_4x4, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+
+#undef LOWRES
+#undef DECL_SUF
+
#endif // ifndef X265_MC_H
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/pixel-a.asm
--- a/source/common/x86/pixel-a.asm Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/pixel-a.asm Mon Dec 02 15:37:57 2013 -0600
@@ -38,24 +38,9 @@ hmul_8p: times 8 db 1
times 4 db 1, -1
times 8 db 1
times 4 db 1, -1
-mask_ff: times 16 db 0xff
- times 16 db 0
-%if BIT_DEPTH == 10
-ssim_c1: times 4 dd 6697.7856 ; .01*.01*1023*1023*64
-ssim_c2: times 4 dd 3797644.4352 ; .03*.03*1023*1023*64*63
-pf_64: times 4 dd 64.0
-pf_128: times 4 dd 128.0
-%elif BIT_DEPTH == 9
-ssim_c1: times 4 dd 1671 ; .01*.01*511*511*64
-ssim_c2: times 4 dd 947556 ; .03*.03*511*511*64*63
-%else ; 8-bit
-ssim_c1: times 4 dd 416 ; .01*.01*255*255*64
-ssim_c2: times 4 dd 235963 ; .03*.03*255*255*64*63
-%endif
hmul_4p: times 2 db 1, 1, 1, 1, 1, -1, 1, -1
mask_10: times 4 dw 0, -1
mask_1100: times 2 dd 0, -1
-deinterleave_shuf: db 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15
ALIGN 32
transd_shuf1: SHUFFLE_MASK_W 0, 8, 2, 10, 4, 12, 6, 14
@@ -66,25 +51,6 @@ pd_f0: times 4 dd 0xffff0000