[x265-commits] [x265] asm: disable x265_scale2D_64to32_ssse3, DUMA finds access...
Steve Borho
steve at borho.org
Wed Jan 8 00:20:10 CET 2014
details: http://hg.videolan.org/x265/rev/d4bef967ae10
branches:
changeset: 5800:d4bef967ae10
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 15:08:05 2014 -0600
description:
asm: disable x265_scale2D_64to32_ssse3, DUMA finds access violations
I tried simple buffer padding workarounds, adding 16 bytes at the start and end
of bufScale, but it was still causing the access violation.
Subject: [x265] slicetype: better prevention for compiler warnings and misbehaviors
details: http://hg.videolan.org/x265/rev/54835bf61c11
branches:
changeset: 5801:54835bf61c11
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 15:21:09 2014 -0600
description:
slicetype: better prevention for compiler warnings and misbehaviors
Subject: [x265] motion: add early out for subpel refine if bcost is already zero
details: http://hg.videolan.org/x265/rev/63d6b04fe201
branches:
changeset: 5802:63d6b04fe201
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 16:11:53 2014 -0600
description:
motion: add early out for subpel refine if bcost is already zero
Subject: [x265] TComBitStream: rename variables for clarity
details: http://hg.videolan.org/x265/rev/324d99e3d6ac
branches:
changeset: 5803:324d99e3d6ac
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 16:12:45 2014 -0600
description:
TComBitStream: rename variables for clarity
There was no point making cnt an unsigned variable when the return value is
signed; this just adds more compiler warnings.
Subject: [x265] TComBitstream: simplify and streamline start code checks
details: http://hg.videolan.org/x265/rev/e1ee0fc31e79
branches:
changeset: 5804:e1ee0fc31e79
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 16:15:21 2014 -0600
description:
TComBitstream: simplify and streamline start code checks
Subject: [x265] ignore vim swap files
details: http://hg.videolan.org/x265/rev/6d40ab7be379
branches:
changeset: 5805:6d40ab7be379
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 16:17:52 2014 -0600
description:
ignore vim swap files
Subject: [x265] TComBitStream: fix loop bounds so we do not check past end of buffer
details: http://hg.videolan.org/x265/rev/bd9b395c80c7
branches:
changeset: 5806:bd9b395c80c7
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 16:26:32 2014 -0600
description:
TComBitStream: fix loop bounds so we do not check past end of buffer
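For readers skimming the digest, the bounds fix can be sketched as a standalone function. This is a minimal illustration modelled on the TComBitStream.cpp hunk below, not the class method itself: the free function and its std::vector parameter are assumptions made for the example. The point is that the condition `i + 2 < fsize` keeps the read of `rbsp[i + 2]` inside the buffer, including when the buffer holds fewer than three bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the loop in the diff: counts 3-byte
// windows of the form 00 00 0x with x <= 3.
int countStartCodeEmulations(const std::vector<uint8_t>& rbsp)
{
    int numStartCodes = 0;
    const size_t fsize = rbsp.size();

    // i + 2 < fsize guarantees rbsp[i + 2] is a valid index; for buffers
    // shorter than three bytes the loop body never runs.
    for (size_t i = 0; i + 2 < fsize; i++)
    {
        if (!rbsp[i] && !rbsp[i + 1] && rbsp[i + 2] <= 3)
        {
            numStartCodes++;
            i++; // as in the diff: skip one extra byte after a match
        }
    }

    return numStartCodes;
}
```

With the old condition (`count < fsize`), the last two iterations would have indexed `count + 1` and `count + 2` past the end of the buffer.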
Subject: [x265] wtf? a useless comment and if()/else() with two identical statements?
details: http://hg.videolan.org/x265/rev/c1cf926c20e0
branches:
changeset: 5807:c1cf926c20e0
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 23:14:06 2014 -0600
description:
wtf? a useless comment and if()/else() with two identical statements?
Subject: [x265] TComPrediction: simplify luma intra prediction function
details: http://hg.videolan.org/x265/rev/4811da38078c
branches:
changeset: 5808:4811da38078c
user: Steve Borho <steve at borho.org>
date: Mon Jan 06 23:15:58 2014 -0600
description:
TComPrediction: simplify luma intra prediction function
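The simplification relies on the two removed branches differing only in the filter flag. A small sketch of that equivalence follows, under the assumption that PLANAR_IDX is 0 (its value in HEVC's intra mode numbering); the helper names are hypothetical and exist only for this example. The table index is unaffected, since the planar branch used PLANAR_IDX exactly when dirMode equalled it.

```cpp
#include <cassert>

// Assumption for this sketch: planar is intra mode 0.
const int PLANAR_IDX = 0;

// Filter flag as the removed if/else effectively selected it.
bool filterFlagOld(int size, int dirMode)
{
    bool bFilter = (size <= 16);
    if (dirMode == PLANAR_IDX)
        return false; // the planar branch passed a literal 0
    return bFilter;
}

// Filter flag as the single remaining statement computes it.
bool filterFlagNew(int size, int dirMode)
{
    return size <= 16 && dirMode != PLANAR_IDX;
}
```

Checking both helpers over every block size and mode of interest is enough to confirm that the single call passes the same arguments as before.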
Subject: [x265] correct number of xmm register on interp_8tap_horiz*
details: http://hg.videolan.org/x265/rev/ca7bde495318
branches:
changeset: 5809:ca7bde495318
user: Min Chen <chenm003 at 163.com>
date: Tue Jan 07 18:36:48 2014 +0800
description:
correct number of xmm register on interp_8tap_horiz*
Subject: [x265] asm: fix memory access violation due to scale2D_64to32
details: http://hg.videolan.org/x265/rev/c4edab8dab65
branches:
changeset: 5810:c4edab8dab65
user: Murugan Vairavel <murugan at multicorewareinc.com>
date: Tue Jan 07 18:36:17 2014 +0530
description:
asm: fix memory access violation due to scale2D_64to32
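The pixel-util8.asm hunks below replace each second load at an offset of one pixel (`[r1 + 1]`, or `[r1 + 2]` for 16-bit pixels) with a shift of the register already loaded (`psrlw ..., 8`, or `psrld ..., 16`). The following is a portable C++ model of the 8-bit case, not the assembly itself; the type and helper names are hypothetical. It assumes that only the even byte lanes of the "j" register are kept by the later shuffle, which the diff suggests but this digest does not show in full.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Reg = std::array<uint8_t, 16>;

// Models "movu reg, [p]": reads exactly p[0..15].
Reg load16(const uint8_t* p)
{
    Reg r{};
    for (size_t n = 0; n < 16; n++)
        r[n] = p[n];
    return r;
}

// Models "psrlw reg, 8": treat the register as eight little-endian
// 16-bit words and shift each one right by 8 bits.
Reg shiftWordsRight8(const Reg& in)
{
    Reg out{};
    for (size_t w = 0; w < 8; w++)
    {
        uint16_t word = static_cast<uint16_t>(in[2 * w] | (in[2 * w + 1] << 8));
        word = static_cast<uint16_t>(word >> 8);
        out[2 * w] = static_cast<uint8_t>(word & 0xff);
        out[2 * w + 1] = static_cast<uint8_t>(word >> 8);
    }
    return out;
}

// True when the two registers agree in every even byte lane.
bool evenLanesEqual(const Reg& a, const Reg& b)
{
    for (size_t n = 0; n < 16; n += 2)
        if (a[n] != b[n])
            return false;
    return true;
}
```

In this model, `load16(row + 1)` touches `row[16]`, one byte beyond the 16 bytes that `load16(row)` reads, while `shiftWordsRight8(load16(row))` yields the same even lanes without reading any further memory. For the last chunk of a row, that extra byte is the kind of access a tool such as DUMA would flag.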
diffstat:
.hgignore | 1 +
source/Lib/TLibCommon/TComBitStream.cpp | 13 ++---
source/Lib/TLibCommon/TComPrediction.cpp | 24 +----------
source/common/x86/ipfilter8.asm | 2 +-
source/common/x86/pixel-util8.asm | 67 +++++++++++++++++--------------
source/encoder/motion.cpp | 6 ++-
source/encoder/slicetype.cpp | 4 +-
7 files changed, 56 insertions(+), 61 deletions(-)
diffs (truncated from 309 to 300 lines):
diff -r abd4da45823c -r c4edab8dab65 .hgignore
--- a/.hgignore Wed Jan 01 15:52:11 2014 -0600
+++ b/.hgignore Tue Jan 07 18:36:17 2014 +0530
@@ -7,3 +7,4 @@ build/
**.yuv
**.y4m
**.out
+**.swp
diff -r abd4da45823c -r c4edab8dab65 source/Lib/TLibCommon/TComBitStream.cpp
--- a/source/Lib/TLibCommon/TComBitStream.cpp Wed Jan 01 15:52:11 2014 -0600
+++ b/source/Lib/TLibCommon/TComBitStream.cpp Tue Jan 07 18:36:17 2014 +0530
@@ -184,21 +184,20 @@ void TComOutputBitstream::writeByteAlign
int TComOutputBitstream::countStartCodeEmulations()
{
- uint32_t cnt = 0;
+ int numStartCodes = 0;
uint8_t *rbsp = getFIFO();
uint32_t fsize = getByteStreamLength();
- for (uint32_t count = 0; count < fsize; count++)
+ for (uint32_t i = 0; i + 2 < fsize; i++)
{
- if ((rbsp[count + 2] == 0x00 || rbsp[count + 2] == 0x01 || rbsp[count + 2] == 0x02 || rbsp[count + 2] == 0x03)
- && rbsp[count + 1] == 0x00 && rbsp[count] == 0x00)
+ if (!rbsp[i] && !rbsp[i + 1] && rbsp[i + 2] <= 3)
{
- cnt++;
- count = count + 1;
+ numStartCodes++;
+ i++;
}
}
- return cnt;
+ return numStartCodes;
}
void TComOutputBitstream::push_back(uint8_t val)
diff -r abd4da45823c -r c4edab8dab65 source/Lib/TLibCommon/TComPrediction.cpp
--- a/source/Lib/TLibCommon/TComPrediction.cpp Wed Jan 01 15:52:11 2014 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.cpp Tue Jan 07 18:36:17 2014 +0530
@@ -153,18 +153,8 @@ void TComPrediction::predIntraLumaAng(ui
refAbv = refAboveFlt + size - 1;
}
- // get starting pixel in block
- bool bFilter = (size <= 16);
-
- // Create the prediction
- if (dirMode == PLANAR_IDX)
- {
- primitives.intra_pred[log2BlkSize - 2][PLANAR_IDX](dst, stride, refLft, refAbv, dirMode, 0);
- }
- else
- {
- primitives.intra_pred[log2BlkSize - 2][dirMode](dst, stride, refLft, refAbv, dirMode, bFilter);
- }
+ bool bFilter = size <= 16 && dirMode != PLANAR_IDX;
+ primitives.intra_pred[log2BlkSize - 2][dirMode](dst, stride, refLft, refAbv, dirMode, bFilter);
}
// Angular chroma
@@ -183,15 +173,7 @@ void TComPrediction::predIntraChromaAng(
refLft[k + width - 1] = src[k * ADI_BUF_STRIDE];
}
- // get starting pixel in block
- if (dirMode == PLANAR_IDX)
- {
- primitives.intra_pred[log2BlkSize][dirMode](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
- }
- else
- {
- primitives.intra_pred[log2BlkSize][dirMode](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
- }
+ primitives.intra_pred[log2BlkSize][dirMode](dst, stride, refLft + width - 1, refAbv + width - 1, dirMode, 0);
}
/** Function for checking identical motion.
diff -r abd4da45823c -r c4edab8dab65 source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm Wed Jan 01 15:52:11 2014 -0600
+++ b/source/common/x86/ipfilter8.asm Tue Jan 07 18:36:17 2014 +0530
@@ -623,7 +623,7 @@ IPFILTER_CHROMA_W 32, 32
;----------------------------------------------------------------------------------------------------------------------------
%macro IPFILTER_LUMA 3
INIT_XMM sse4
-cglobal interp_8tap_horiz_%3_%1x%2, 4, 7, 5
+cglobal interp_8tap_horiz_%3_%1x%2, 4,7,6
mov r4d, r4m
diff -r abd4da45823c -r c4edab8dab65 source/common/x86/pixel-util8.asm
--- a/source/common/x86/pixel-util8.asm Wed Jan 01 15:52:11 2014 -0600
+++ b/source/common/x86/pixel-util8.asm Tue Jan 07 18:36:17 2014 +0530
@@ -2325,17 +2325,17 @@ RET
;-----------------------------------------------------------------
; void scale2D_64to32(pixel *dst, pixel *src, intptr_t stride)
;-----------------------------------------------------------------
+%if HIGH_BIT_DEPTH
INIT_XMM ssse3
cglobal scale2D_64to32, 3, 4, 8, dest, src, stride
mov r3d, 32
-%if HIGH_BIT_DEPTH
mova m7, [deinterleave_word_shuf]
add r2, r2
.loop
movu m0, [r1] ;i
- movu m1, [r1 + 2] ;j
+ psrld m1, m0, 16 ;j
movu m2, [r1 + r2] ;k
- movu m3, [r1 + r2 + 2] ;l
+ psrld m3, m2, 16 ;l
movu m4, m0
movu m5, m2
pxor m4, m1 ;i^j
@@ -2350,9 +2350,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
pand m4, [hmulw_16p]
psubw m0, m4 ;Result
movu m1, [r1 + 16] ;i
- movu m2, [r1 + 16 + 2] ;j
+ psrld m2, m1, 16 ;j
movu m3, [r1 + r2 + 16] ;k
- movu m4, [r1 + r2 + 16 + 2] ;l
+ psrld m4, m3, 16 ;l
movu m5, m1
movu m6, m3
pxor m5, m2 ;i^j
@@ -2373,9 +2373,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
movu [r0], m0
movu m0, [r1 + 32] ;i
- movu m1, [r1 + 32 + 2] ;j
+ psrld m1, m0, 16 ;j
movu m2, [r1 + r2 + 32] ;k
- movu m3, [r1 + r2 + 32 + 2] ;l
+ psrld m3, m2, 16 ;l
movu m4, m0
movu m5, m2
pxor m4, m1 ;i^j
@@ -2390,9 +2390,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
pand m4, [hmulw_16p]
psubw m0, m4 ;Result
movu m1, [r1 + 48] ;i
- movu m2, [r1 + 48 + 2] ;j
+ psrld m2, m1, 16 ;j
movu m3, [r1 + r2 + 48] ;k
- movu m4, [r1 + r2 + 48 + 2] ;l
+ psrld m4, m3, 16 ;l
movu m5, m1
movu m6, m3
pxor m5, m2 ;i^j
@@ -2413,9 +2413,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
movu [r0 + 16], m0
movu m0, [r1 + 64] ;i
- movu m1, [r1 + 64 + 2] ;j
+ psrld m1, m0, 16 ;j
movu m2, [r1 + r2 + 64] ;k
- movu m3, [r1 + r2 + 64 + 2] ;l
+ psrld m3, m2, 16 ;l
movu m4, m0
movu m5, m2
pxor m4, m1 ;i^j
@@ -2430,9 +2430,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
pand m4, [hmulw_16p]
psubw m0, m4 ;Result
movu m1, [r1 + 80] ;i
- movu m2, [r1 + 80 + 2] ;j
+ psrld m2, m1, 16 ;j
movu m3, [r1 + r2 + 80] ;k
- movu m4, [r1 + r2 + 80 + 2] ;l
+ psrld m4, m3, 16 ;l
movu m5, m1
movu m6, m3
pxor m5, m2 ;i^j
@@ -2453,9 +2453,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
movu [r0 + 32], m0
movu m0, [r1 + 96] ;i
- movu m1, [r1 + 96 + 2] ;j
+ psrld m1, m0, 16 ;j
movu m2, [r1 + r2 + 96] ;k
- movu m3, [r1 + r2 + 96 + 2] ;l
+ psrld m3, m2, 16 ;l
movu m4, m0
movu m5, m2
pxor m4, m1 ;i^j
@@ -2469,10 +2469,10 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
pand m4, m5 ;(ij|kl)&st
pand m4, [hmulw_16p]
psubw m0, m4 ;Result
- movu m1, [r1 + 112] ;i
- movu m2, [r1 + 112 + 2] ;j
- movu m3, [r1 + r2 + 112] ;k
- movu m4, [r1 + r2 + 112 + 2] ;l
+ movu m1, [r1 + 112] ;i
+ psrld m2, m1, 16 ;j
+ movu m3, [r1 + r2 + 112] ;k
+ psrld m4, m3, 16 ;l
movu m5, m1
movu m6, m3
pxor m5, m2 ;i^j
@@ -2492,14 +2492,22 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
punpcklqdq m0, m1
movu [r0 + 48], m0
lea r0, [r0 + 64]
+ lea r1, [r1 + 2 * r2]
+ dec r3d
+ jnz .loop
+ RET
%else
+
+INIT_XMM ssse3
+cglobal scale2D_64to32, 3, 4, 8, dest, src, stride
+ mov r3d, 32
mova m7, [deinterleave_shuf]
.loop
movu m0, [r1] ;i
- movu m1, [r1 + 1] ;j
+ psrlw m1, m0, 8 ;j
movu m2, [r1 + r2] ;k
- movu m3, [r1 + r2 + 1] ;l
+ psrlw m3, m2, 8 ;l
movu m4, m0
movu m5, m2
@@ -2517,9 +2525,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
psubb m0, m4 ;Result
movu m1, [r1 + 16] ;i
- movu m2, [r1 + 16 + 1] ;j
+ psrlw m2, m1, 8 ;j
movu m3, [r1 + r2 + 16] ;k
- movu m4, [r1 + r2 + 16 + 1] ;l
+ psrlw m4, m3, 8 ;l
movu m5, m1
movu m6, m3
@@ -2543,9 +2551,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
movu [r0], m0
movu m0, [r1 + 32] ;i
- movu m1, [r1 + 32 + 1] ;j
+ psrlw m1, m0, 8 ;j
movu m2, [r1 + r2 + 32] ;k
- movu m3, [r1 + r2 + 32 + 1] ;l
+ psrlw m3, m2, 8 ;l
movu m4, m0
movu m5, m2
@@ -2563,9 +2571,9 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
psubb m0, m4 ;Result
movu m1, [r1 + 48] ;i
- movu m2, [r1 + 48 + 1] ;j
+ psrlw m2, m1, 8 ;j
movu m3, [r1 + r2 + 48] ;k
- movu m4, [r1 + r2 + 48 + 1] ;l
+ psrlw m4, m3, 8 ;l
movu m5, m1
movu m6, m3
@@ -2589,12 +2597,11 @@ cglobal scale2D_64to32, 3, 4, 8, dest, s
movu [r0 + 16], m0
lea r0, [r0 + 32]
-%endif
lea r1, [r1 + 2 * r2]
dec r3d
jnz .loop
-
-RET
+ RET
+%endif
;-----------------------------------------------------------------------------
diff -r abd4da45823c -r c4edab8dab65 source/encoder/motion.cpp
--- a/source/encoder/motion.cpp Wed Jan 01 15:52:11 2014 -0600
+++ b/source/encoder/motion.cpp Tue Jan 07 18:36:17 2014 +0530
@@ -1051,7 +1051,11 @@ me_hex2:
SubpelWorkload& wl = workload[this->subpelRefine];
- if (ref->isLowres)
+ if (!bcost)
+ {
+ /* subpel refine isn't going to improve this */
+ }
+ else if (ref->isLowres)
{
int bdir = 0, cost;
for (int i = 1; i <= wl.hpel_dirs; i++)
diff -r abd4da45823c -r c4edab8dab65 source/encoder/slicetype.cpp
--- a/source/encoder/slicetype.cpp Wed Jan 01 15:52:11 2014 -0600
+++ b/source/encoder/slicetype.cpp Tue Jan 07 18:36:17 2014 +0530
@@ -858,7 +858,9 @@ void Lookahead::slicetypeDecide()
}