[x265-commits] [x265] log: make qTreeCnt as stack arrays to avoid non determini...
Aarthi at videolan.org
Tue Apr 28 20:02:03 CEST 2015
details: http://hg.videolan.org/x265/rev/37d8bfd67c68
branches:
changeset: 10310:37d8bfd67c68
user: Aarthi Thirumalai
date: Tue Apr 28 18:24:44 2015 +0530
description:
log: make qTreeCnt as stack arrays to avoid non determinism in 2 pass
Currently, the qTreeCount arrays collected in the log stats for each sliceType are not protected,
which may cause inconsistency in 2 pass encodes when multiple rows finish simultaneously.
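
The idea in the description above can be sketched roughly as follows. This is not the x265 code: the type and function names, the array size of 4, and the use of a mutex for the final merge are all assumptions made for illustration. Each row counts into an array on its own stack and touches the shared totals only once, under a lock, so rows that finish at the same time cannot interleave their updates.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for the encoder's log statistics.
struct SliceStats {
    uint64_t qTreeCnt[4] = {0, 0, 0, 0};  // array size is an assumption
};

struct FrameStats {
    std::mutex lock;   // guards 'total'
    SliceStats total;  // shared across all rows of the frame

    // Fold one row's private counters into the shared totals.
    void merge(const SliceStats& row) {
        std::lock_guard<std::mutex> guard(lock);
        for (int d = 0; d < 4; d++)
            total.qTreeCnt[d] += row.qTreeCnt[d];
    }
};

// Work done for one row; in an encoder this would run on a worker thread.
void processRow(FrameStats& frame, int cusInRow) {
    assert(cusInRow >= 0);
    SliceStats local;  // stack array: private to this call, no sharing
    for (int cu = 0; cu < cusInRow; cu++)
        local.qTreeCnt[cu % 4]++;  // placeholder for the real per-CU decision
    frame.merge(local);            // single synchronized update per row
}
```

Because integer addition is commutative, the merged totals do not depend on the order in which rows finish, which is the property a deterministic 2 pass log needs.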
Subject: [x265] asm: filter_vsp, filter_vss for 2x16 in avx2
details: http://hg.videolan.org/x265/rev/14ad22dc101f
branches:
changeset: 10311:14ad22dc101f
user: Divya Manivannan <divya at multicorewareinc.com>
date: Tue Apr 28 13:30:29 2015 +0530
description:
asm: filter_vsp, filter_vss for 2x16 in avx2
filter_vsp[2x16]: 816c->655c
filter_vss[2x16]: 757c->572c
Subject: [x265] asm: filter_vsp, filter_vss for 16x24 in avx2
details: http://hg.videolan.org/x265/rev/105e872ecc65
branches:
changeset: 10312:105e872ecc65
user: Divya Manivannan <divya at multicorewareinc.com>
date: Tue Apr 28 14:27:11 2015 +0530
description:
asm: filter_vsp, filter_vss for 16x24 in avx2
filter_vsp[16x24]: 4357c->2865c
filter_vss[16x24]: 3545c->3171c
Subject: [x265] asm: filter_vsp, filter_vss for 12x32 in avx2
details: http://hg.videolan.org/x265/rev/d8083524b4fc
branches:
changeset: 10313:d8083524b4fc
user: Divya Manivannan <divya at multicorewareinc.com>
date: Tue Apr 28 15:47:21 2015 +0530
description:
asm: filter_vsp, filter_vss for 12x32 in avx2
filter_vsp[12x32]: 4587c->3164c
filter_vss[12x32]: 3632c->2919c
Subject: [x265] asm: filter_vsp, filter_vss for 4x32 in avx2
details: http://hg.videolan.org/x265/rev/88752bc59365
branches:
changeset: 10314:88752bc59365
user: Divya Manivannan <divya at multicorewareinc.com>
date: Tue Apr 28 16:10:56 2015 +0530
description:
asm: filter_vsp, filter_vss for 4x32 in avx2
filter_vsp[4x32]: 1750c->1159c
filter_vss[4x32]: 1409c->969c
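
For readers unfamiliar with the primitives named in these commits, here is a scalar sketch of what a 4-tap vertical chroma filter computes. It is an illustration under stated assumptions, not the project's reference code: the function names are hypothetical, an 8-bit build is assumed, and the shift values (6 and 12) are assumptions chosen to match the coefficient sum of 64 and the 8192*64+2048 rounding constant that appears in the diff below. The coefficients are copied from tab_ChromaCoeff in that diff.

```cpp
#include <algorithm>
#include <cstdint>

// 4-tap chroma coefficients, one row per fractional position.
static const int16_t chromaCoeff[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 },
};

// Weighted sum of four vertically adjacent samples starting at p.
static inline int tap4(const int16_t* p, intptr_t stride, const int16_t* c) {
    return p[0] * c[0] + p[stride] * c[1] + p[2 * stride] * c[2] + p[3 * stride] * c[3];
}

// "vss": 16-bit intermediate in, 16-bit intermediate out (hypothetical name).
template<int W, int H>
void vertSS(const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int frac) {
    const int shift = 6;  // assumption: taps sum to 64
    src -= srcStride;     // the 4-tap window covers rows -1..+2
    for (int y = 0; y < H; y++, src += srcStride, dst += dstStride)
        for (int x = 0; x < W; x++)
            dst[x] = (int16_t)(tap4(src + x, srcStride, chromaCoeff[frac]) >> shift);
}

// "vsp": 16-bit intermediate in, 8-bit pixel out (hypothetical name).
template<int W, int H>
void vertSP(const int16_t* src, intptr_t srcStride, uint8_t* dst, intptr_t dstStride, int frac) {
    const int shift = 12;                 // assumption for 8-bit output
    const int offset = 8192 * 64 + 2048;  // same value as pd_526336 in the diff
    src -= srcStride;
    for (int y = 0; y < H; y++, src += srcStride, dst += dstStride)
        for (int x = 0; x < W; x++) {
            int val = (tap4(src + x, srcStride, chromaCoeff[frac]) + offset) >> shift;
            dst[x] = (uint8_t)std::min(255, std::max(0, val));
        }
}
```

The block sizes in the commit titles (2x16, 16x24, 12x32, 4x32) would correspond to the W and H template arguments here; the AVX2 versions compute the same result several rows or columns at a time.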
Subject: [x265] asm: avx2 code for sad[16x8] for 10 bpp (398 -> 254)
details: http://hg.videolan.org/x265/rev/fc49b5ebfd32
branches:
changeset: 10315:fc49b5ebfd32
user: Sumalatha Polureddy
date: Tue Apr 28 17:16:32 2015 +0530
description:
asm: avx2 code for sad[16x8] for 10 bpp (398 -> 254)
sse2
sad[ 16x8] 3.30x 398.28 1313.01
avx2
sad[ 16x8] 5.48x 254.91 1398.03
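
As context for the numbers above, a scalar sum of absolute differences over a 16x8 block might look like the sketch below. The function name and signature are hypothetical, and storing 10 bpp samples in 16-bit integers is an assumption about high bit depth builds. The benchmark columns appear to be speedup, optimized cycles, and C reference cycles, since 1313.01 / 398.28 is about 3.30 and 1398.03 / 254.91 is about 5.48.

```cpp
#include <cstdint>
#include <cstdlib>

typedef uint16_t pixel;  // assumption: 10-bit samples held in 16 bits

// Scalar reference: sum of |a - b| over a W x H block.
template<int W, int H>
int sadRef(const pixel* a, intptr_t strideA, const pixel* b, intptr_t strideB) {
    int sum = 0;
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++)
            sum += std::abs(int(a[x]) - int(b[x]));
        a += strideA;  // advance both blocks by one row
        b += strideB;
    }
    return sum;
}
```

With 10-bit samples the largest per-pixel difference is 1023, so a 16x8 block sums to at most 130944 and fits comfortably in a 32-bit int.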
Subject: [x265] asm: remove tab_c_526336, it is duplicate to pd_526336
details: http://hg.videolan.org/x265/rev/92a400fdcfb4
branches:
changeset: 10316:92a400fdcfb4
user: Min Chen <chenm003 at 163.com>
date: Tue Apr 28 18:43:33 2015 +0800
description:
asm: remove tab_c_526336, it is duplicate to pd_526336
Subject: [x265] asm: use prefix const to avoid unaligned crash
details: http://hg.videolan.org/x265/rev/d659b200011b
branches:
changeset: 10317:d659b200011b
user: Min Chen <chenm003 at 163.com>
date: Tue Apr 28 20:23:56 2015 +0800
description:
asm: use prefix const to avoid unaligned crash
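
Some background on why alignment matters here, as a hedged sketch: aligned 32-byte vector loads fault when the address is not a multiple of 32, so a lookup table that merely follows other data in the section can crash code that assumes alignment. The diff below replaces bare labels and explicit ALIGN 32 lines with the const prefix, which suggests that the macro takes care of alignment, though the macro body is not shown in this mail. The C++ analogue below uses alignas; the names are hypothetical and the table values are the first two rows of tab_Tm.

```cpp
#include <cstdint>

// A shuffle table forced onto a 32-byte boundary, so a 256-bit aligned
// load from its start is always legal.
alignas(32) static const uint8_t tabTm[32] = {
    0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6,
    4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10,
};

// True when p can be used as the operand of a 32-byte aligned load.
bool isAligned32(const void* p) {
    return reinterpret_cast<uintptr_t>(p) % 32 == 0;
}
```

Without the alignas, the compiler would be free to place the table at any byte address, and whether the code crashed would depend on what happened to precede it.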
Subject: [x265] asm: remove interp4_hps_shuf, it is duplicate to interp4_hpp_shuf
details: http://hg.videolan.org/x265/rev/2f67dda63f40
branches:
changeset: 10318:2f67dda63f40
user: Min Chen <chenm003 at 163.com>
date: Tue Apr 28 20:24:01 2015 +0800
description:
asm: remove interp4_hps_shuf, it is duplicate to interp4_hpp_shuf
Subject: [x265] simplify logic on posOffset in codeCoeffNxN()
details: http://hg.videolan.org/x265/rev/e9df93f38066
branches:
changeset: 10319:e9df93f38066
user: Min Chen <chenm003 at 163.com>
date: Tue Apr 28 20:24:06 2015 +0800
description:
simplify logic on posOffset in codeCoeffNxN()
diffstat:
source/common/x86/asm-primitives.cpp | 10 +
source/common/x86/ipfilter8.asm | 797 ++++++++++++++++++++++------------
source/common/x86/sad16-a.asm | 51 ++
source/encoder/entropy.cpp | 9 +-
source/encoder/frameencoder.cpp | 30 +-
source/encoder/frameencoder.h | 7 +-
6 files changed, 591 insertions(+), 313 deletions(-)
diffs (truncated from 1208 to 300 lines):
diff -r 13290abce292 -r e9df93f38066 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Mon Apr 27 14:15:28 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Tue Apr 28 20:24:06 2015 +0800
@@ -1239,6 +1239,8 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].cu[BLOCK_422_16x32].sub_ps = x265_pixel_sub_ps_16x32_avx2;
p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].sub_ps = x265_pixel_sub_ps_32x64_avx2;
+ p.pu[LUMA_16x8].sad = x265_pixel_sad_16x8_avx2;
+
p.pu[LUMA_16x4].convert_p2s = x265_filterPixelToShort_16x4_avx2;
p.pu[LUMA_16x8].convert_p2s = x265_filterPixelToShort_16x8_avx2;
p.pu[LUMA_16x12].convert_p2s = x265_filterPixelToShort_16x12_avx2;
@@ -2285,6 +2287,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_vss = x265_interp_4tap_vert_ss_32x48_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vss = x265_interp_4tap_vert_ss_8x12_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].filter_vss = x265_interp_4tap_vert_ss_6x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_vss = x265_interp_4tap_vert_ss_2x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_vss = x265_interp_4tap_vert_ss_16x24_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].filter_vss = x265_interp_4tap_vert_ss_12x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vss = x265_interp_4tap_vert_ss_4x32_avx2;
//i444 for chroma_vss
p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
@@ -2471,6 +2477,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_vsp = x265_interp_4tap_vert_sp_32x48_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vsp = x265_interp_4tap_vert_sp_8x12_avx2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].filter_vsp = x265_interp_4tap_vert_sp_6x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_vsp = x265_interp_4tap_vert_sp_2x16_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_vsp = x265_interp_4tap_vert_sp_16x24_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].filter_vsp = x265_interp_4tap_vert_sp_12x32_avx2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vsp = x265_interp_4tap_vert_sp_4x32_avx2;
//i444 for chroma_vsp
p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
diff -r 13290abce292 -r e9df93f38066 source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm Mon Apr 27 14:15:28 2015 -0500
+++ b/source/common/x86/ipfilter8.asm Tue Apr 28 20:24:06 2015 +0800
@@ -27,282 +27,264 @@
%include "x86util.asm"
SECTION_RODATA 32
-tab_Tm: db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
- db 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
- db 8, 9,10,11, 9,10,11,12,10,11,12,13,11,12,13, 14
-
-ALIGN 32
+const tab_Tm, db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
+ db 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
+ db 8, 9,10,11, 9,10,11,12,10,11,12,13,11,12,13, 14
+
const interp4_vpp_shuf, times 2 db 0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15
-ALIGN 32
const interp_vert_shuf, times 2 db 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6, 8, 7, 9
times 2 db 4, 6, 5, 7, 6, 8, 7, 9, 8, 10, 9, 11, 10, 12, 11, 13
-ALIGN 32
const interp4_vpp_shuf1, dd 0, 1, 1, 2, 2, 3, 3, 4
dd 2, 3, 3, 4, 4, 5, 5, 6
-ALIGN 32
const pb_8tap_hps_0, times 2 db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
times 2 db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10
times 2 db 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10,10,11,11,12
times 2 db 6, 7, 7, 8, 8, 9, 9,10,10,11,11,12,12,13,13,14
-ALIGN 32
-tab_Lm: db 0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8
- db 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 10
- db 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9, 10, 11, 12
- db 6, 7, 8, 9, 10, 11, 12, 13, 7, 8, 9, 10, 11, 12, 13, 14
-
-tab_Vm: db 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
- db 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3
-
-tab_Cm: db 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3
-
-tab_c_526336: times 4 dd 8192*64+2048
-
-pd_526336: times 8 dd 8192*64+2048
-
-tab_ChromaCoeff: db 0, 64, 0, 0
- db -2, 58, 10, -2
- db -4, 54, 16, -2
- db -6, 46, 28, -4
- db -4, 36, 36, -4
- db -4, 28, 46, -6
- db -2, 16, 54, -4
- db -2, 10, 58, -2
-
-tabw_ChromaCoeff: dw 0, 64, 0, 0
- dw -2, 58, 10, -2
- dw -4, 54, 16, -2
- dw -6, 46, 28, -4
- dw -4, 36, 36, -4
- dw -4, 28, 46, -6
- dw -2, 16, 54, -4
- dw -2, 10, 58, -2
-
-ALIGN 32
-tab_ChromaCoeff_V: times 8 db 0, 64
- times 8 db 0, 0
-
- times 8 db -2, 58
- times 8 db 10, -2
-
- times 8 db -4, 54
- times 8 db 16, -2
-
- times 8 db -6, 46
- times 8 db 28, -4
-
- times 8 db -4, 36
- times 8 db 36, -4
-
- times 8 db -4, 28
- times 8 db 46, -6
-
- times 8 db -2, 16
- times 8 db 54, -4
-
- times 8 db -2, 10
- times 8 db 58, -2
-
-tab_ChromaCoeffV: times 4 dw 0, 64
- times 4 dw 0, 0
-
- times 4 dw -2, 58
- times 4 dw 10, -2
-
- times 4 dw -4, 54
- times 4 dw 16, -2
-
- times 4 dw -6, 46
- times 4 dw 28, -4
-
- times 4 dw -4, 36
- times 4 dw 36, -4
-
- times 4 dw -4, 28
- times 4 dw 46, -6
-
- times 4 dw -2, 16
- times 4 dw 54, -4
-
- times 4 dw -2, 10
- times 4 dw 58, -2
-
-ALIGN 32
-pw_ChromaCoeffV: times 8 dw 0, 64
- times 8 dw 0, 0
-
- times 8 dw -2, 58
- times 8 dw 10, -2
-
- times 8 dw -4, 54
- times 8 dw 16, -2
-
- times 8 dw -6, 46
- times 8 dw 28, -4
-
- times 8 dw -4, 36
- times 8 dw 36, -4
-
- times 8 dw -4, 28
- times 8 dw 46, -6
-
- times 8 dw -2, 16
- times 8 dw 54, -4
-
- times 8 dw -2, 10
- times 8 dw 58, -2
-
-tab_LumaCoeff: db 0, 0, 0, 64, 0, 0, 0, 0
- db -1, 4, -10, 58, 17, -5, 1, 0
- db -1, 4, -11, 40, 40, -11, 4, -1
- db 0, 1, -5, 17, 58, -10, 4, -1
-
-tab_LumaCoeffV: times 4 dw 0, 0
- times 4 dw 0, 64
- times 4 dw 0, 0
- times 4 dw 0, 0
-
- times 4 dw -1, 4
- times 4 dw -10, 58
- times 4 dw 17, -5
- times 4 dw 1, 0
-
- times 4 dw -1, 4
- times 4 dw -11, 40
- times 4 dw 40, -11
- times 4 dw 4, -1
-
- times 4 dw 0, 1
- times 4 dw -5, 17
- times 4 dw 58, -10
- times 4 dw 4, -1
-
-ALIGN 32
-pw_LumaCoeffVer: times 8 dw 0, 0
- times 8 dw 0, 64
- times 8 dw 0, 0
- times 8 dw 0, 0
-
- times 8 dw -1, 4
- times 8 dw -10, 58
- times 8 dw 17, -5
- times 8 dw 1, 0
-
- times 8 dw -1, 4
- times 8 dw -11, 40
- times 8 dw 40, -11
- times 8 dw 4, -1
-
- times 8 dw 0, 1
- times 8 dw -5, 17
- times 8 dw 58, -10
- times 8 dw 4, -1
-
-pb_LumaCoeffVer: times 16 db 0, 0
- times 16 db 0, 64
- times 16 db 0, 0
- times 16 db 0, 0
-
- times 16 db -1, 4
- times 16 db -10, 58
- times 16 db 17, -5
- times 16 db 1, 0
-
- times 16 db -1, 4
- times 16 db -11, 40
- times 16 db 40, -11
- times 16 db 4, -1
-
- times 16 db 0, 1
- times 16 db -5, 17
- times 16 db 58, -10
- times 16 db 4, -1
-
-tab_LumaCoeffVer: times 8 db 0, 0
- times 8 db 0, 64
- times 8 db 0, 0
- times 8 db 0, 0
-
- times 8 db -1, 4
- times 8 db -10, 58
- times 8 db 17, -5
- times 8 db 1, 0
-
- times 8 db -1, 4
- times 8 db -11, 40
- times 8 db 40, -11
- times 8 db 4, -1
-
- times 8 db 0, 1
- times 8 db -5, 17
- times 8 db 58, -10
- times 8 db 4, -1
-
-ALIGN 32
-tab_LumaCoeffVer_32: times 16 db 0, 0
- times 16 db 0, 64
- times 16 db 0, 0
- times 16 db 0, 0
-
- times 16 db -1, 4
- times 16 db -10, 58
- times 16 db 17, -5
- times 16 db 1, 0
-
- times 16 db -1, 4
- times 16 db -11, 40
- times 16 db 40, -11
- times 16 db 4, -1
-
- times 16 db 0, 1
- times 16 db -5, 17
- times 16 db 58, -10
- times 16 db 4, -1
-
-ALIGN 32
-tab_ChromaCoeffVer_32: times 16 db 0, 64
- times 16 db 0, 0
-
- times 16 db -2, 58
- times 16 db 10, -2
-
- times 16 db -4, 54
- times 16 db 16, -2
-
- times 16 db -6, 46
- times 16 db 28, -4
-
- times 16 db -4, 36
- times 16 db 36, -4
-
- times 16 db -4, 28
- times 16 db 46, -6
-
- times 16 db -2, 16