[x265-commits] [x265] log: make qTreeCnt as stack arrays to avoid non determini...

Aarthi at videolan.org
Tue Apr 28 20:02:03 CEST 2015


details:   http://hg.videolan.org/x265/rev/37d8bfd67c68
branches:  
changeset: 10310:37d8bfd67c68
user:      Aarthi Thirumalai
date:      Tue Apr 28 18:24:44 2015 +0530
description:
log: make qTreeCnt as stack arrays to avoid non determinism in 2 pass

Currently, the qTreeCount arrays collected in the log stats for each sliceType are not
protected, which may cause inconsistency in 2-pass encodes when multiple rows finish simultaneously.
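
The idea described above can be sketched roughly as follows. This is only an
illustration, not the code from the changeset: FrameStats, encodeRow, runRows
and kDepths are hypothetical names, and the per-depth counting is a stand-in
for real analysis results. Each row accumulates its counts in an array on its
own stack and merges them into the shared totals once, under a lock, so rows
that finish at the same time cannot lose updates.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical number of quad-tree depths tracked per slice type.
constexpr int kDepths = 4;

// Hypothetical shared per-frame statistics (not the x265 structure).
struct FrameStats
{
    std::array<uint64_t, kDepths> qTreeCount{};
    std::mutex lock;
};

// Each row counts into a stack-local array, then merges once under the lock.
void encodeRow(FrameStats& stats, int ctusInRow)
{
    std::array<uint64_t, kDepths> local{};   // private to this row, no sharing
    for (int i = 0; i < ctusInRow; i++)
        local[i % kDepths]++;                // stand-in for per-CTU depth stats

    std::lock_guard<std::mutex> guard(stats.lock);
    for (int d = 0; d < kDepths; d++)
        stats.qTreeCount[d] += local[d];
}

// Runs one worker thread per row and returns the total count over all depths.
uint64_t runRows(int rows, int ctusInRow)
{
    FrameStats stats;
    std::vector<std::thread> workers;
    for (int r = 0; r < rows; r++)
        workers.emplace_back(encodeRow, std::ref(stats), ctusInRow);
    for (auto& w : workers)
        w.join();

    uint64_t total = 0;
    for (uint64_t c : stats.qTreeCount)
        total += c;
    return total;
}
```

With this layout, runRows(rows, ctusInRow) gives the same total on every run
regardless of the order in which rows finish, which is the property a 2-pass
stats file depends on.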
Subject: [x265] asm: filter_vsp, filter_vss for 2x16 in avx2

details:   http://hg.videolan.org/x265/rev/14ad22dc101f
branches:  
changeset: 10311:14ad22dc101f
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Apr 28 13:30:29 2015 +0530
description:
asm: filter_vsp, filter_vss for 2x16 in avx2

filter_vsp[2x16]: 816c->655c
filter_vss[2x16]: 757c->572c
Subject: [x265] asm: filter_vsp, filter_vss for 16x24 in avx2

details:   http://hg.videolan.org/x265/rev/105e872ecc65
branches:  
changeset: 10312:105e872ecc65
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Apr 28 14:27:11 2015 +0530
description:
asm: filter_vsp, filter_vss for 16x24 in avx2

filter_vsp[16x24]: 4357c->2865c
filter_vss[16x24]: 3545c->3171c
Subject: [x265] asm: filter_vsp, filter_vss for 12x32 in avx2

details:   http://hg.videolan.org/x265/rev/d8083524b4fc
branches:  
changeset: 10313:d8083524b4fc
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Apr 28 15:47:21 2015 +0530
description:
asm: filter_vsp, filter_vss for 12x32 in avx2

filter_vsp[12x32]: 4587c->3164c
filter_vss[12x32]: 3632c->2919c
Subject: [x265] asm: filter_vsp, filter_vss for 4x32 in avx2

details:   http://hg.videolan.org/x265/rev/88752bc59365
branches:  
changeset: 10314:88752bc59365
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Apr 28 16:10:56 2015 +0530
description:
asm: filter_vsp, filter_vss for 4x32 in avx2

filter_vsp[4x32]: 1750c->1159c
filter_vss[4x32]: 1409c->969c
Subject: [x265] asm: avx2 code for sad[16x8] for 10 bpp (398 -> 254)

details:   http://hg.videolan.org/x265/rev/fc49b5ebfd32
branches:  
changeset: 10315:fc49b5ebfd32
user:      Sumalatha Polureddy
date:      Tue Apr 28 17:16:32 2015 +0530
description:
asm: avx2 code for sad[16x8] for 10 bpp (398 -> 254)

sse2
sad[ 16x8]  3.30x    398.28          1313.01
avx2
sad[ 16x8]  5.48x    254.91          1398.03
Subject: [x265] asm: remove tab_c_526336, it is a duplicate of pd_526336

details:   http://hg.videolan.org/x265/rev/92a400fdcfb4
branches:  
changeset: 10316:92a400fdcfb4
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 28 18:43:33 2015 +0800
description:
asm: remove tab_c_526336, it is a duplicate of pd_526336
Subject: [x265] asm: use prefix const to avoid unaligned crash

details:   http://hg.videolan.org/x265/rev/d659b200011b
branches:  
changeset: 10317:d659b200011b
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 28 20:23:56 2015 +0800
description:
asm: use prefix const to avoid unaligned crash
Subject: [x265] asm: remove interp4_hps_shuf, it is a duplicate of interp4_hpp_shuf

details:   http://hg.videolan.org/x265/rev/2f67dda63f40
branches:  
changeset: 10318:2f67dda63f40
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 28 20:24:01 2015 +0800
description:
asm: remove interp4_hps_shuf, it is a duplicate of interp4_hpp_shuf
Subject: [x265] simplify logic on posOffset in codeCoeffNxN()

details:   http://hg.videolan.org/x265/rev/e9df93f38066
branches:  
changeset: 10319:e9df93f38066
user:      Min Chen <chenm003 at 163.com>
date:      Tue Apr 28 20:24:06 2015 +0800
description:
simplify logic on posOffset in codeCoeffNxN()

diffstat:

 source/common/x86/asm-primitives.cpp |   10 +
 source/common/x86/ipfilter8.asm      |  797 ++++++++++++++++++++++------------
 source/common/x86/sad16-a.asm        |   51 ++
 source/encoder/entropy.cpp           |    9 +-
 source/encoder/frameencoder.cpp      |   30 +-
 source/encoder/frameencoder.h        |    7 +-
 6 files changed, 591 insertions(+), 313 deletions(-)

diffs (truncated from 1208 to 300 lines):

diff -r 13290abce292 -r e9df93f38066 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Mon Apr 27 14:15:28 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp	Tue Apr 28 20:24:06 2015 +0800
@@ -1239,6 +1239,8 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I422].cu[BLOCK_422_16x32].sub_ps = x265_pixel_sub_ps_16x32_avx2;
         p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].sub_ps = x265_pixel_sub_ps_32x64_avx2;
 
+        p.pu[LUMA_16x8].sad = x265_pixel_sad_16x8_avx2;
+
         p.pu[LUMA_16x4].convert_p2s = x265_filterPixelToShort_16x4_avx2;
         p.pu[LUMA_16x8].convert_p2s = x265_filterPixelToShort_16x8_avx2;
         p.pu[LUMA_16x12].convert_p2s = x265_filterPixelToShort_16x12_avx2;
@@ -2285,6 +2287,10 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_vss = x265_interp_4tap_vert_ss_32x48_avx2;
         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vss = x265_interp_4tap_vert_ss_8x12_avx2;
         p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].filter_vss = x265_interp_4tap_vert_ss_6x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_vss = x265_interp_4tap_vert_ss_2x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_vss = x265_interp_4tap_vert_ss_16x24_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].filter_vss = x265_interp_4tap_vert_ss_12x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vss = x265_interp_4tap_vert_ss_4x32_avx2;
 
         //i444 for chroma_vss
         p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vss = x265_interp_4tap_vert_ss_4x4_avx2;
@@ -2471,6 +2477,10 @@ void setupAssemblyPrimitives(EncoderPrim
         p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].filter_vsp = x265_interp_4tap_vert_sp_32x48_avx2;
         p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].filter_vsp = x265_interp_4tap_vert_sp_8x12_avx2;
         p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].filter_vsp = x265_interp_4tap_vert_sp_6x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].filter_vsp = x265_interp_4tap_vert_sp_2x16_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].filter_vsp = x265_interp_4tap_vert_sp_16x24_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].filter_vsp = x265_interp_4tap_vert_sp_12x32_avx2;
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vsp = x265_interp_4tap_vert_sp_4x32_avx2;
 
         //i444 for chroma_vsp
         p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vsp = x265_interp_4tap_vert_sp_4x4_avx2;
diff -r 13290abce292 -r e9df93f38066 source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm	Mon Apr 27 14:15:28 2015 -0500
+++ b/source/common/x86/ipfilter8.asm	Tue Apr 28 20:24:06 2015 +0800
@@ -27,282 +27,264 @@
 %include "x86util.asm"
 
 SECTION_RODATA 32
-tab_Tm:    db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
-           db 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
-           db 8, 9,10,11, 9,10,11,12,10,11,12,13,11,12,13, 14
-
-ALIGN 32
+const tab_Tm,    db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
+                 db 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
+                 db 8, 9,10,11, 9,10,11,12,10,11,12,13,11,12,13, 14
+
 const interp4_vpp_shuf, times 2 db 0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15
 
-ALIGN 32
 const interp_vert_shuf, times 2 db 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6, 8, 7, 9
                         times 2 db 4, 6, 5, 7, 6, 8, 7, 9, 8, 10, 9, 11, 10, 12, 11, 13
 
-ALIGN 32
 const interp4_vpp_shuf1, dd 0, 1, 1, 2, 2, 3, 3, 4
                          dd 2, 3, 3, 4, 4, 5, 5, 6
 
-ALIGN 32
 const pb_8tap_hps_0, times 2 db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
                      times 2 db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10
                      times 2 db 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10,10,11,11,12
                      times 2 db 6, 7, 7, 8, 8, 9, 9,10,10,11,11,12,12,13,13,14
 
-ALIGN 32
-tab_Lm:    db 0, 1, 2, 3, 4,  5,  6,  7,  1, 2, 3, 4,  5,  6,  7,  8
-           db 2, 3, 4, 5, 6,  7,  8,  9,  3, 4, 5, 6,  7,  8,  9,  10
-           db 4, 5, 6, 7, 8,  9,  10, 11, 5, 6, 7, 8,  9,  10, 11, 12
-           db 6, 7, 8, 9, 10, 11, 12, 13, 7, 8, 9, 10, 11, 12, 13, 14
-
-tab_Vm:    db 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-           db 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3
-
-tab_Cm:    db 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3
-
-tab_c_526336:   times 4 dd 8192*64+2048
-
-pd_526336:      times 8 dd 8192*64+2048
-
-tab_ChromaCoeff: db  0, 64,  0,  0
-                 db -2, 58, 10, -2
-                 db -4, 54, 16, -2
-                 db -6, 46, 28, -4
-                 db -4, 36, 36, -4
-                 db -4, 28, 46, -6
-                 db -2, 16, 54, -4
-                 db -2, 10, 58, -2
-
-tabw_ChromaCoeff: dw  0, 64,  0,  0
-                  dw -2, 58, 10, -2
-                  dw -4, 54, 16, -2
-                  dw -6, 46, 28, -4
-                  dw -4, 36, 36, -4
-                  dw -4, 28, 46, -6
-                  dw -2, 16, 54, -4
-                  dw -2, 10, 58, -2
-
-ALIGN 32
-tab_ChromaCoeff_V: times 8 db 0, 64
-                   times 8 db 0,  0
-
-                   times 8 db -2, 58
-                   times 8 db 10, -2
-
-                   times 8 db -4, 54
-                   times 8 db 16, -2
-
-                   times 8 db -6, 46
-                   times 8 db 28, -4
-
-                   times 8 db -4, 36
-                   times 8 db 36, -4
-
-                   times 8 db -4, 28
-                   times 8 db 46, -6
-
-                   times 8 db -2, 16
-                   times 8 db 54, -4
-
-                   times 8 db -2, 10
-                   times 8 db 58, -2
-
-tab_ChromaCoeffV: times 4 dw 0, 64
-                  times 4 dw 0, 0
-
-                  times 4 dw -2, 58
-                  times 4 dw 10, -2
-
-                  times 4 dw -4, 54
-                  times 4 dw 16, -2
-
-                  times 4 dw -6, 46
-                  times 4 dw 28, -4
-
-                  times 4 dw -4, 36
-                  times 4 dw 36, -4
-
-                  times 4 dw -4, 28
-                  times 4 dw 46, -6
-
-                  times 4 dw -2, 16
-                  times 4 dw 54, -4
-
-                  times 4 dw -2, 10
-                  times 4 dw 58, -2
-
-ALIGN 32
-pw_ChromaCoeffV:  times 8 dw 0, 64
-                  times 8 dw 0, 0
-
-                  times 8 dw -2, 58
-                  times 8 dw 10, -2
-
-                  times 8 dw -4, 54
-                  times 8 dw 16, -2
-
-                  times 8 dw -6, 46
-                  times 8 dw 28, -4
-
-                  times 8 dw -4, 36
-                  times 8 dw 36, -4
-
-                  times 8 dw -4, 28
-                  times 8 dw 46, -6
-
-                  times 8 dw -2, 16
-                  times 8 dw 54, -4
-
-                  times 8 dw -2, 10
-                  times 8 dw 58, -2
-
-tab_LumaCoeff:   db   0, 0,  0,  64,  0,   0,  0,  0
-                 db  -1, 4, -10, 58,  17, -5,  1,  0
-                 db  -1, 4, -11, 40,  40, -11, 4, -1
-                 db   0, 1, -5,  17,  58, -10, 4, -1
-
-tab_LumaCoeffV: times 4 dw 0, 0
-                times 4 dw 0, 64
-                times 4 dw 0, 0
-                times 4 dw 0, 0
-
-                times 4 dw -1, 4
-                times 4 dw -10, 58
-                times 4 dw 17, -5
-                times 4 dw 1, 0
-
-                times 4 dw -1, 4
-                times 4 dw -11, 40
-                times 4 dw 40, -11
-                times 4 dw 4, -1
-
-                times 4 dw 0, 1
-                times 4 dw -5, 17
-                times 4 dw 58, -10
-                times 4 dw 4, -1
-
-ALIGN 32
-pw_LumaCoeffVer: times 8 dw 0, 0
-                 times 8 dw 0, 64
-                 times 8 dw 0, 0
-                 times 8 dw 0, 0
-
-                 times 8 dw -1, 4
-                 times 8 dw -10, 58
-                 times 8 dw 17, -5
-                 times 8 dw 1, 0
-
-                 times 8 dw -1, 4
-                 times 8 dw -11, 40
-                 times 8 dw 40, -11
-                 times 8 dw 4, -1
-
-                 times 8 dw 0, 1
-                 times 8 dw -5, 17
-                 times 8 dw 58, -10
-                 times 8 dw 4, -1
-
-pb_LumaCoeffVer: times 16 db 0, 0
-                 times 16 db 0, 64
-                 times 16 db 0, 0
-                 times 16 db 0, 0
-
-                 times 16 db -1, 4
-                 times 16 db -10, 58
-                 times 16 db 17, -5
-                 times 16 db 1, 0
-
-                 times 16 db -1, 4
-                 times 16 db -11, 40
-                 times 16 db 40, -11
-                 times 16 db 4, -1
-
-                 times 16 db 0, 1
-                 times 16 db -5, 17
-                 times 16 db 58, -10
-                 times 16 db 4, -1
-
-tab_LumaCoeffVer: times 8 db 0, 0
-                  times 8 db 0, 64
-                  times 8 db 0, 0
-                  times 8 db 0, 0
-
-                  times 8 db -1, 4
-                  times 8 db -10, 58
-                  times 8 db 17, -5
-                  times 8 db 1, 0
-
-                  times 8 db -1, 4
-                  times 8 db -11, 40
-                  times 8 db 40, -11
-                  times 8 db 4, -1
-
-                  times 8 db 0, 1
-                  times 8 db -5, 17
-                  times 8 db 58, -10
-                  times 8 db 4, -1
-
-ALIGN 32
-tab_LumaCoeffVer_32: times 16 db 0, 0
-                     times 16 db 0, 64
-                     times 16 db 0, 0
-                     times 16 db 0, 0
-
-                     times 16 db -1, 4
-                     times 16 db -10, 58
-                     times 16 db 17, -5
-                     times 16 db 1, 0
-
-                     times 16 db -1, 4
-                     times 16 db -11, 40
-                     times 16 db 40, -11
-                     times 16 db 4, -1
-
-                     times 16 db 0, 1
-                     times 16 db -5, 17
-                     times 16 db 58, -10
-                     times 16 db 4, -1
-
-ALIGN 32
-tab_ChromaCoeffVer_32: times 16 db 0, 64
-                       times 16 db 0, 0
-
-                       times 16 db -2, 58
-                       times 16 db 10, -2
-
-                       times 16 db -4, 54
-                       times 16 db 16, -2
-
-                       times 16 db -6, 46
-                       times 16 db 28, -4
-
-                       times 16 db -4, 36
-                       times 16 db 36, -4
-
-                       times 16 db -4, 28
-                       times 16 db 46, -6
-
-                       times 16 db -2, 16

