[x265-commits] [x265] pixel: fix 16bpp warnings that were previously hidden by ...

Mon Dec 2 23:53:19 CET 2013

details:   http://hg.videolan.org/x265/rev/0a85121531fc
branches:  
changeset: 5415:0a85121531fc
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 01:04:33 2013 -0600
description:
pixel: fix 16bpp warnings that were previously hidden by cmake rules
Subject: [x265] asm: removed unused code from pixel-a.asm

details:   http://hg.videolan.org/x265/rev/df0b4f81609e
branches:  
changeset: 5416:df0b4f81609e
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Mon Dec 02 12:19:34 2013 +0530
description:
asm: removed unused code from pixel-a.asm
Subject: [x265] slicetype: fix for gcc warnings

details:   http://hg.videolan.org/x265/rev/0a8023666206
branches:  
changeset: 5417:0a8023666206
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Mon Dec 02 12:53:59 2013 +0530
description:
slicetype: fix for gcc warnings
Subject: [x265] fix for the number of weighted references exceeding 8 in HM weight analysis

details:   http://hg.videolan.org/x265/rev/bf778de26451
branches:  stable
changeset: 5418:bf778de26451
user:      Shazeb Nawaz Khan <shazeb at multicorewareinc.com>
date:      Mon Dec 02 12:51:57 2013 +0530
description:
fix for the number of weighted references exceeding 8 in HM weight analysis
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/d8d716eb11b8
branches:  
changeset: 5419:d8d716eb11b8
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 01:39:44 2013 -0600
description:
Merge with stable
Subject: [x265] cmake: fix Win64 vector primitive compile flags

details:   http://hg.videolan.org/x265/rev/ccf65888fc2c
branches:  
changeset: 5420:ccf65888fc2c
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 12:34:12 2013 -0600
description:
cmake: fix Win64 vector primitive compile flags
Subject: [x265] picel: fix compile error from older gcc

details:   http://hg.videolan.org/x265/rev/4508b8c923e6
branches:  
changeset: 5421:4508b8c923e6
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:18:37 2013 -0600
description:
picel: fix compile error from older gcc
Subject: [x265] cleanup: removed unused code from sad-a.asm

details:   http://hg.videolan.org/x265/rev/a615a46d4631
branches:  
changeset: 5422:a615a46d4631
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Mon Dec 02 15:29:22 2013 +0530
description:
cleanup: removed unused code from sad-a.asm
Subject: [x265] asm: removed unused function defnitions from pixel.h

details:   http://hg.videolan.org/x265/rev/47ddbf9b5866
branches:  
changeset: 5423:47ddbf9b5866
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Mon Dec 02 13:06:09 2013 +0530
description:
asm: removed unused function defnitions from pixel.h
Subject: [x265] rc: fixups for cutree changes

details:   http://hg.videolan.org/x265/rev/dab34fa63c0c
branches:  
changeset: 5424:dab34fa63c0c
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 13:06:59 2013 -0600
description:
rc: fixups for cutree changes
Subject: [x265] asm: move cvt* functions to blockcopy8.asm

details:   http://hg.videolan.org/x265/rev/b6766dc86e2a
branches:  
changeset: 5425:b6766dc86e2a
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:37:53 2013 -0600
description:
asm: move cvt* functions to blockcopy8.asm
Subject: [x265] asm: remove more unused funcdefs from pixel.h

details:   http://hg.videolan.org/x265/rev/41c6dc5b35e8
branches:  
changeset: 5426:41c6dc5b35e8
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:38:12 2013 -0600
description:
asm: remove more unused funcdefs from pixel.h
Subject: [x265] asm: move transpose from pixel-a.asm to pixel-util8.asm, add pixel-util.h

details:   http://hg.videolan.org/x265/rev/a182faf23ead
branches:  
changeset: 5427:a182faf23ead
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:47:13 2013 -0600
description:
asm: move transpose from pixel-a.asm to pixel-util8.asm, add pixel-util.h
Subject: [x265] asm: move SSIM functions to pixel-util

details:   http://hg.videolan.org/x265/rev/b091438d1446
branches:  
changeset: 5428:b091438d1446
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:51:29 2013 -0600
description:
asm: move SSIM functions to pixel-util
Subject: [x265] asm: move scale functions to pixel-util

details:   http://hg.videolan.org/x265/rev/a439c19ee304
branches:  
changeset: 5429:a439c19ee304
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:54:13 2013 -0600
description:
asm: move scale functions to pixel-util
Subject: [x265] pixel: remove an unused macro

details:   http://hg.videolan.org/x265/rev/2ed3b664c370
branches:  
changeset: 5430:2ed3b664c370
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 14:59:22 2013 -0600
description:
pixel: remove an unused macro
Subject: [x265] asm: move pixel_sub to pixel-util8.asm, move pixel_avg funcdef to mc.h

details:   http://hg.videolan.org/x265/rev/2de04bb5da1d
branches:  
changeset: 5431:2de04bb5da1d
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 15:07:26 2013 -0600
description:
asm: move pixel_sub to pixel-util8.asm, move pixel_avg funcdef to mc.h
Subject: [x265] asm: move variance functions to pixel-util8.asm

details:   http://hg.videolan.org/x265/rev/eea094a84b9c
branches:  
changeset: 5432:eea094a84b9c
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 15:14:09 2013 -0600
description:
asm: move variance functions to pixel-util8.asm
Subject: [x265] asm: move ssd functions into their own ssd-a.asm file, similar to sad-a.asm

details:   http://hg.videolan.org/x265/rev/a9f629fac91e
branches:  
changeset: 5433:a9f629fac91e
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 15:26:14 2013 -0600
description:
asm: move ssd functions into their own ssd-a.asm file, similar to sad-a.asm
Subject: [x265] asm: make it more clear that pixel-a.asm has only satd and sa8d now

details:   http://hg.videolan.org/x265/rev/70e127d735a5
branches:  
changeset: 5434:70e127d735a5
user:      Steve Borho <steve at borho.org>
date:      Mon Dec 02 15:37:57 2013 -0600
description:
asm: make it more clear that pixel-a.asm has only satd and sa8d now
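
Note on changeset 5418 (weighted references): judging from the WeightPredAnalysis.cpp hunk in the diff, a reference falls back to the default weight either when its SAD ratio fails the threshold test or when 8 references already carry explicit weights. The sketch below models only that selection rule. The function name, the container types and the threshold value are hypothetical stand-ins, not the HM/x265 code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants: only the cap of 8 and the ">= threshold" test come
// from the patch. The threshold value is a placeholder for DTHRESH.
static const int    kMaxWeightedRefs = 8;
static const double kRatioThreshold  = 0.99;

// For each reference (given as SAD-with-weights / SAD-without-weights),
// report whether explicit weights are kept (true) or reset to default (false).
std::vector<bool> selectWeightedRefs(const std::vector<double>& sadRatios)
{
    std::vector<bool> keepWeights(sadRatios.size(), false);
    int numWeighted = 0;

    for (std::size_t i = 0; i < sadRatios.size(); i++)
    {
        if (sadRatios[i] >= kRatioThreshold || numWeighted >= kMaxWeightedRefs)
        {
            keepWeights[i] = false; // weighting does not help, or the cap is reached
        }
        else
        {
            keepWeights[i] = true;
            numWeighted++;          // counted only when weights are actually kept
        }
    }
    return keepWeights;
}
```

With ten references that would all benefit from weighting, only the first eight keep their weights under this rule. The remaining two revert to the default weight regardless of their ratio.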

diffstat:

 source/Lib/TLibEncoder/WeightPredAnalysis.cpp |     7 +-
 source/common/CMakeLists.txt                  |     8 +-
 source/common/pixel.cpp                       |    20 +-
 source/common/x86/asm-primitives.cpp          |     1 +
 source/common/x86/blockcopy8.asm              |   114 +
 source/common/x86/mc.h                        |    32 +
 source/common/x86/pixel-a.asm                 |  5178 +------------------------
 source/common/x86/pixel-util.h                |   143 +
 source/common/x86/pixel-util8.asm             |  2344 ++++++++++-
 source/common/x86/pixel.h                     |   316 +-
 source/common/x86/sad-a.asm                   |   496 --
 source/common/x86/ssd-a.asm                   |  2177 ++++++++++
 source/encoder/frameencoder.cpp               |     9 +-
 source/encoder/ratecontrol.cpp                |     8 -
 source/encoder/slicetype.cpp                  |    78 +-
 source/encoder/slicetype.h                    |     6 +-
 16 files changed, 4803 insertions(+), 6134 deletions(-)
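
Note on the cvt* move: the cvt32to16_shr routine added to blockcopy8.asm in the diff below adds a rounding constant of 1 << (shift - 1), shifts right arithmetically, and packs to 16 bits with signed saturation (packssdw). A scalar model of that behaviour, as far as the listing shows, might look like the following. The name cvt32to16_shr_ref and the helper saturate16 are made up for illustration. The model assumes shift >= 1, a square size x size block with tightly packed source rows, a destination stride counted in 16-bit elements, and inputs small enough that adding the rounding constant does not overflow.

```cpp
#include <cassert>
#include <cstdint>

// Clamp a 32-bit value to the int16_t range, mirroring packssdw saturation.
static int16_t saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(v);
}

// Scalar sketch (hypothetical name): dst[y][x] = sat16((src[y][x] + round) >> shift).
// The right shift of a negative value is assumed to be arithmetic, as psrad is.
void cvt32to16_shr_ref(int16_t* dst, const int32_t* src, intptr_t stride, int shift, int size)
{
    const int32_t round = 1 << (shift - 1);

    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
            dst[x] = saturate16((src[x] + round) >> shift);

        src += size;   // source rows are contiguous (row pitch == size)
        dst += stride; // destination pitch in int16_t elements
    }
}
```

A model like this is mainly useful as a comparison target when checking the assembly against random input blocks.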

diffs (truncated from 11310 to 300 lines):

diff -r c75c3431b108 -r 70e127d735a5 source/Lib/TLibEncoder/WeightPredAnalysis.cpp
--- a/source/Lib/TLibEncoder/WeightPredAnalysis.cpp	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/Lib/TLibEncoder/WeightPredAnalysis.cpp	Mon Dec 02 15:37:57 2013 -0600
@@ -281,6 +281,7 @@ bool WeightPredAnalysis::xSelectWP(TComS
     int height = pic->getHeight();
     int defaultWeight = ((int)1 << denom);
     int numPredDir = slice->isInterP() ? 1 : 2;
+    int numWeighted = 0;
 
     for (int list = 0; list < numPredDir; list++)
     {
@@ -313,7 +314,7 @@ bool WeightPredAnalysis::xSelectWP(TComS
             SADnoWP += this->xCalcSADvalueWP(X265_DEPTH, fenc, fref, width >> 1, height >> 1, orgStride, refStride, denom, defaultWeight, 0);
 
             double dRatio = ((double)SADWP / (double)SADnoWP);
-            if (dRatio >= (double)DTHRESH)
+            if (dRatio >= (double)DTHRESH || numWeighted >= 8)
             {
                 for (int comp = 0; comp < 3; comp++)
                 {
@@ -323,6 +324,10 @@ bool WeightPredAnalysis::xSelectWP(TComS
                     weightPredTable[list][refIdxTmp][comp].log2WeightDenom = (int)denom;
                 }
             }
+            else
+            {
+                numWeighted++;
+            }
         }
     }
 
diff -r c75c3431b108 -r 70e127d735a5 source/common/CMakeLists.txt
--- a/source/common/CMakeLists.txt	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/CMakeLists.txt	Mon Dec 02 15:37:57 2013 -0600
@@ -91,10 +91,10 @@ if(ENABLE_PRIMITIVES_VEC)
             add_definitions(/Qwd280) # conditional expression is constant
         endif()
         if (X64)
+            set_source_files_properties(${SSE3} ${SSSE3} ${SSE41} PROPERTIES COMPILE_FLAGS "${WARNDISABLE}")
+        else()
             # x64 implies SSE4, so this flag would have no effect (and it issues a warning)
             set_source_files_properties(${SSE3} ${SSSE3} ${SSE41} PROPERTIES COMPILE_FLAGS "${WARNDISABLE} /arch:SSE2")
-        else()
-            set_source_files_properties(${SSE3} ${SSSE3} ${SSE41} PROPERTIES COMPILE_FLAGS "${WARNDISABLE}")
         endif()
     endif()
     if(GCC)
@@ -119,8 +119,8 @@ endif(ENABLE_PRIMITIVES_VEC)
 
 if(ENABLE_PRIMITIVES_ASM)
     set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h)
-    set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm mc-a.asm mc-a2.asm
-               ipfilter8.asm pixel-util8.asm blockcopy8.asm intrapred8.asm
+    set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm sad-a.asm ssd-a.asm mc-a.asm
+               mc-a2.asm ipfilter8.asm pixel-util8.asm blockcopy8.asm intrapred8.asm
                pixeladd8.asm dct8.asm)
     if (NOT X64)
         set(A_SRCS ${A_SRCS} pixel-32.asm)
diff -r c75c3431b108 -r 70e127d735a5 source/common/pixel.cpp
--- a/source/common/pixel.cpp	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/pixel.cpp	Mon Dec 02 15:37:57 2013 -0600
@@ -661,12 +661,12 @@ float ssim_end_1(int s1, int s2, int ss,
     static const int ssim_c1 = (int)(.01 * .01 * PIXEL_MAX * PIXEL_MAX * 64 + .5);
     static const int ssim_c2 = (int)(.03 * .03 * PIXEL_MAX * PIXEL_MAX * 64 * 63 + .5);
 #endif
-    type fs1 = s1;
-    type fs2 = s2;
-    type fss = ss;
-    type fs12 = s12;
-    type vars = fss * 64 - fs1 * fs1 - fs2 * fs2;
-    type covar = fs12 * 64 - fs1 * fs2;
+    type fs1 = (type)s1;
+    type fs2 = (type)s2;
+    type fss = (type)ss;
+    type fs12 = (type)s12;
+    type vars = (type)(fss * 64 - fs1 * fs1 - fs2 * fs2);
+    type covar = (type)(fs12 * 64 - fs1 * fs2);
     return (float)(2 * fs1 * fs2 + ssim_c1) * (float)(2 * covar + ssim_c2)
            / ((float)(fs1 * fs1 + fs2 * fs2 + ssim_c1) * (float)(vars + ssim_c2));
 #undef type
@@ -901,16 +901,10 @@ void Setup_C_PixelPrimitives(EncoderPrim
     LUMA(16, 64);
     CHROMA(8, 32);
 
-    //sse
-#if HIGH_BIT_DEPTH
-    SET_FUNC_PRIMITIVE_TABLE_C(sse_pp, sse, pixelcmp_t, int16_t, int16_t)
-    SET_FUNC_PRIMITIVE_TABLE_C(sse_sp, sse, pixelcmp_sp_t, int16_t, int16_t)
-    SET_FUNC_PRIMITIVE_TABLE_C(sse_ss, sse, pixelcmp_ss_t, int16_t, int16_t)
-#else
     SET_FUNC_PRIMITIVE_TABLE_C(sse_pp, sse, pixelcmp_t, pixel, pixel)
     SET_FUNC_PRIMITIVE_TABLE_C(sse_sp, sse, pixelcmp_sp_t, int16_t, pixel)
     SET_FUNC_PRIMITIVE_TABLE_C(sse_ss, sse, pixelcmp_ss_t, int16_t, int16_t)
-#endif
+
     p.blockcpy_pp = blockcopy_p_p;
     p.blockcpy_ps = blockcopy_p_s;
 
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/asm-primitives.cpp	Mon Dec 02 15:37:57 2013 -0600
@@ -29,6 +29,7 @@
 
 extern "C" {
 #include "pixel.h"
+#include "pixel-util.h"
 #include "mc.h"
 #include "ipfilter8.h"
 #include "blockcopy8.h"
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/blockcopy8.asm	Mon Dec 02 15:37:57 2013 -0600
@@ -2360,3 +2360,117 @@ BLOCKCOPY_PS_W64_H2 64, 16
 BLOCKCOPY_PS_W64_H2 64, 32
 BLOCKCOPY_PS_W64_H2 64, 48
 BLOCKCOPY_PS_W64_H2 64, 64
+
+;-----------------------------------------------------------------------------
+; void cvt32to16_shr(short *dst, int *src, intptr_t stride, int shift, int size)
+;-----------------------------------------------------------------------------
+INIT_XMM sse2
+cglobal cvt32to16_shr, 5, 7, 1, dst, src, stride
+%define rnd     m7
+%define shift   m6
+
+    ; make shift
+    mov         r5d, r3m
+    movd        shift, r5d
+
+    ; make round
+    dec         r5
+    xor         r6, r6
+    bts         r6, r5
+    
+    movd        rnd, r6d
+    pshufd      rnd, rnd, 0
+
+    ; register alloc
+    ; r0 - dst
+    ; r1 - src
+    ; r2 - stride * 2 (short*)
+    ; r3 - lx
+    ; r4 - size
+    ; r5 - ly
+    ; r6 - diff
+    lea         r2, [r2 * 2]
+
+    mov         r4d, r4m
+    mov         r5, r4
+    mov         r6, r2
+    sub         r6, r4
+    lea         r6, [r6 * 2]
+
+    shr         r5, 1
+.loop_row:
+
+    mov         r3, r4
+    shr         r3, 2
+.loop_col:
+    ; row 0
+    movu        m0, [r1]
+    paddd       m0, rnd
+    psrad       m0, shift
+    packssdw    m0, m0
+    movh        [r0], m0
+
+    ; row 1
+    movu        m0, [r1 + r4 * 4]
+    paddd       m0, rnd
+    psrad       m0, shift
+    packssdw    m0, m0
+    movh        [r0 + r2], m0
+
+    ; move col pointer
+    add         r1, 16
+    add         r0, 8
+
+    dec         r3
+    jg          .loop_col
+
+    ; update pointer
+    lea         r1, [r1 + r4 * 4]
+    add         r0, r6
+
+    ; end of loop_row
+    dec         r5
+    jg         .loop_row
+    
+    RET
+
+
+;--------------------------------------------------------------------------------------
+; void cvt16to32_shl(int32_t *dst, int16_t *src, intptr_t stride, int shift, int size);
+;--------------------------------------------------------------------------------------
+INIT_XMM sse2
+cglobal cvt16to32_shl, 5, 7, 2, dst, src, stride, shift, size
+%define shift       m6
+
+    ; make shift
+    mov             r5d,      r3m
+    movd            shift,    r5d
+
+    ; register alloc
+    ; r0 - dst
+    ; r1 - src
+    ; r2 - stride
+    ; r3 - shift
+    ; r4 - size
+
+    mov             r5d,      r4d
+    shr             r4d,      2
+.loop_row
+    mov             r6d,      r4d
+
+.loop_col
+    pmovsxwd        m0,       [r1]
+    pslld           m0,       shift
+    movu            [r0],     m0
+
+    add             r1,       8
+    add             r0,       16
+
+    dec             r6d
+    jnz             .loop_col
+
+    dec             r5d
+    jnz             .loop_row
+
+    RET
+
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/mc.h
--- a/source/common/x86/mc.h	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/mc.h	Mon Dec 02 15:37:57 2013 -0600
@@ -33,4 +33,36 @@ LOWRES(ssse3)
 LOWRES(avx)
 LOWRES(xop)
 
+#define DECL_SUF(func, args) \
+    void func ## _mmx2 args; \
+    void func ## _sse2 args; \
+    void func ## _ssse3 args;
+DECL_SUF(x265_pixel_avg_64x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_64x48, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_64x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_48x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x24, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_32x8,  (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_24x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x64, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x32, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x12, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x8,  (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_16x4,  (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_12x16, (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x32,  (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x16,  (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x8,   (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_8x4,   (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_4x16,  (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_4x8,   (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+DECL_SUF(x265_pixel_avg_4x4,   (pixel *, intptr_t, pixel *, intptr_t, pixel *, intptr_t, int))
+
+#undef LOWRES
+#undef DECL_SUF
+
 #endif // ifndef X265_MC_H
diff -r c75c3431b108 -r 70e127d735a5 source/common/x86/pixel-a.asm
--- a/source/common/x86/pixel-a.asm	Mon Dec 02 11:48:10 2013 +0530
+++ b/source/common/x86/pixel-a.asm	Mon Dec 02 15:37:57 2013 -0600
@@ -38,24 +38,9 @@ hmul_8p:   times 8 db 1
            times 4 db 1, -1
            times 8 db 1
            times 4 db 1, -1
-mask_ff:   times 16 db 0xff
-           times 16 db 0
-%if BIT_DEPTH == 10
-ssim_c1:   times 4 dd 6697.7856    ; .01*.01*1023*1023*64
-ssim_c2:   times 4 dd 3797644.4352 ; .03*.03*1023*1023*64*63
-pf_64:     times 4 dd 64.0
-pf_128:    times 4 dd 128.0
-%elif BIT_DEPTH == 9
-ssim_c1:   times 4 dd 1671         ; .01*.01*511*511*64
-ssim_c2:   times 4 dd 947556       ; .03*.03*511*511*64*63
-%else ; 8-bit
-ssim_c1:   times 4 dd 416          ; .01*.01*255*255*64
-ssim_c2:   times 4 dd 235963       ; .03*.03*255*255*64*63
-%endif
 hmul_4p:   times 2 db 1, 1, 1, 1, 1, -1, 1, -1
 mask_10:   times 4 dw 0, -1
 mask_1100: times 2 dd 0, -1
-deinterleave_shuf: db 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15
 
 ALIGN 32
 transd_shuf1: SHUFFLE_MASK_W 0, 8, 2, 10, 4, 12, 6, 14
@@ -66,25 +51,6 @@ pd_f0:     times 4 dd 0xffff0000