[x265-commits] [x265] constants: remove init/destroyROM functions

Thu Nov 20 07:28:28 CET 2014

details:   http://hg.videolan.org/x265/rev/d3389bb9efd0
branches:  
changeset: 8855:d3389bb9efd0
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 18 19:50:29 2014 -0600
description:
constants: remove init/destroyROM functions
Subject: [x265] threading: use 32bit atomic integer operations exclusively

details:   http://hg.videolan.org/x265/rev/814b687db30e
branches:  
changeset: 8856:814b687db30e
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 18 20:16:57 2014 -0600
description:
threading: use 32bit atomic integer operations exclusively

The 32-bit operations are more portable and have less onerous alignment
restrictions.
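
As a rough illustration of the pattern (not the actual x265 macros, which are not shown in this digest), a 32-bit compare-and-swap guard might look like the sketch below. It assumes a C++11 compiler and uses std::atomic in place of compiler intrinsics; the names cas32, claimOnce and release are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: 32-bit compare-and-swap that returns the value
// observed before the operation, whether or not the swap happened.
inline int32_t cas32(std::atomic<int32_t>& target, int32_t expected, int32_t desired)
{
    // On success 'expected' keeps the old value; on failure it is
    // overwritten with the value actually found in 'target'.
    target.compare_exchange_strong(expected, desired);
    return expected;
}

// Returns true only for the single caller that flips the flag from 0 to 1.
inline bool claimOnce(std::atomic<int32_t>& flag)
{
    return cas32(flag, 0, 1) == 0;
}

// Flips the flag back from 1 to 0 so that it can be claimed again.
inline void release(std::atomic<int32_t>& flag)
{
    int32_t previous = cas32(flag, 1, 0);
    assert(previous == 1); // releasing an unclaimed flag is a logic error
    (void)previous;        // silence unused-variable warnings when NDEBUG is set
}
```

A 32-bit atomic only needs 4-byte alignment, which ordinary int32_t storage already has on common targets, whereas 64-bit atomics may require 8-byte alignment that 32-bit builds do not always provide by default.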
Subject: [x265] wavefront: fix msvc warning

details:   http://hg.videolan.org/x265/rev/e29c618cd9a7
branches:  
changeset: 8857:e29c618cd9a7
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 18 21:25:08 2014 -0600
description:
wavefront: fix msvc warning

warning C4800: 'unsigned long' : forcing value to bool 'true' or 'false' (performance warning)
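
The changeset's actual fix is not visible in this truncated digest. A common way to silence C4800, sketched here with a hypothetical isBitSet function, is to make the integer-to-bool conversion explicit with a comparison:

```cpp
#include <cassert>

// Hypothetical example: test one bit of an 'unsigned long' mask.
inline bool isBitSet(unsigned long mask, int bit)
{
    assert(bit >= 0 && bit < 32);
    // 'return mask & (1UL << bit);' would convert unsigned long to bool
    // implicitly, which older MSVC versions report as warning C4800.
    // Comparing against zero yields a bool directly.
    return (mask & (1UL << bit)) != 0;
}
```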
Subject: [x265] threading: fixes for VC11 Win32 includes, prune two unused functions

details:   http://hg.videolan.org/x265/rev/2b830f08d948
branches:  
changeset: 8858:2b830f08d948
user:      Steve Borho <steve at borho.org>
date:      Wed Nov 19 01:28:50 2014 -0600
description:
threading: fixes for VC11 Win32 includes, prune two unused functions
Subject: [x265] fseeko for mingw32

details:   http://hg.videolan.org/x265/rev/cb9bb697fcaa
branches:  
changeset: 8859:cb9bb697fcaa
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Wed Nov 19 15:39:25 2014 +0900
description:
fseeko for mingw32
Subject: [x265] refactorization of the transform/quant path.

details:   http://hg.videolan.org/x265/rev/8bee552a1964
branches:  
changeset: 8860:8bee552a1964
user:      Praveen Tiwari
date:      Tue Nov 18 14:00:27 2014 +0530
description:
refactorization of the transform/quant path.

This patch narrows the DCT/IDCT coefficients from int32_t to int16_t; they fit
in int16_t without introducing any encode error. This lets us remove many
DCT/IDCT intermediate buffers and improve encode efficiency for different CLI
options, including noise reduction, by reducing data movement and fitting more
coefficients into a single register for SIMD operations. This patch includes
all necessary changes for the transform/quant path, including unit test code.
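
To make the register-packing argument concrete, the sketch below counts how many coefficients of each width fit in a SIMD register and checks that a buffer of 32-bit values is representable in 16 bits. Both helpers (lanesPerRegister, fitsInInt16) are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Number of elements of type T that fit in a register of the given width.
template<typename T>
constexpr std::size_t lanesPerRegister(std::size_t registerBits)
{
    return registerBits / (8 * sizeof(T));
}

// Returns true when every coefficient can be stored in int16_t unchanged.
inline bool fitsInInt16(const int32_t* coeff, std::size_t count)
{
    assert(coeff != nullptr || count == 0);
    for (std::size_t i = 0; i < count; i++)
    {
        if (coeff[i] < std::numeric_limits<int16_t>::min() ||
            coeff[i] > std::numeric_limits<int16_t>::max())
            return false;
    }
    return true;
}
```

With 128-bit registers this gives 4 lanes for int32_t and 8 for int16_t, so halving the coefficient width doubles the number of coefficients handled per instruction.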
Subject: [x265] dct: fix gcc warnings

details:   http://hg.videolan.org/x265/rev/34cb58c53859
branches:  
changeset: 8861:34cb58c53859
user:      Steve Borho <steve at borho.org>
date:      Tue Nov 18 12:06:19 2014 -0600
description:
dct: fix gcc warnings
Subject: [x265] primitives: clarify constness

details:   http://hg.videolan.org/x265/rev/99b5cebf8193
branches:  
changeset: 8862:99b5cebf8193
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Sun Nov 16 14:32:17 2014 +0900
description:
primitives: clarify constness
Subject: [x265] disable denoiseDct asm code until fixed for Mac OS

details:   http://hg.videolan.org/x265/rev/f236adb703f5
branches:  
changeset: 8863:f236adb703f5
user:      Praveen Tiwari
date:      Wed Nov 19 18:42:24 2014 +0530
description:
disable denoiseDct asm code until fixed for Mac OS
Subject: [x265] replace char to int8_t, where it should be signed char

details:   http://hg.videolan.org/x265/rev/14a8bb7bbcab
branches:  
changeset: 8864:14a8bb7bbcab
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Thu Nov 20 11:30:33 2014 +0900
description:
replace char to int8_t, where it should be signed char
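
The signedness of plain char is implementation-defined, so a negative value stored through char* can read back as a large positive number on platforms where char is unsigned. The sketch below reads the same byte both ways; readSigned and readUnsigned are hypothetical helpers, and the use of a negative value is an assumption for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a byte as a signed 8-bit value, as an int8_t* view of the buffer would.
inline int readSigned(const uint8_t* buf, std::size_t i)
{
    assert(buf != nullptr);
    return reinterpret_cast<const int8_t*>(buf)[i];
}

// Reads the same byte as an unsigned value, which is what a plain char*
// view yields on platforms where char is unsigned.
inline int readUnsigned(const uint8_t* buf, std::size_t i)
{
    assert(buf != nullptr);
    return buf[i];
}
```

For a byte holding 0xFF, the signed read gives -1 and the unsigned read gives 255, which is why the patch spells out int8_t wherever a signed value is intended.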
Subject: [x265] fix for rd=0

details:   http://hg.videolan.org/x265/rev/b33cbe130c63
branches:  
changeset: 8865:b33cbe130c63
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Thu Nov 20 14:25:01 2014 +0900
description:
fix for rd=0
Subject: [x265] encoder: fix analysis file read

details:   http://hg.videolan.org/x265/rev/0c25a6eac0ca
branches:  
changeset: 8866:0c25a6eac0ca
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Thu Nov 20 11:43:37 2014 +0530
description:
encoder: fix analysis file read
Subject: [x265] luma_hpp[4x4]: AVX2 asm code bug fix

details:   http://hg.videolan.org/x265/rev/4b637cb9b792
branches:  
changeset: 8867:4b637cb9b792
user:      Praveen Tiwari
date:      Thu Nov 20 11:49:38 2014 +0530
description:
luma_hpp[4x4]: AVX2 asm code bug fix

diffstat:

 source/common/common.h               |    4 +
 source/common/constants.cpp          |   16 -
 source/common/constants.h            |    3 -
 source/common/cudata.cpp             |   16 +-
 source/common/cudata.h               |   14 +-
 source/common/dct.cpp                |  223 ++---------
 source/common/ipfilter.cpp           |   32 +-
 source/common/param.cpp              |    2 +-
 source/common/picyuv.h               |    9 +
 source/common/pixel.cpp              |  129 ++---
 source/common/predict.cpp            |   90 ++--
 source/common/primitives.cpp         |    2 -
 source/common/primitives.h           |  113 ++---
 source/common/quant.cpp              |   36 +-
 source/common/quant.h                |   10 +-
 source/common/shortyuv.cpp           |   12 +-
 source/common/threading.h            |   58 +--
 source/common/threadpool.cpp         |   32 +-
 source/common/vec/dct-sse3.cpp       |  250 ++----------
 source/common/vec/dct-sse41.cpp      |   14 +-
 source/common/vec/dct-ssse3.cpp      |   34 +-
 source/common/wavefront.cpp          |   72 +--
 source/common/wavefront.h            |    8 +-
 source/common/winxp.h                |   24 -
 source/common/x86/asm-primitives.cpp |   15 +-
 source/common/x86/blockcopy8.asm     |   86 +----
 source/common/x86/blockcopy8.h       |  113 ++---
 source/common/x86/dct8.asm           |  661 +++++++++++++++++++---------------
 source/common/x86/dct8.h             |   32 +-
 source/common/x86/ipfilter8.asm      |    4 +-
 source/common/x86/ipfilter8.h        |   46 +-
 source/common/x86/mc.h               |   52 +-
 source/common/x86/pixel-util.h       |   78 ++--
 source/common/x86/pixel-util8.asm    |   49 +-
 source/common/x86/pixel.h            |   82 ++--
 source/common/yuv.cpp                |   38 +-
 source/encoder/analysis.cpp          |   51 +-
 source/encoder/api.cpp               |    1 -
 source/encoder/encoder.cpp           |   15 +-
 source/encoder/entropy.cpp           |    6 +-
 source/encoder/frameencoder.cpp      |    2 +-
 source/encoder/rdcost.h              |    4 +-
 source/encoder/search.cpp            |   95 ++--
 source/encoder/slicetype.cpp         |    2 +-
 source/test/intrapredharness.cpp     |    2 -
 source/test/mbdstharness.cpp         |   65 +-
 source/test/mbdstharness.h           |    4 +-
 source/test/pixelharness.cpp         |   59 +--
 source/test/pixelharness.h           |    3 +-
 49 files changed, 1153 insertions(+), 1615 deletions(-)

diffs (truncated from 5807 to 300 lines):

diff -r d059cfa88f1a -r 4b637cb9b792 source/common/common.h

--- a/source/common/common.h	Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/common.h	Thu Nov 20 11:49:38 2014 +0530
@@ -56,6 +56,10 @@ extern "C" intptr_t x265_stack_align(voi
 #define x265_stack_align(func, ...) func(__VA_ARGS__)
 #endif
 
+#if defined(__MINGW32__)
+#define fseeko fseeko64
+#endif
+
 #elif defined(_MSC_VER)
 
 #define ALIGN_VAR_8(T, var)  __declspec(align(8)) T var
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/constants.cpp
--- a/source/common/constants.cpp	Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/constants.cpp	Thu Nov 20 11:49:38 2014 +0530
@@ -27,22 +27,6 @@
 
 namespace x265 {
 
-static int initialized /* = 0 */;
-
-// initialize ROM variables
-void initROM()
-{
-    if (ATOMIC_CAS32(&initialized, 0, 1) == 1)
-        return;
-}
-
-void destroyROM()
-{
-    if (ATOMIC_CAS32(&initialized, 1, 0) == 0)
-        return;
-}
-
-
 // lambda = pow(2, (double)q / 6 - 2);
 double x265_lambda_tab[QP_MAX_MAX + 1] =
 {
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/constants.h
--- a/source/common/constants.h	Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/constants.h	Thu Nov 20 11:49:38 2014 +0530
@@ -29,9 +29,6 @@
 namespace x265 {
 // private namespace
 
-void initROM();
-void destroyROM();
-
 void initZscanToRaster(uint32_t maxFullDepth, uint32_t depth, uint32_t startVal, uint32_t*& curIdx);
 void initRasterToZscan(uint32_t maxFullDepth);
 
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/cudata.cpp
--- a/source/common/cudata.cpp	Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/cudata.cpp	Thu Nov 20 11:49:38 2014 +0530
@@ -227,12 +227,12 @@ void CUData::initialize(const CUDataMemP
     /* Each CU's data is layed out sequentially within the charMemBlock */
     uint8_t *charBuf = dataPool.charMemBlock + (m_numPartitions * BytesPerPartition) * instance;
 
-    m_qp          = (char*)charBuf; charBuf += m_numPartitions;
+    m_qp        = (int8_t*)charBuf; charBuf += m_numPartitions;
     m_log2CUSize         = charBuf; charBuf += m_numPartitions;
     m_lumaIntraDir       = charBuf; charBuf += m_numPartitions;
     m_tqBypass           = charBuf; charBuf += m_numPartitions;
-    m_refIdx[0]   = (char*)charBuf; charBuf += m_numPartitions;
-    m_refIdx[1]   = (char*)charBuf; charBuf += m_numPartitions;
+    m_refIdx[0] = (int8_t*)charBuf; charBuf += m_numPartitions;
+    m_refIdx[1] = (int8_t*)charBuf; charBuf += m_numPartitions;
     m_cuDepth            = charBuf; charBuf += m_numPartitions;
     m_predMode           = charBuf; charBuf += m_numPartitions; /* the order up to here is important in initCTU() and initSubCU() */
     m_partSize           = charBuf; charBuf += m_numPartitions;
@@ -772,7 +772,7 @@ const CUData* CUData::getQpMinCuAbove(ui
 }
 
 /* Get reference QP from left QpMinCu or latest coded QP */
-char CUData::getRefQP(uint32_t curAbsIdxInCTU) const
+int8_t CUData::getRefQP(uint32_t curAbsIdxInCTU) const
 {
     uint32_t lPartIdx = 0, aPartIdx = 0;
     const CUData* cULeft = getQpMinCuLeft(lPartIdx, m_absIdxInCTU + curAbsIdxInCTU);
@@ -794,7 +794,7 @@ int CUData::getLastValidPartIdx(int absP
     return lastValidPartIdx;
 }
 
-char CUData::getLastCodedQP(uint32_t absPartIdx) const
+int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
 {
     uint32_t quPartIdxMask = 0xFF << (g_maxFullDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
     int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);
@@ -808,7 +808,7 @@ char CUData::getLastCodedQP(uint32_t abs
         else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
             return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_CU_PARTITIONS);
         else
-            return (char)m_slice->m_sliceQp;
+            return (int8_t)m_slice->m_sliceQp;
     }
 }
 
@@ -936,7 +936,7 @@ uint32_t CUData::getCtxSkipFlag(uint32_t
     return ctx;
 }
 
-bool CUData::setQPSubCUs(char qp, uint32_t absPartIdx, uint32_t depth)
+bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
 {
     uint32_t curPartNumb = NUM_CU_PARTITIONS >> (depth << 1);
     uint32_t curPartNumQ = curPartNumb >> 2;
@@ -1211,7 +1211,7 @@ void CUData::setPUMv(int list, const MV&
     setAllPU(m_mv[list], mv, absPartIdx, puIdx);
 }
 
-void CUData::setPURefIdx(int list, char refIdx, int absPartIdx, int puIdx)
+void CUData::setPURefIdx(int list, int8_t refIdx, int absPartIdx, int puIdx)
 {
     setAllPU(m_refIdx[list], refIdx, absPartIdx, puIdx);
 }
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/cudata.h
--- a/source/common/cudata.h	Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/cudata.h	Thu Nov 20 11:49:38 2014 +0530
@@ -127,11 +127,11 @@ public:
     int           m_vChromaShift;
 
     /* Per-part data, stored contiguously */
-    char*         m_qp;               // array of QP values
+    int8_t*       m_qp;               // array of QP values
     uint8_t*      m_log2CUSize;       // array of cu log2Size TODO: seems redundant to depth
     uint8_t*      m_lumaIntraDir;     // array of intra directions (luma)
     uint8_t*      m_tqBypass;         // array of CU lossless flags
-    char*         m_refIdx[2];        // array of motion reference indices per list
+    int8_t*       m_refIdx[2];        // array of motion reference indices per list
     uint8_t*      m_cuDepth;          // array of depths
     uint8_t*      m_predMode;         // array of prediction modes
     uint8_t*      m_partSize;         // array of partition sizes
@@ -177,7 +177,7 @@ public:
     void     clearCbf()                            { m_partSet(m_cbf[0], 0); m_partSet(m_cbf[1], 0); m_partSet(m_cbf[2], 0); }
 
     /* these functions all take depth as an absolute depth from CTU, it is used to calculate the number of parts to copy */
-    void     setQPSubParts(char qp, uint32_t absPartIdx, uint32_t depth)                      { s_partSet[depth]((uint8_t*)m_qp + absPartIdx, (uint8_t)qp); }
+    void     setQPSubParts(int8_t qp, uint32_t absPartIdx, uint32_t depth)                    { s_partSet[depth]((uint8_t*)m_qp + absPartIdx, (uint8_t)qp); }
     void     setTUDepthSubParts(uint8_t tuDepth, uint32_t absPartIdx, uint32_t depth)         { s_partSet[depth](m_tuDepth + absPartIdx, tuDepth); }
     void     setLumaIntraDirSubParts(uint8_t dir, uint32_t absPartIdx, uint32_t depth)        { s_partSet[depth](m_lumaIntraDir + absPartIdx, dir); }
     void     setChromIntraDirSubParts(uint8_t dir, uint32_t absPartIdx, uint32_t depth)       { s_partSet[depth](m_chromaIntraDir + absPartIdx, dir); }
@@ -186,15 +186,15 @@ public:
     void     setTransformSkipSubParts(uint8_t tskip, TextType ttype, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth](m_transformSkip[ttype] + absPartIdx, tskip); }
     void     setTransformSkipPartRange(uint8_t tskip, TextType ttype, uint32_t absPartIdx, uint32_t coveredPartIdxes) { memset(m_transformSkip[ttype] + absPartIdx, tskip, coveredPartIdxes); }
 
-    bool     setQPSubCUs(char qp, uint32_t absPartIdx, uint32_t depth);
+    bool     setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth);
 
     void     setPUInterDir(uint8_t dir, uint32_t absPartIdx, uint32_t puIdx);
     void     setPUMv(int list, const MV& mv, int absPartIdx, int puIdx);
-    void     setPURefIdx(int list, char refIdx, int absPartIdx, int puIdx);
+    void     setPURefIdx(int list, int8_t refIdx, int absPartIdx, int puIdx);
 
     uint8_t  getCbf(uint32_t absPartIdx, TextType ttype, uint32_t trDepth) const { return (m_cbf[ttype][absPartIdx] >> trDepth) & 0x1; }
     uint8_t  getQtRootCbf(uint32_t absPartIdx) const                             { return m_cbf[0][absPartIdx] || m_cbf[1][absPartIdx] || m_cbf[2][absPartIdx]; }
-    char     getRefQP(uint32_t currAbsIdxInCTU) const;
+    int8_t   getRefQP(uint32_t currAbsIdxInCTU) const;
     uint32_t getInterMergeCandidates(uint32_t absPartIdx, uint32_t puIdx, MVField (*mvFieldNeighbours)[2], uint8_t* interDirNeighbours) const;
     void     clipMv(MV& outMV) const;
     int      fillMvpCand(uint32_t puIdx, uint32_t absPartIdx, int picList, int refIdx, MV* amvpCand, MV* mvc) const;
@@ -237,7 +237,7 @@ protected:
     template<typename T>
     void setAllPU(T *p, const T& val, int absPartIdx, int puIdx);
 
-    char getLastCodedQP(uint32_t absPartIdx) const;
+    int8_t getLastCodedQP(uint32_t absPartIdx) const;
     int  getLastValidPartIdx(int absPartIdx) const;
 
     bool hasEqualMotion(uint32_t absPartIdx, const CUData& candCU, uint32_t candAbsPartIdx) const;
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/dct.cpp
--- a/source/common/dct.cpp	Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/dct.cpp	Thu Nov 20 11:49:38 2014 +0530
@@ -41,7 +41,7 @@ namespace {
 
 // Fast DST Algorithm. Full matrix multiplication for DST and Fast DST algorithm
 // give identical results
-void fastForwardDst(int16_t *block, int16_t *coeff, int shift)  // input block, output coeff
+void fastForwardDst(const int16_t* block, int16_t* coeff, int shift)  // input block, output coeff
 {
     int c[4];
     int rnd_factor = 1 << (shift - 1);
@@ -61,7 +61,7 @@ void fastForwardDst(int16_t *block, int1
     }
 }
 
-void inversedst(int16_t *tmp, int16_t *block, int shift)  // input tmp, output block
+void inversedst(const int16_t* tmp, int16_t* block, int shift)  // input tmp, output block
 {
     int i, c[4];
     int rnd_factor = 1 << (shift - 1);
@@ -81,7 +81,7 @@ void inversedst(int16_t *tmp, int16_t *b
     }
 }
 
-void partialButterfly16(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly16(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j, k;
     int E[8], O[8];
@@ -134,7 +134,7 @@ void partialButterfly16(int16_t *src, in
     }
 }
 
-void partialButterfly32(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly32(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j, k;
     int E[16], O[16];
@@ -203,7 +203,7 @@ void partialButterfly32(int16_t *src, in
     }
 }
 
-void partialButterfly8(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly8(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j, k;
     int E[4], O[4];
@@ -240,7 +240,7 @@ void partialButterfly8(int16_t *src, int
     }
 }
 
-void partialButterflyInverse4(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse4(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j;
     int E[2], O[2];
@@ -265,7 +265,7 @@ void partialButterflyInverse4(int16_t *s
     }
 }
 
-void partialButterflyInverse8(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse8(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j, k;
     int E[4], O[4];
@@ -301,7 +301,7 @@ void partialButterflyInverse8(int16_t *s
     }
 }
 
-void partialButterflyInverse16(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse16(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j, k;
     int E[8], O[8];
@@ -352,7 +352,7 @@ void partialButterflyInverse16(int16_t *
     }
 }
 
-void partialButterflyInverse32(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse32(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j, k;
     int E[16], O[16];
@@ -416,7 +416,7 @@ void partialButterflyInverse32(int16_t *
     }
 }
 
-void partialButterfly4(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly4(const int16_t* src, int16_t* dst, int shift, int line)
 {
     int j;
     int E[2], O[2];
@@ -440,7 +440,7 @@ void partialButterfly4(int16_t *src, int
     }
 }
 
-void dst4_c(int16_t *src, int32_t *dst, intptr_t stride)
+void dst4_c(const int16_t *src, int16_t *dst, intptr_t stride)
 {
     const int shift_1st = 1 + X265_DEPTH - 8;
     const int shift_2nd = 8;
@@ -454,132 +454,54 @@ void dst4_c(int16_t *src, int32_t *dst, 
     }
 
     fastForwardDst(block, coef, shift_1st);
-    fastForwardDst(coef, block, shift_2nd);
-
-#define N (4)
-    for (int i = 0; i < N; i++)
-    {
-        for (int j = 0; j < N; j++)
-        {
-            dst[i * N + j] = block[i * N + j];
-        }
-    }
-
-#undef N
+    fastForwardDst(coef, dst, shift_2nd);
 }
 
-void dct4_c(int16_t *src, int32_t *dst, intptr_t stride)
+void dct4_c(const int16_t *src, int16_t *dst, intptr_t /* stride */)
 {
     const int shift_1st = 1 + X265_DEPTH - 8;
     const int shift_2nd = 8;
 
     ALIGN_VAR_32(int16_t, coef[4 * 4]);
-    ALIGN_VAR_32(int16_t, block[4 * 4]);