[x265-commits] [x265] Remove redundant settings from performance presets

Fri Feb 14 20:38:44 CET 2014

details:   http://hg.videolan.org/x265/rev/8093e808bfee
branches:  
changeset: 6119:8093e808bfee
user:      Tom Vaughan (tom.vaughan at multicorewareinc.com)
date:      Thu Feb 13 17:07:40 2014 -0800
description:
Remove redundant settings from performance presets
Subject: [x265] asm: cleanups for ipfilter functions to reduce register counts

details:   http://hg.videolan.org/x265/rev/fcfe87ee36b7
branches:  
changeset: 6120:fcfe87ee36b7
user:      Murugan Vairavel <murugan at multicorewareinc.com>
date:      Thu Feb 13 10:50:32 2014 +0530
description:
asm: cleanups for ipfilter functions to reduce register counts
Subject: [x265] const tables

details:   http://hg.videolan.org/x265/rev/2ce38565571e
branches:  
changeset: 6121:2ce38565571e
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Thu Feb 13 16:39:53 2014 +0900
description:
const tables
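
Note: the g_sigLastScanCG32x32 table that this change hard-codes in TComRom.cpp (see the diff below) appears to be the 8x8 up-right diagonal scan that initSigLastScan() previously filled in at startup. A minimal sketch that regenerates it follows. It assumes the usual diagonal-scan loop, part of which falls outside the hunks shown here, and the function name diagonalScan is hypothetical, not part of x265:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an x265 function): builds the up-right
// diagonal scan order for a width x width block of positions.
std::vector<uint32_t> diagonalScan(int width)
{
    std::vector<uint32_t> scan;
    scan.reserve(width * width);

    for (int scanLine = 0; (int)scan.size() < width * width; scanLine++)
    {
        int primDim = scanLine; // row index, decreases along the diagonal
        int scndDim = 0;        // column index, increases along the diagonal

        // Diagonals that would start below the bottom edge begin
        // further to the right instead.
        while (primDim >= width)
        {
            scndDim++;
            primDim--;
        }

        // Walk the diagonal from bottom-left to top-right.
        while (primDim >= 0 && scndDim < width)
        {
            scan.push_back(uint32_t(primDim * width + scndDim));
            scndDim++;
            primDim--;
        }
    }

    return scan;
}
```

Calling diagonalScan(8) yields 0, 8, 1, 16, 9, 2, 24, 17, ... which agrees with the entries of the new constant table.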
Subject: [x265] Remove redundant settings from performance presets

details:   http://hg.videolan.org/x265/rev/0265344d0727
branches:  
changeset: 6122:0265344d0727
user:      Tom Vaughan (tom.vaughan at multicorewareinc.com)
date:      Thu Feb 13 17:20:28 2014 -0800
description:
Remove redundant settings from performance presets
Subject: [x265] cmake: add a blacklist of libs to keep from x265.pc Libs.private

details:   http://hg.videolan.org/x265/rev/757b127f8ede
branches:  
changeset: 6123:757b127f8ede
user:      Steve Borho <steve at borho.org>
date:      Thu Feb 13 21:29:08 2014 -0600
description:
cmake: add a blacklist of libs to keep from x265.pc Libs.private
Subject: [x265] cmake: on MSVC, CMAKE_CXX_IMPLICIT_LINK_LIBRARIES and PLATFORM_LIBS may be empty

details:   http://hg.videolan.org/x265/rev/f46c3f816fe7
branches:  
changeset: 6124:f46c3f816fe7
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 14 00:52:01 2014 -0600
description:
cmake: on MSVC, CMAKE_CXX_IMPLICIT_LINK_LIBRARIES and PLATFORM_LIBS may be empty
Subject: [x265] reference: remove unnecessary duplicate variable

details:   http://hg.videolan.org/x265/rev/0d033b5677da
branches:  
changeset: 6125:0d033b5677da
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 14 00:53:22 2014 -0600
description:
reference: remove unnecessary duplicate variable
Subject: [x265] compress: missed few lines of code while applying previous patch

details:   http://hg.videolan.org/x265/rev/11ffc3cfe0d8
branches:  
changeset: 6126:11ffc3cfe0d8
user:      Sumalatha Polureddy
date:      Fri Feb 14 12:39:59 2014 +0530
description:
compress: missed few lines of code while applying previous patch

1. Increase the early skips in rd2
2. The sa8d cost was used in the code without being calculated
Subject: [x265] compress: Bug fix in rd2

details:   http://hg.videolan.org/x265/rev/d90a4adcb492
branches:  
changeset: 6127:d90a4adcb492
user:      Sumalatha Polureddy
date:      Fri Feb 14 13:10:30 2014 +0530
description:
compress: Bug fix in rd2

The sa8d cost in rd2 is computed differently for inter and intra:
for inter, totalbits = 0;
for intra, totalbits = cabac bits.
For now, set totalbits = 0 for both inter and intra.
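
Note: a minimal sketch of why mixing the two conventions can skew the inter/intra decision. It assumes a cost of the form distortion + lambda * bits; the struct, the function names and that cost formula are hypothetical illustrations, not the x265 implementation:

```cpp
#include <cstdint>

// Hypothetical type for illustration; not the x265 API.
struct ModeCandidate
{
    uint32_t sa8d;      // SA8D distortion estimate
    uint32_t totalBits; // bits charged to this candidate
};

// Assumed cost model: distortion plus lambda-weighted bits.
uint64_t sa8dCost(const ModeCandidate& c, uint32_t lambda)
{
    return c.sa8d + uint64_t(lambda) * c.totalBits;
}

// Returns true when the intra candidate has the strictly lower cost.
bool intraWins(const ModeCandidate& inter, const ModeCandidate& intra, uint32_t lambda)
{
    return sa8dCost(intra, lambda) < sa8dCost(inter, lambda);
}
```

With lambda = 4, an inter candidate {1000, 0} beats an intra candidate {950, 40} (1000 against 1110) only because intra alone is charged for its bits. With totalbits = 0 on both sides the comparison is 1000 against 950 and intra wins.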
Subject: [x265] encoder: report the hash digest from the correct frame encoder

details:   http://hg.videolan.org/x265/rev/d6559298428a
branches:  
changeset: 6128:d6559298428a
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 14 02:44:50 2014 -0600
description:
encoder: report the hash digest from the correct frame encoder
Subject: [x265] encoder: do not generate digest string if we are not going to print it

details:   http://hg.videolan.org/x265/rev/d43e8e0c950d
branches:  
changeset: 6129:d43e8e0c950d
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 14 02:57:24 2014 -0600
description:
encoder: do not generate digest string if we are not going to print it
Subject: [x265] remove unused HM WeightPredAnalysis files

details:   http://hg.videolan.org/x265/rev/ed310b17ff66
branches:  
changeset: 6130:ed310b17ff66
user:      Steve Borho <steve at borho.org>
date:      Fri Feb 14 02:30:52 2014 -0600
description:
remove unused HM WeightPredAnalysis files
Subject: [x265] asm: Clean up and minor modifications in pixel_add_ps 16bpp asm functions(4xN)

details:   http://hg.videolan.org/x265/rev/248b665970e8
branches:  
changeset: 6131:248b665970e8
user:      Nabajit Deka
date:      Fri Feb 14 12:32:37 2014 +0530
description:
asm: Clean up and minor modifications in pixel_add_ps 16bpp asm functions(4xN)
Subject: [x265] square transform only

details:   http://hg.videolan.org/x265/rev/a3a9e0fb1a87
branches:  
changeset: 6132:a3a9e0fb1a87
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Fri Feb 14 14:26:57 2014 +0900
description:
square transform only
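
Note: in the diff below this change replaces the width/height parameter pairs with a single trSize. A minimal sketch of the quantities then derived from that one value follows. It assumes power-of-two transform sizes and a coefficient-group size of 16 (LOG2_SCAN_SET_SIZE taken to be 4, a value not shown in this excerpt); the helper names are hypothetical:

```cpp
#include <cstdint>

// Assumption: 16 coefficients per scan set, i.e. LOG2_SCAN_SET_SIZE == 4.
const int kLog2ScanSetSize = 4;

// log2 of a power-of-two transform size (4, 8, 16, 32).
uint32_t log2TrSize(uint32_t trSize)
{
    uint32_t log2 = 0;
    while ((1u << log2) < trSize)
        log2++;
    return log2;
}

// Number of coefficients in a square trSize x trSize block.
uint32_t numCoeff(uint32_t trSize)
{
    return trSize * trSize;
}

// Index of the last scan set, the starting point of the subSet loop
// in signBitHidingHDQ().
int lastSubSet(uint32_t trSize)
{
    return int(trSize * trSize - 1) >> kLog2ScanSetSize;
}
```

For trSize = 8 this gives log2TrSize 3, 64 coefficients and a last scan set index of 3.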

diffstat:

 source/CMakeLists.txt                         |   10 +-
 source/Lib/TLibCommon/TComRom.cpp             |   33 +-
 source/Lib/TLibCommon/TComRom.h               |   14 +-
 source/Lib/TLibCommon/TComTrQuant.cpp         |  104 +-
 source/Lib/TLibCommon/TComTrQuant.h           |   14 +-
 source/Lib/TLibEncoder/TEncSbac.cpp           |    4 +-
 source/Lib/TLibEncoder/TEncSearch.cpp         |   56 +-
 source/Lib/TLibEncoder/WeightPredAnalysis.cpp |  495 ---------------
 source/Lib/TLibEncoder/WeightPredAnalysis.h   |   76 --
 source/common/common.cpp                      |    3 -
 source/common/intrapred.cpp                   |    4 +-
 source/common/x86/ipfilter8.asm               |  826 ++++++++++++-------------
 source/common/x86/pixeladd8.asm               |   86 +-
 source/encoder/CMakeLists.txt                 |    6 +-
 source/encoder/compress.cpp                   |   12 +-
 source/encoder/encoder.cpp                    |   29 +-
 source/encoder/encoder.h                      |    3 +-
 source/encoder/frameencoder.cpp               |    6 -
 source/encoder/frameencoder.h                 |    2 -
 source/encoder/reference.cpp                  |   11 +-
 20 files changed, 594 insertions(+), 1200 deletions(-)

diffs (truncated from 3160 to 300 lines):

diff -r 402b11d9df80 -r a3a9e0fb1a87 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Thu Feb 13 09:59:42 2014 +0900
+++ b/source/CMakeLists.txt	Fri Feb 14 14:26:57 2014 +0900
@@ -284,8 +284,16 @@ endif()
 if(X265_LATEST_TAG)
     # convert lists of link libraries into -lstdc++ -lm etc..
     foreach(LIB ${CMAKE_CXX_IMPLICIT_LINK_LIBRARIES} ${PLATFORM_LIBS})
-        set(PRIVATE_LIBS "${PRIVATE_LIBS} -l${LIB}")
+        list(APPEND PLIBLIST "-l${LIB}")
     endforeach()
+    if(PLIBLIST)
+        # blacklist of libraries that should not be in Libs.private
+        list(REMOVE_ITEM PLIBLIST "-lc" "-lpthread")
+        string(REPLACE ";" " " PRIVATE_LIBS "${PLIBLIST}")
+    else()
+        set(PRIVATE_LIBS "")
+    endif(PLIBLIST)
+
     # Produce a pkg-config file
     configure_file("x265.pc.in" "x265.pc" @ONLY)
     install(FILES       "${CMAKE_CURRENT_BINARY_DIR}/x265.pc"
diff -r 402b11d9df80 -r a3a9e0fb1a87 source/Lib/TLibCommon/TComRom.cpp
--- a/source/Lib/TLibCommon/TComRom.cpp	Thu Feb 13 09:59:42 2014 +0900
+++ b/source/Lib/TLibCommon/TComRom.cpp	Fri Feb 14 14:26:57 2014 +0900
@@ -104,7 +104,7 @@ uint32_t g_rasterToZscan[MAX_NUM_SPU_W *
 uint32_t g_rasterToPelX[MAX_NUM_SPU_W * MAX_NUM_SPU_W] = { 0, };
 uint32_t g_rasterToPelY[MAX_NUM_SPU_W * MAX_NUM_SPU_W] = { 0, };
 
-uint32_t g_puOffset[8] = { 0, 8, 4, 4, 2, 10, 1, 5 };
+const uint32_t g_puOffset[8] = { 0, 8, 4, 4, 2, 10, 1, 5 };
 
 void initZscanToRaster(int maxDepth, int depth, uint32_t startVal, uint32_t*& curIdx)
 {
@@ -192,12 +192,12 @@ const int16_t g_chromaFilter[8][NTAPS_CH
     { -2, 10, 58, -2 }
 };
 
-int g_quantScales[6] =
+const int g_quantScales[6] =
 {
     26214, 23302, 20560, 18396, 16384, 14564
 };
 
-int g_invQuantScales[6] =
+const int g_invQuantScales[6] =
 {
     40, 45, 51, 57, 64, 72
 };
@@ -330,7 +330,17 @@ const uint32_t g_sigLastScan8x8[3][4] =
     { 0, 1, 2, 3 },
     { 0, 2, 1, 3 }
 };
-uint32_t g_sigLastScanCG32x32[64];
+const uint32_t g_sigLastScanCG32x32[64] =
+{
+     0,  8,  1, 16,  9,  2, 24, 17,
+    10,  3, 32, 25, 18, 11,  4, 40,
+    33, 26, 19, 12,  5, 48, 41, 34,
+    27, 20, 13,  6, 56, 49, 42, 35,
+    28, 21, 14,  7, 57, 50, 43, 36,
+    29, 22, 15, 58, 51, 44, 37, 30,
+    23, 59, 52, 45, 38, 31, 60, 53,
+    46, 39, 61, 54, 47, 62, 55, 63
+};
 
 const uint32_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
 const uint32_t g_groupIdx[32]   = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };
@@ -345,13 +355,8 @@ void initSigLastScan(uint32_t* buffD, ui
     const uint32_t  numScanPos  = uint32_t(width * width);
     uint32_t        nextScanPos = 0;
 
-    if (width < 16)
+    if (width <= 4)
     {
-        uint32_t* buffTemp = buffD;
-        if (width == 8)
-        {
-            buffTemp = g_sigLastScanCG32x32;
-        }
         for (uint32_t scanLine = 0; nextScanPos < numScanPos; scanLine++)
         {
             int primDim = int(scanLine);
@@ -364,7 +369,7 @@ void initSigLastScan(uint32_t* buffD, ui
 
             while (primDim >= 0 && scndDim < width)
             {
-                buffTemp[nextScanPos] = primDim * width + scndDim;
+                buffD[nextScanPos] = primDim * width + scndDim;
                 nextScanPos++;
                 scndDim++;
                 primDim--;
@@ -501,9 +506,9 @@ int g_quantInterDefault8x8[64] =
     20, 24, 25, 28, 33, 41, 54, 71,
     24, 25, 28, 33, 41, 54, 71, 91
 };
-uint32_t g_scalingListSize[4] = { 16, 64, 256, 1024 };
-uint32_t g_scalingListSizeX[4] = { 4, 8, 16,  32 };
-uint32_t g_scalingListNum[SCALING_LIST_SIZE_NUM] = { 6, 6, 6, 2 };
+const uint32_t g_scalingListSize[4] = { 16, 64, 256, 1024 };
+const uint32_t g_scalingListSizeX[4] = { 4, 8, 16,  32 };
+const uint32_t g_scalingListNum[SCALING_LIST_SIZE_NUM] = { 6, 6, 6, 2 };
 
 const int g_winUnitX[] = { 1, 2, 2, 1 };
 const int g_winUnitY[] = { 1, 2, 1, 1 };
diff -r 402b11d9df80 -r a3a9e0fb1a87 source/Lib/TLibCommon/TComRom.h
--- a/source/Lib/TLibCommon/TComRom.h	Thu Feb 13 09:59:42 2014 +0900
+++ b/source/Lib/TLibCommon/TComRom.h	Fri Feb 14 14:26:57 2014 +0900
@@ -94,7 +94,7 @@ extern uint32_t g_addCUDepth;
 #define MAX_TS_WIDTH  4
 #define MAX_TS_HEIGHT 4
 
-extern uint32_t g_puOffset[8];
+extern const uint32_t g_puOffset[8];
 
 #define QUANT_IQUANT_SHIFT    20 // Q(QP%6) * IQ(QP%6) = 2^20
 #define QUANT_SHIFT           14 // Q(4) = 2^14
@@ -104,8 +104,8 @@ extern uint32_t g_puOffset[8];
 #define SHIFT_INV_1ST          7 // Shift after first inverse transform stage
 #define SHIFT_INV_2ND         12 // Shift after second inverse transform stage
 
-extern int g_quantScales[6];     // Q(QP%6)
-extern int g_invQuantScales[6];  // IQ(QP%6)
+extern const int g_quantScales[6];     // Q(QP%6)
+extern const int g_invQuantScales[6];  // IQ(QP%6)
 extern const int16_t g_t4[4][4];
 extern const int16_t g_t8[8][8];
 extern const int16_t g_t16[16][16];
@@ -143,7 +143,7 @@ extern const uint32_t g_goRiceRange[5]; 
 extern const uint32_t g_goRicePrefixLen[5];  //!< prefix length for each maximum value
 
 extern const uint32_t g_sigLastScan8x8[3][4];   //!< coefficient group scan order for 8x8 TUs
-extern       uint32_t g_sigLastScanCG32x32[64];
+extern const uint32_t g_sigLastScanCG32x32[64];
 
 // ====================================================================================================================
 // ADI table
@@ -279,9 +279,9 @@ extern int g_quantInterDefault8x8[64];
 extern int g_quantInterDefault16x16[256];
 extern int g_quantInterDefault32x32[1024];
 extern int g_quantTSDefault4x4[16];
-extern uint32_t g_scalingListSize[SCALING_LIST_SIZE_NUM];
-extern uint32_t g_scalingListSizeX[SCALING_LIST_SIZE_NUM];
-extern uint32_t g_scalingListNum[SCALING_LIST_SIZE_NUM];
+extern const uint32_t g_scalingListSize[SCALING_LIST_SIZE_NUM];
+extern const uint32_t g_scalingListSizeX[SCALING_LIST_SIZE_NUM];
+extern const uint32_t g_scalingListNum[SCALING_LIST_SIZE_NUM];
 //! \}
 
 // Map Luma samples to chroma samples
diff -r 402b11d9df80 -r a3a9e0fb1a87 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp	Thu Feb 13 09:59:42 2014 +0900
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp	Fri Feb 14 14:26:57 2014 +0900
@@ -129,13 +129,13 @@ void TComTrQuant::setQPforQuant(int qpy,
 }
 
 // To minimize the distortion only. No rate is considered.
-void TComTrQuant::signBitHidingHDQ(TCoeff* qCoef, TCoeff* coef, uint32_t const *scan, int32_t* deltaU, int width, int height)
+void TComTrQuant::signBitHidingHDQ(TCoeff* qCoef, TCoeff* coef, uint32_t const *scan, int32_t* deltaU, int trSize)
 {
     int lastCG = -1;
     int absSum = 0;
     int n;
 
-    for (int subSet = (width * height - 1) >> LOG2_SCAN_SET_SIZE; subSet >= 0; subSet--)
+    for (int subSet = (trSize * trSize - 1) >> LOG2_SCAN_SET_SIZE; subSet >= 0; subSet--)
     {
         int  subPos = subSet << LOG2_SCAN_SET_SIZE;
         int  firstNZPosInCG = SCAN_SET_SIZE, lastNZPosInCG = -1;
@@ -252,31 +252,29 @@ void TComTrQuant::signBitHidingHDQ(TCoef
     } // TU loop
 }
 
-uint32_t TComTrQuant::xQuant(TComDataCU* cu, int32_t* coef, TCoeff* qCoef, int width, int height,
+uint32_t TComTrQuant::xQuant(TComDataCU* cu, int32_t* coef, TCoeff* qCoef, int trSize,
                              TextType ttype, uint32_t absPartIdx, int32_t *lastPos, bool curUseRDOQ)
 {
     uint32_t acSum = 0;
     int add = 0;
     bool useRDOQ = (cu->getTransformSkip(absPartIdx, ttype) ? m_useRDOQTS : m_useRDOQ) && curUseRDOQ;
 
-    assert(width == height);
-
 #if _MSC_VER
 #pragma warning(disable: 4127) // conditional expression is constant
 #endif
     if (useRDOQ && (ttype == TEXT_LUMA || RDOQ_CHROMA))
     {
-        acSum = xRateDistOptQuant(cu, coef, qCoef, width, height, ttype, absPartIdx, lastPos);
+        acSum = xRateDistOptQuant(cu, coef, qCoef, trSize, ttype, absPartIdx, lastPos);
     }
     else
     {
-        const uint32_t log2BlockSize = g_convertToBit[width] + 2;
-        uint32_t scanIdx = cu->getCoefScanIdx(absPartIdx, width, ttype == TEXT_LUMA, cu->isIntra(absPartIdx));
+        const uint32_t log2TrSize = g_convertToBit[trSize] + 2;
+        const uint32_t log2BlockSize = log2TrSize;
+        uint32_t scanIdx = cu->getCoefScanIdx(absPartIdx, trSize, ttype == TEXT_LUMA, cu->isIntra(absPartIdx));
         const uint32_t *scan = g_sigLastScan[scanIdx][log2BlockSize - 1];
 
         int deltaU[32 * 32];
 
-        uint32_t log2TrSize = g_convertToBit[width] + 2;
         int scalingListType = (cu->isIntra(absPartIdx) ? 0 : 3) + ttype;
         assert(scalingListType < 6);
         int32_t *quantCoeff = 0;
@@ -287,11 +285,11 @@ uint32_t TComTrQuant::xQuant(TComDataCU*
         int qbits = QUANT_SHIFT + m_qpParam.m_per + transformShift;
         add = (cu->getSlice()->getSliceType() == I_SLICE ? 171 : 85) << (qbits - 9);
 
-        int numCoeff = width * height;
+        int numCoeff = trSize * trSize;
         acSum += primitives.quant(coef, quantCoeff, deltaU, qCoef, qbits, add, numCoeff, lastPos);
 
         if (cu->getSlice()->getPPS()->getSignHideFlag() && acSum >= 2)
-            signBitHidingHDQ(qCoef, coef, scan, deltaU, width, height);
+            signBitHidingHDQ(qCoef, coef, scan, deltaU, trSize);
     }
 
     return acSum;
@@ -309,8 +307,7 @@ uint32_t TComTrQuant::transformNxN(TComD
                                    int16_t*    residual,
                                    uint32_t    stride,
                                    TCoeff*     coeff,
-                                   uint32_t    width,
-                                   uint32_t    height,
+                                   uint32_t    trSize,
                                    TextType    ttype,
                                    uint32_t    absPartIdx,
                                    int32_t*    lastPos,
@@ -320,11 +317,11 @@ uint32_t TComTrQuant::transformNxN(TComD
     if (cu->getCUTransquantBypass(absPartIdx))
     {
         uint32_t absSum = 0;
-        for (uint32_t k = 0; k < height; k++)
+        for (uint32_t k = 0; k < trSize; k++)
         {
-            for (uint32_t j = 0; j < width; j++)
+            for (uint32_t j = 0; j < trSize; j++)
             {
-                coeff[k * width + j] = ((int16_t)residual[k * stride + j]);
+                coeff[k * trSize + j] = ((int16_t)residual[k * stride + j]);
                 absSum += abs(residual[k * stride + j]);
             }
         }
@@ -342,29 +339,29 @@ uint32_t TComTrQuant::transformNxN(TComD
         mode = REG_DCT;
     }
 
-    assert((cu->getSlice()->getSPS()->getMaxTrSize() >= width));
+    assert((cu->getSlice()->getSPS()->getMaxTrSize() >= trSize));
     if (useTransformSkip)
     {
-        xTransformSkip(residual, stride, m_tmpCoeff, width, height);
+        xTransformSkip(residual, stride, m_tmpCoeff, trSize);
     }
     else
     {
         // TODO: this may need larger data types for X265_DEPTH > 8
-        const uint32_t log2BlockSize = g_convertToBit[width];
-        primitives.dct[DCT_4x4 + log2BlockSize - ((width == 4) && (mode != REG_DCT))](residual, m_tmpCoeff, stride);
+        const uint32_t log2BlockSize = g_convertToBit[trSize];
+        primitives.dct[DCT_4x4 + log2BlockSize - ((trSize == 4) && (mode != REG_DCT))](residual, m_tmpCoeff, stride);
     }
-    return xQuant(cu, m_tmpCoeff, coeff, width, height, ttype, absPartIdx, lastPos, curUseRDOQ);
+    return xQuant(cu, m_tmpCoeff, coeff, trSize, ttype, absPartIdx, lastPos, curUseRDOQ);
 }
 
-void TComTrQuant::invtransformNxN(bool transQuantBypass, uint32_t mode, int16_t* residual, uint32_t stride, TCoeff* coeff, uint32_t width, uint32_t height, int scalingListType, bool useTransformSkip, int lastPos)
+void TComTrQuant::invtransformNxN(bool transQuantBypass, uint32_t mode, int16_t* residual, uint32_t stride, TCoeff* coeff, uint32_t trSize, int scalingListType, bool useTransformSkip, int lastPos)
 {
     if (transQuantBypass)
     {
-        for (uint32_t k = 0; k < height; k++)
+        for (uint32_t k = 0; k < trSize; k++)
         {
-            for (uint32_t j = 0; j < width; j++)
+            for (uint32_t j = 0; j < trSize; j++)
             {
-                residual[k * stride + j] = (int16_t)(coeff[k * width + j]);
+                residual[k * stride + j] = (int16_t)(coeff[k * trSize + j]);
             }
         }
 
@@ -375,7 +372,7 @@ void TComTrQuant::invtransformNxN(bool t
     int per = m_qpParam.m_per;
     int rem = m_qpParam.m_rem;
     bool useScalingList = getUseScalingList();
-    uint32_t log2TrSize = g_convertToBit[width] + 2;
+    const uint32_t log2TrSize = g_convertToBit[trSize] + 2;
     int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize;
     int shift = QUANT_IQUANT_SHIFT - QUANT_SHIFT - transformShift;
     int32_t *dequantCoef = getDequantCoeff(scalingListType, m_qpParam.m_rem, log2TrSize - 2);
@@ -384,30 +381,30 @@ void TComTrQuant::invtransformNxN(bool t
     {
         static const int invQuantScales[6] = { 40, 45, 51, 57, 64, 72 };
         int scale = invQuantScales[rem] << per;
-        primitives.dequant_normal(coeff, m_tmpCoeff, width * height, scale, shift);
+        primitives.dequant_normal(coeff, m_tmpCoeff, trSize * trSize, scale, shift);
     }
     else