[x265-commits] [x265] constants: remove init/destroyROM functions
Steve Borho
steve at borho.org
Thu Nov 20 07:28:28 CET 2014
details: http://hg.videolan.org/x265/rev/d3389bb9efd0
branches:
changeset: 8855:d3389bb9efd0
user: Steve Borho <steve at borho.org>
date: Tue Nov 18 19:50:29 2014 -0600
description:
constants: remove init/destroyROM functions
Subject: [x265] threading: use 32bit atomic integer operations exclusively
details: http://hg.videolan.org/x265/rev/814b687db30e
branches:
changeset: 8856:814b687db30e
user: Steve Borho <steve at borho.org>
date: Tue Nov 18 20:16:57 2014 -0600
description:
threading: use 32bit atomic integer operations exclusively
The 32bit operations are more portable and have less onerous alignment
restrictions.
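
As a rough illustration of the 32bit-only approach, the sketch below wraps a
compare-and-swap on a 32bit value and returns the previous value, which is the
convention the ATOMIC_CAS32 calls elsewhere in this digest appear to follow.
It uses std::atomic rather than the compiler intrinsics x265 relies on, and
the names cas32 and claimOnce are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the x265 macro): atomically replace `expected`
// with `desired` and return the value that was observed before the attempt.
inline uint32_t cas32(std::atomic<uint32_t>& v, uint32_t expected, uint32_t desired)
{
    // On success `expected` still holds the old value; on failure
    // compare_exchange_strong overwrites it with the value actually seen.
    v.compare_exchange_strong(expected, desired);
    return expected;
}

// Usage sketch: only the first caller sees the 0 -> 1 transition.
inline bool claimOnce(std::atomic<uint32_t>& flag)
{
    return cas32(flag, 0, 1) == 0;
}

inline void selfCheck()
{
    std::atomic<uint32_t> flag(0);
    assert(claimOnce(flag));   // first claim wins
    assert(!claimOnce(flag));  // second claim observes 1 and loses
}
```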
Subject: [x265] wavefront: fix msvc warning
details: http://hg.videolan.org/x265/rev/e29c618cd9a7
branches:
changeset: 8857:e29c618cd9a7
user: Steve Borho <steve at borho.org>
date: Tue Nov 18 21:25:08 2014 -0600
description:
wavefront: fix msvc warning
warning C4800: 'unsigned long' : forcing value to bool 'true' or 'false' (performance warning)
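
For context, C4800 is raised when an integer expression is implicitly
converted to bool. A minimal sketch of the usual remedy follows, assuming a
bit test on a 32bit word; the helper name isBitSet is hypothetical and the
actual change in wavefront.cpp may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: report whether bit `bit` (0..31) is set in `word`.
inline bool isBitSet(uint32_t word, int bit)
{
    // Returning (word & mask) directly would rely on an implicit
    // integer-to-bool conversion, which MSVC flags with C4800.
    // Comparing against zero makes the conversion explicit.
    return (word & (1u << bit)) != 0;
}

inline void selfCheck()
{
    assert(isBitSet(0x4u, 2));
    assert(!isBitSet(0x4u, 1));
}
```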
Subject: [x265] threading: fixes for VC11 Win32 includes, prune two unused functions
details: http://hg.videolan.org/x265/rev/2b830f08d948
branches:
changeset: 8858:2b830f08d948
user: Steve Borho <steve at borho.org>
date: Wed Nov 19 01:28:50 2014 -0600
description:
threading: fixes for VC11 Win32 includes, prune two unused functions
Subject: [x265] fseeko for mingw32
details: http://hg.videolan.org/x265/rev/cb9bb697fcaa
branches:
changeset: 8859:cb9bb697fcaa
user: Satoshi Nakagawa <nakagawa424 at oki.com>
date: Wed Nov 19 15:39:25 2014 +0900
description:
fseeko for mingw32
Subject: [x265] refactorization of the transform/quant path.
details: http://hg.videolan.org/x265/rev/8bee552a1964
branches:
changeset: 8860:8bee552a1964
user: Praveen Tiwari
date: Tue Nov 18 14:00:27 2014 +0530
description:
refactorization of the transform/quant path.
This patch narrows the DCT/IDCT coefficients from int32_t to int16_t, since
they fit in int16_t without introducing any encode error. This allows us to
clean up many DCT/IDCT intermediate buffers and to improve encode efficiency
for different CLI options, including noise reduction, by reducing data
movement and by fitting more coefficients into a single register for SIMD
operations. This patch includes all necessary changes for the transform/quant
path, including unit test code.
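
The register and buffer savings described above can be checked with simple
arithmetic. The sketch below is illustrative only: lanesPerRegister and
coeffBufferBytes are hypothetical helpers, and the 128-bit register width and
32x32 block size are assumptions chosen as examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of coefficients of type T that fit in one SIMD register.
template<typename T>
constexpr size_t lanesPerRegister(size_t registerBits)
{
    return registerBits / (8 * sizeof(T));
}

// Bytes needed to hold one square block of coefficients of type T.
template<typename T>
constexpr size_t coeffBufferBytes(size_t blockSize)
{
    return blockSize * blockSize * sizeof(T);
}

inline void selfCheck()
{
    // A 128-bit register holds twice as many int16_t as int32_t values.
    assert(lanesPerRegister<int16_t>(128) == 8);
    assert(lanesPerRegister<int32_t>(128) == 4);
    // A 32x32 coefficient buffer shrinks from 4096 to 2048 bytes.
    assert(coeffBufferBytes<int32_t>(32) == 4096);
    assert(coeffBufferBytes<int16_t>(32) == 2048);
}
```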
Subject: [x265] dct: fix gcc warnings
details: http://hg.videolan.org/x265/rev/34cb58c53859
branches:
changeset: 8861:34cb58c53859
user: Steve Borho <steve at borho.org>
date: Tue Nov 18 12:06:19 2014 -0600
description:
dct: fix gcc warnings
Subject: [x265] primitives: clarify constness
details: http://hg.videolan.org/x265/rev/99b5cebf8193
branches:
changeset: 8862:99b5cebf8193
user: Satoshi Nakagawa <nakagawa424 at oki.com>
date: Sun Nov 16 14:32:17 2014 +0900
description:
primitives: clarify constness
Subject: [x265] disable denoiseDct asm code until fixed for Mac OS
details: http://hg.videolan.org/x265/rev/f236adb703f5
branches:
changeset: 8863:f236adb703f5
user: Praveen Tiwari
date: Wed Nov 19 18:42:24 2014 +0530
description:
disable denoiseDct asm code until fixed for Mac OS
Subject: [x265] replace char with int8_t where it should be signed char
details: http://hg.videolan.org/x265/rev/14a8bb7bbcab
branches:
changeset: 8864:14a8bb7bbcab
user: Satoshi Nakagawa <nakagawa424 at oki.com>
date: Thu Nov 20 11:30:33 2014 +0900
description:
replace char with int8_t where it should be signed char
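
The motivation is that the signedness of plain char is implementation-defined,
so a negative sentinel stored in a char can compare as non-negative on
platforms where char is unsigned. A minimal sketch, assuming a -1 sentinel for
a missing reference index; the names REF_NOT_VALID and hasReference are used
here for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// int8_t is a signed type wherever it exists; plain char offers no such guarantee.
static_assert(std::is_signed<int8_t>::value, "int8_t must be signed");

// Hypothetical sentinel marking "no reference picture".
const int8_t REF_NOT_VALID = -1;

// With int8_t this test behaves the same on every platform. With plain char
// it would always return true where char is unsigned, since -1 becomes 255.
inline bool hasReference(int8_t refIdx)
{
    return refIdx >= 0;
}

inline void selfCheck()
{
    assert(!hasReference(REF_NOT_VALID));
    assert(hasReference(0));
}
```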
Subject: [x265] fix for rd=0
details: http://hg.videolan.org/x265/rev/b33cbe130c63
branches:
changeset: 8865:b33cbe130c63
user: Satoshi Nakagawa <nakagawa424 at oki.com>
date: Thu Nov 20 14:25:01 2014 +0900
description:
fix for rd=0
Subject: [x265] encoder: fix analysis file read
details: http://hg.videolan.org/x265/rev/0c25a6eac0ca
branches:
changeset: 8866:0c25a6eac0ca
user: Gopu Govindaswamy <gopu at multicorewareinc.com>
date: Thu Nov 20 11:43:37 2014 +0530
description:
encoder: fix analysis file read
Subject: [x265] luma_hpp[4x4]: AVX2 asm code bug fix
details: http://hg.videolan.org/x265/rev/4b637cb9b792
branches:
changeset: 8867:4b637cb9b792
user: Praveen Tiwari
date: Thu Nov 20 11:49:38 2014 +0530
description:
luma_hpp[4x4]: AVX2 asm code bug fix
diffstat:
source/common/common.h | 4 +
source/common/constants.cpp | 16 -
source/common/constants.h | 3 -
source/common/cudata.cpp | 16 +-
source/common/cudata.h | 14 +-
source/common/dct.cpp | 223 ++---------
source/common/ipfilter.cpp | 32 +-
source/common/param.cpp | 2 +-
source/common/picyuv.h | 9 +
source/common/pixel.cpp | 129 ++---
source/common/predict.cpp | 90 ++--
source/common/primitives.cpp | 2 -
source/common/primitives.h | 113 ++---
source/common/quant.cpp | 36 +-
source/common/quant.h | 10 +-
source/common/shortyuv.cpp | 12 +-
source/common/threading.h | 58 +--
source/common/threadpool.cpp | 32 +-
source/common/vec/dct-sse3.cpp | 250 ++----------
source/common/vec/dct-sse41.cpp | 14 +-
source/common/vec/dct-ssse3.cpp | 34 +-
source/common/wavefront.cpp | 72 +--
source/common/wavefront.h | 8 +-
source/common/winxp.h | 24 -
source/common/x86/asm-primitives.cpp | 15 +-
source/common/x86/blockcopy8.asm | 86 +----
source/common/x86/blockcopy8.h | 113 ++---
source/common/x86/dct8.asm | 661 +++++++++++++++++++---------------
source/common/x86/dct8.h | 32 +-
source/common/x86/ipfilter8.asm | 4 +-
source/common/x86/ipfilter8.h | 46 +-
source/common/x86/mc.h | 52 +-
source/common/x86/pixel-util.h | 78 ++--
source/common/x86/pixel-util8.asm | 49 +-
source/common/x86/pixel.h | 82 ++--
source/common/yuv.cpp | 38 +-
source/encoder/analysis.cpp | 51 +-
source/encoder/api.cpp | 1 -
source/encoder/encoder.cpp | 15 +-
source/encoder/entropy.cpp | 6 +-
source/encoder/frameencoder.cpp | 2 +-
source/encoder/rdcost.h | 4 +-
source/encoder/search.cpp | 95 ++--
source/encoder/slicetype.cpp | 2 +-
source/test/intrapredharness.cpp | 2 -
source/test/mbdstharness.cpp | 65 +-
source/test/mbdstharness.h | 4 +-
source/test/pixelharness.cpp | 59 +--
source/test/pixelharness.h | 3 +-
49 files changed, 1153 insertions(+), 1615 deletions(-)
diffs (truncated from 5807 to 300 lines):
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/common.h
--- a/source/common/common.h Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/common.h Thu Nov 20 11:49:38 2014 +0530
@@ -56,6 +56,10 @@ extern "C" intptr_t x265_stack_align(voi
#define x265_stack_align(func, ...) func(__VA_ARGS__)
#endif
+#if defined(__MINGW32__)
+#define fseeko fseeko64
+#endif
+
#elif defined(_MSC_VER)
#define ALIGN_VAR_8(T, var) __declspec(align(8)) T var
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/constants.cpp
--- a/source/common/constants.cpp Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/constants.cpp Thu Nov 20 11:49:38 2014 +0530
@@ -27,22 +27,6 @@
namespace x265 {
-static int initialized /* = 0 */;
-
-// initialize ROM variables
-void initROM()
-{
- if (ATOMIC_CAS32(&initialized, 0, 1) == 1)
- return;
-}
-
-void destroyROM()
-{
- if (ATOMIC_CAS32(&initialized, 1, 0) == 0)
- return;
-}
-
-
// lambda = pow(2, (double)q / 6 - 2);
double x265_lambda_tab[QP_MAX_MAX + 1] =
{
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/constants.h
--- a/source/common/constants.h Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/constants.h Thu Nov 20 11:49:38 2014 +0530
@@ -29,9 +29,6 @@
namespace x265 {
// private namespace
-void initROM();
-void destroyROM();
-
void initZscanToRaster(uint32_t maxFullDepth, uint32_t depth, uint32_t startVal, uint32_t*& curIdx);
void initRasterToZscan(uint32_t maxFullDepth);
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/cudata.cpp
--- a/source/common/cudata.cpp Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/cudata.cpp Thu Nov 20 11:49:38 2014 +0530
@@ -227,12 +227,12 @@ void CUData::initialize(const CUDataMemP
/* Each CU's data is layed out sequentially within the charMemBlock */
uint8_t *charBuf = dataPool.charMemBlock + (m_numPartitions * BytesPerPartition) * instance;
- m_qp = (char*)charBuf; charBuf += m_numPartitions;
+ m_qp = (int8_t*)charBuf; charBuf += m_numPartitions;
m_log2CUSize = charBuf; charBuf += m_numPartitions;
m_lumaIntraDir = charBuf; charBuf += m_numPartitions;
m_tqBypass = charBuf; charBuf += m_numPartitions;
- m_refIdx[0] = (char*)charBuf; charBuf += m_numPartitions;
- m_refIdx[1] = (char*)charBuf; charBuf += m_numPartitions;
+ m_refIdx[0] = (int8_t*)charBuf; charBuf += m_numPartitions;
+ m_refIdx[1] = (int8_t*)charBuf; charBuf += m_numPartitions;
m_cuDepth = charBuf; charBuf += m_numPartitions;
m_predMode = charBuf; charBuf += m_numPartitions; /* the order up to here is important in initCTU() and initSubCU() */
m_partSize = charBuf; charBuf += m_numPartitions;
@@ -772,7 +772,7 @@ const CUData* CUData::getQpMinCuAbove(ui
}
/* Get reference QP from left QpMinCu or latest coded QP */
-char CUData::getRefQP(uint32_t curAbsIdxInCTU) const
+int8_t CUData::getRefQP(uint32_t curAbsIdxInCTU) const
{
uint32_t lPartIdx = 0, aPartIdx = 0;
const CUData* cULeft = getQpMinCuLeft(lPartIdx, m_absIdxInCTU + curAbsIdxInCTU);
@@ -794,7 +794,7 @@ int CUData::getLastValidPartIdx(int absP
return lastValidPartIdx;
}
-char CUData::getLastCodedQP(uint32_t absPartIdx) const
+int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
{
uint32_t quPartIdxMask = 0xFF << (g_maxFullDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);
@@ -808,7 +808,7 @@ char CUData::getLastCodedQP(uint32_t abs
else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_CU_PARTITIONS);
else
- return (char)m_slice->m_sliceQp;
+ return (int8_t)m_slice->m_sliceQp;
}
}
@@ -936,7 +936,7 @@ uint32_t CUData::getCtxSkipFlag(uint32_t
return ctx;
}
-bool CUData::setQPSubCUs(char qp, uint32_t absPartIdx, uint32_t depth)
+bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
{
uint32_t curPartNumb = NUM_CU_PARTITIONS >> (depth << 1);
uint32_t curPartNumQ = curPartNumb >> 2;
@@ -1211,7 +1211,7 @@ void CUData::setPUMv(int list, const MV&
setAllPU(m_mv[list], mv, absPartIdx, puIdx);
}
-void CUData::setPURefIdx(int list, char refIdx, int absPartIdx, int puIdx)
+void CUData::setPURefIdx(int list, int8_t refIdx, int absPartIdx, int puIdx)
{
setAllPU(m_refIdx[list], refIdx, absPartIdx, puIdx);
}
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/cudata.h
--- a/source/common/cudata.h Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/cudata.h Thu Nov 20 11:49:38 2014 +0530
@@ -127,11 +127,11 @@ public:
int m_vChromaShift;
/* Per-part data, stored contiguously */
- char* m_qp; // array of QP values
+ int8_t* m_qp; // array of QP values
uint8_t* m_log2CUSize; // array of cu log2Size TODO: seems redundant to depth
uint8_t* m_lumaIntraDir; // array of intra directions (luma)
uint8_t* m_tqBypass; // array of CU lossless flags
- char* m_refIdx[2]; // array of motion reference indices per list
+ int8_t* m_refIdx[2]; // array of motion reference indices per list
uint8_t* m_cuDepth; // array of depths
uint8_t* m_predMode; // array of prediction modes
uint8_t* m_partSize; // array of partition sizes
@@ -177,7 +177,7 @@ public:
void clearCbf() { m_partSet(m_cbf[0], 0); m_partSet(m_cbf[1], 0); m_partSet(m_cbf[2], 0); }
/* these functions all take depth as an absolute depth from CTU, it is used to calculate the number of parts to copy */
- void setQPSubParts(char qp, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth]((uint8_t*)m_qp + absPartIdx, (uint8_t)qp); }
+ void setQPSubParts(int8_t qp, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth]((uint8_t*)m_qp + absPartIdx, (uint8_t)qp); }
void setTUDepthSubParts(uint8_t tuDepth, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth](m_tuDepth + absPartIdx, tuDepth); }
void setLumaIntraDirSubParts(uint8_t dir, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth](m_lumaIntraDir + absPartIdx, dir); }
void setChromIntraDirSubParts(uint8_t dir, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth](m_chromaIntraDir + absPartIdx, dir); }
@@ -186,15 +186,15 @@ public:
void setTransformSkipSubParts(uint8_t tskip, TextType ttype, uint32_t absPartIdx, uint32_t depth) { s_partSet[depth](m_transformSkip[ttype] + absPartIdx, tskip); }
void setTransformSkipPartRange(uint8_t tskip, TextType ttype, uint32_t absPartIdx, uint32_t coveredPartIdxes) { memset(m_transformSkip[ttype] + absPartIdx, tskip, coveredPartIdxes); }
- bool setQPSubCUs(char qp, uint32_t absPartIdx, uint32_t depth);
+ bool setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth);
void setPUInterDir(uint8_t dir, uint32_t absPartIdx, uint32_t puIdx);
void setPUMv(int list, const MV& mv, int absPartIdx, int puIdx);
- void setPURefIdx(int list, char refIdx, int absPartIdx, int puIdx);
+ void setPURefIdx(int list, int8_t refIdx, int absPartIdx, int puIdx);
uint8_t getCbf(uint32_t absPartIdx, TextType ttype, uint32_t trDepth) const { return (m_cbf[ttype][absPartIdx] >> trDepth) & 0x1; }
uint8_t getQtRootCbf(uint32_t absPartIdx) const { return m_cbf[0][absPartIdx] || m_cbf[1][absPartIdx] || m_cbf[2][absPartIdx]; }
- char getRefQP(uint32_t currAbsIdxInCTU) const;
+ int8_t getRefQP(uint32_t currAbsIdxInCTU) const;
uint32_t getInterMergeCandidates(uint32_t absPartIdx, uint32_t puIdx, MVField (*mvFieldNeighbours)[2], uint8_t* interDirNeighbours) const;
void clipMv(MV& outMV) const;
int fillMvpCand(uint32_t puIdx, uint32_t absPartIdx, int picList, int refIdx, MV* amvpCand, MV* mvc) const;
@@ -237,7 +237,7 @@ protected:
template<typename T>
void setAllPU(T *p, const T& val, int absPartIdx, int puIdx);
- char getLastCodedQP(uint32_t absPartIdx) const;
+ int8_t getLastCodedQP(uint32_t absPartIdx) const;
int getLastValidPartIdx(int absPartIdx) const;
bool hasEqualMotion(uint32_t absPartIdx, const CUData& candCU, uint32_t candAbsPartIdx) const;
diff -r d059cfa88f1a -r 4b637cb9b792 source/common/dct.cpp
--- a/source/common/dct.cpp Tue Nov 18 14:11:12 2014 -0600
+++ b/source/common/dct.cpp Thu Nov 20 11:49:38 2014 +0530
@@ -41,7 +41,7 @@ namespace {
// Fast DST Algorithm. Full matrix multiplication for DST and Fast DST algorithm
// give identical results
-void fastForwardDst(int16_t *block, int16_t *coeff, int shift) // input block, output coeff
+void fastForwardDst(const int16_t* block, int16_t* coeff, int shift) // input block, output coeff
{
int c[4];
int rnd_factor = 1 << (shift - 1);
@@ -61,7 +61,7 @@ void fastForwardDst(int16_t *block, int1
}
}
-void inversedst(int16_t *tmp, int16_t *block, int shift) // input tmp, output block
+void inversedst(const int16_t* tmp, int16_t* block, int shift) // input tmp, output block
{
int i, c[4];
int rnd_factor = 1 << (shift - 1);
@@ -81,7 +81,7 @@ void inversedst(int16_t *tmp, int16_t *b
}
}
-void partialButterfly16(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly16(const int16_t* src, int16_t* dst, int shift, int line)
{
int j, k;
int E[8], O[8];
@@ -134,7 +134,7 @@ void partialButterfly16(int16_t *src, in
}
}
-void partialButterfly32(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly32(const int16_t* src, int16_t* dst, int shift, int line)
{
int j, k;
int E[16], O[16];
@@ -203,7 +203,7 @@ void partialButterfly32(int16_t *src, in
}
}
-void partialButterfly8(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly8(const int16_t* src, int16_t* dst, int shift, int line)
{
int j, k;
int E[4], O[4];
@@ -240,7 +240,7 @@ void partialButterfly8(int16_t *src, int
}
}
-void partialButterflyInverse4(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse4(const int16_t* src, int16_t* dst, int shift, int line)
{
int j;
int E[2], O[2];
@@ -265,7 +265,7 @@ void partialButterflyInverse4(int16_t *s
}
}
-void partialButterflyInverse8(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse8(const int16_t* src, int16_t* dst, int shift, int line)
{
int j, k;
int E[4], O[4];
@@ -301,7 +301,7 @@ void partialButterflyInverse8(int16_t *s
}
}
-void partialButterflyInverse16(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse16(const int16_t* src, int16_t* dst, int shift, int line)
{
int j, k;
int E[8], O[8];
@@ -352,7 +352,7 @@ void partialButterflyInverse16(int16_t *
}
}
-void partialButterflyInverse32(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterflyInverse32(const int16_t* src, int16_t* dst, int shift, int line)
{
int j, k;
int E[16], O[16];
@@ -416,7 +416,7 @@ void partialButterflyInverse32(int16_t *
}
}
-void partialButterfly4(int16_t *src, int16_t *dst, int shift, int line)
+void partialButterfly4(const int16_t* src, int16_t* dst, int shift, int line)
{
int j;
int E[2], O[2];
@@ -440,7 +440,7 @@ void partialButterfly4(int16_t *src, int
}
}
-void dst4_c(int16_t *src, int32_t *dst, intptr_t stride)
+void dst4_c(const int16_t *src, int16_t *dst, intptr_t stride)
{
const int shift_1st = 1 + X265_DEPTH - 8;
const int shift_2nd = 8;
@@ -454,132 +454,54 @@ void dst4_c(int16_t *src, int32_t *dst,
}
fastForwardDst(block, coef, shift_1st);
- fastForwardDst(coef, block, shift_2nd);
-
-#define N (4)
- for (int i = 0; i < N; i++)
- {
- for (int j = 0; j < N; j++)
- {
- dst[i * N + j] = block[i * N + j];
- }
- }
-
-#undef N
+ fastForwardDst(coef, dst, shift_2nd);
}
-void dct4_c(int16_t *src, int32_t *dst, intptr_t stride)
+void dct4_c(const int16_t *src, int16_t *dst, intptr_t /* stride */)
{
const int shift_1st = 1 + X265_DEPTH - 8;
const int shift_2nd = 8;
ALIGN_VAR_32(int16_t, coef[4 * 4]);
- ALIGN_VAR_32(int16_t, block[4 * 4]);