[x265-commits] [x265] cli: display param->bSaoNonDeblocked as bool in CLI help
Steve Borho
steve at borho.org
Thu Oct 2 06:31:50 CEST 2014
details: http://hg.videolan.org/x265/rev/1af64c8c2d28
branches:
changeset: 8182:1af64c8c2d28
user: Steve Borho <steve at borho.org>
date: Tue Sep 30 18:20:10 2014 -0500
description:
cli: display param->bSaoNonDeblocked as bool in CLI help
Subject: [x265] analysis: remove default arguments to checkInter_rd5_6 and checkInter_rd0_4
details: http://hg.videolan.org/x265/rev/4d9ff684c80f
branches:
changeset: 8183:4d9ff684c80f
user: Steve Borho <steve at borho.org>
date: Tue Sep 30 18:43:08 2014 -0500
description:
analysis: remove default arguments to checkInter_rd5_6 and checkInter_rd0_4
Subject: [x265] stub in framework for parallel mode analysis and parallel ME
details: http://hg.videolan.org/x265/rev/5c1a4804c42d
branches:
changeset: 8184:5c1a4804c42d
user: Steve Borho <steve at borho.org>
date: Sat Sep 27 12:25:21 2014 -0500
description:
stub in framework for parallel mode analysis and parallel ME
Subject: [x265] analysis: move non-distributed path into else clause
details: http://hg.videolan.org/x265/rev/b17ddb5d71f4
branches:
changeset: 8185:b17ddb5d71f4
user: Steve Borho <steve at borho.org>
date: Tue Sep 30 18:50:25 2014 -0500
description:
analysis: move non-distributed path into else clause
this is done in a second patch since it touches a lot of code trivially so it
Subject: [x265] analysis: fixup
details: http://hg.videolan.org/x265/rev/3bd852b225b5
branches:
changeset: 8186:3bd852b225b5
user: Steve Borho <steve at borho.org>
date: Tue Sep 30 22:20:32 2014 -0500
description:
analysis: fixup
Subject: [x265] slice: better structure packing
details: http://hg.videolan.org/x265/rev/d0fa09e9cca5
branches:
changeset: 8187:d0fa09e9cca5
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Oct 01 09:39:36 2014 +0530
description:
slice: better structure packing
Subject: [x265] ratecontrol: replace an imprecise comparison with a more precise check to ensure
details: http://hg.videolan.org/x265/rev/f9922ce58a20
branches:
changeset: 8188:f9922ce58a20
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Oct 01 23:09:55 2014 +0530
description:
ratecontrol: replace an imprecise comparison with a more precise check to ensure
consistency.
Subject: [x265] ratecontrol: fix float absolute check
details: http://hg.videolan.org/x265/rev/2a55baeb89cf
branches:
changeset: 8189:2a55baeb89cf
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Oct 01 23:18:57 2014 +0530
description:
ratecontrol: fix float absolute check
Subject: [x265] analysis: remove PartSize argument to checkIntraInInter_rd0_4
details: http://hg.videolan.org/x265/rev/0c6fe4a39a32
branches:
changeset: 8190:0c6fe4a39a32
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 14:13:18 2014 -0500
description:
analysis: remove PartSize argument to checkIntraInInter_rd0_4
Subject: [x265] analysis: remove bMergeOnly argument to checkInter_rd0_4, always false
details: http://hg.videolan.org/x265/rev/589d4d7e5a72
branches:
changeset: 8191:589d4d7e5a72
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 14:48:14 2014 -0500
description:
analysis: remove bMergeOnly argument to checkInter_rd0_4, always false
Subject: [x265] analysis: initialize job counters
details: http://hg.videolan.org/x265/rev/7daea9e6c5ae
branches:
changeset: 8192:7daea9e6c5ae
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 13:57:38 2014 -0500
description:
analysis: initialize job counters
Subject: [x265] analysis: use source buffer for source stride
details: http://hg.videolan.org/x265/rev/61e028b5a04e
branches:
changeset: 8193:61e028b5a04e
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 14:13:59 2014 -0500
description:
analysis: use source buffer for source stride
it was always a coincidence that the output stride matched
Subject: [x265] analysis: nit, remove obviously wrong comment
details: http://hg.videolan.org/x265/rev/bd3046c4bb36
branches:
changeset: 8194:bd3046c4bb36
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 14:14:19 2014 -0500
description:
analysis: nit, remove obviously wrong comment
Subject: [x265] analysis: further work on parallel ME
details: http://hg.videolan.org/x265/rev/c0bbd8d01257
branches:
changeset: 8195:c0bbd8d01257
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 00:49:02 2014 -0500
description:
analysis: further work on parallel ME
Subject: [x265] fix bug in 73c6c9086577 for rdLevel=0
details: http://hg.videolan.org/x265/rev/b57c63127527
branches:
changeset: 8196:b57c63127527
user: Satoshi Nakagawa <nakagawa424 at oki.com>
date: Thu Oct 02 09:25:40 2014 +0900
description:
fix bug in 73c6c9086577 for rdLevel=0
Subject: [x265] rc : correct max AU size for first frame
details: http://hg.videolan.org/x265/rev/4579fc590099
branches:
changeset: 8197:4579fc590099
user: Aarthi Thirumalai
date: Thu Oct 02 00:14:13 2014 +0530
description:
rc : correct max AU size for first frame
Subject: [x265] rc: correct the threshold for resetABR function
details: http://hg.videolan.org/x265/rev/0212e9832ce7
branches:
changeset: 8198:0212e9832ce7
user: Aarthi Thirumalai
date: Thu Oct 02 00:46:10 2014 +0530
description:
rc: correct the threshold for resetABR function
Subject: [x265] analysis: remove unused variables, fixes warnings
details: http://hg.videolan.org/x265/rev/898a2546aff1
branches:
changeset: 8199:898a2546aff1
user: Steve Borho <steve at borho.org>
date: Wed Oct 01 23:26:28 2014 -0500
description:
analysis: remove unused variables, fixes warnings
diffstat:
source/Lib/TLibCommon/TComDataCU.cpp | 39 +-
source/Lib/TLibCommon/TComDataCU.h | 2 +-
source/common/slice.h | 6 +-
source/encoder/analysis.cpp | 577 ++++++++++++++++++++++++++--------
source/encoder/analysis.h | 47 ++-
source/encoder/encoder.cpp | 3 +-
source/encoder/entropy.cpp | 47 +-
source/encoder/entropy.h | 2 +-
source/encoder/frameencoder.h | 7 -
source/encoder/ratecontrol.cpp | 7 +-
source/x265.cpp | 2 +-
11 files changed, 526 insertions(+), 213 deletions(-)
diffs (truncated from 1220 to 300 lines):
diff -r a4859c266a59 -r 898a2546aff1 source/Lib/TLibCommon/TComDataCU.cpp
--- a/source/Lib/TLibCommon/TComDataCU.cpp Tue Sep 30 18:05:46 2014 -0500
+++ b/source/Lib/TLibCommon/TComDataCU.cpp Wed Oct 01 23:26:28 2014 -0500
@@ -454,19 +454,15 @@ void TComDataCU::initSubCU(TComDataCU* c
m_cuAboveRight = cu->getCUAboveRight();
}
-void TComDataCU::copyToSubCU(TComDataCU* cu, CU* cuData, uint32_t partUnitIdx, uint32_t depth)
+void TComDataCU::copyFromPic(TComDataCU* ctu, CU* cuData)
{
- X265_CHECK(partUnitIdx < 4, "part unit should be less than 4\n");
+ m_pic = ctu->m_pic;
+ m_slice = ctu->m_slice;
+ m_cuAddr = ctu->getAddr();
+ m_absIdxInCTU = cuData->encodeIdx;
- uint32_t partOffset = cuData->numPartitions * partUnitIdx;
-
- m_pic = cu->m_pic;
- m_slice = cu->m_slice;
- m_cuAddr = cu->getAddr();
- m_absIdxInCTU = cuData->encodeIdx + partOffset;
-
- m_cuPelX = cu->getCUPelX() + ((partUnitIdx & 1) << (g_maxLog2CUSize - depth));
- m_cuPelY = cu->getCUPelY() + ((partUnitIdx >> 1) << (g_maxLog2CUSize - depth));
+ m_cuPelX = ctu->getCUPelX() + g_zscanToPelX[m_absIdxInCTU];
+ m_cuPelY = ctu->getCUPelY() + g_zscanToPelY[m_absIdxInCTU];
m_psyEnergy = 0;
m_totalPsyCost = MAX_INT64;
@@ -478,18 +474,17 @@ void TComDataCU::copyToSubCU(TComDataCU*
m_coeffBits = 0;
m_numPartitions = cuData->numPartitions;
- TComDataCU* otherCU = m_pic->getCU(m_cuAddr);
int sizeInChar = sizeof(char) * m_numPartitions;
- memcpy(m_skipFlag, otherCU->getSkipFlag() + m_absIdxInCTU, sizeof(*m_skipFlag) * m_numPartitions);
- memcpy(m_qp, otherCU->getQP() + m_absIdxInCTU, sizeInChar);
+ memcpy(m_skipFlag, ctu->getSkipFlag() + m_absIdxInCTU, sizeof(*m_skipFlag) * m_numPartitions);
+ memcpy(m_qp, ctu->getQP() + m_absIdxInCTU, sizeInChar);
- memcpy(m_partSizes, otherCU->getPartitionSize() + m_absIdxInCTU, sizeof(*m_partSizes) * m_numPartitions);
- memcpy(m_predModes, otherCU->getPredictionMode() + m_absIdxInCTU, sizeof(*m_predModes) * m_numPartitions);
+ memcpy(m_partSizes, ctu->getPartitionSize() + m_absIdxInCTU, sizeof(*m_partSizes) * m_numPartitions);
+ memcpy(m_predModes, ctu->getPredictionMode() + m_absIdxInCTU, sizeof(*m_predModes) * m_numPartitions);
- memcpy(m_lumaIntraDir, otherCU->getLumaIntraDir() + m_absIdxInCTU, sizeInChar);
- memcpy(m_depth, otherCU->getDepth() + m_absIdxInCTU, sizeInChar);
- memcpy(m_log2CUSize, otherCU->getLog2CUSize() + m_absIdxInCTU, sizeInChar);
+ memcpy(m_lumaIntraDir, ctu->getLumaIntraDir() + m_absIdxInCTU, sizeInChar);
+ memcpy(m_depth, ctu->getDepth() + m_absIdxInCTU, sizeInChar);
+ memcpy(m_log2CUSize, ctu->getLog2CUSize() + m_absIdxInCTU, sizeInChar);
}
// --------------------------------------------------------------------------------------------------------------------
@@ -2411,6 +2406,8 @@ void TComDataCU::getTUEntropyCodingParam
void TComDataCU::loadCTUData(uint32_t maxCUSize)
{
// Initialize the coding blocks inside the CTB
+ int picWidth = m_pic->m_origPicYuv->m_picWidth;
+ int picHeight = m_pic->m_origPicYuv->m_picHeight;
for (uint32_t log2CUSize = g_log2Size[maxCUSize], rangeCUIdx = 0; log2CUSize >= MIN_LOG2_CU_SIZE; log2CUSize--)
{
uint32_t blockSize = 1 << log2CUSize;
@@ -2425,8 +2422,8 @@ void TComDataCU::loadCTUData(uint32_t ma
uint32_t child_idx = rangeCUIdx + sbWidth * sbWidth + (depth_idx << 2);
uint32_t px = m_cuPelX + sb_x * blockSize;
uint32_t py = m_cuPelY + sb_y * blockSize;
- int32_t present_flag = px < m_pic->m_origPicYuv->m_picWidth && py < m_pic->m_origPicYuv->m_picHeight;
- int32_t split_mandatory_flag = present_flag && !last_level_flag && (px + blockSize > m_pic->m_origPicYuv->m_picWidth || py + blockSize > m_pic->m_origPicYuv->m_picHeight);
+ int32_t present_flag = px < picWidth && py < picHeight;
+ int32_t split_mandatory_flag = present_flag && !last_level_flag && (px + blockSize > picWidth || py + blockSize > picHeight);
/* Offset of the luma CU in the X, Y direction in terms of pixels from the CTU origin */
uint32_t xOffset = (sb_x * blockSize) >> 3;
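Note on the copyFromPic() hunk above: the new code reads the CU's pixel offset from g_zscanToPelX / g_zscanToPelY using the absolute z-scan index, instead of deriving it from partUnitIdx and depth. The sketch below shows one way such a lookup can be computed by de-interleaving the bits of a z-order index. It is only an illustration: compactEvenBits and zscanToPel are hypothetical names, not x265 functions, and the 4-pixel partition unit is an assumption rather than something this diff states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of x265): gather bits 0, 2, 4, ... of a
// z-order index into a compact integer.
static uint32_t compactEvenBits(uint32_t z)
{
    uint32_t r = 0;
    for (uint32_t bit = 0; bit < 16; bit++)
        r |= ((z >> (2 * bit)) & 1u) << bit;
    return r;
}

struct PelOffset
{
    uint32_t x;
    uint32_t y;
};

// Convert a z-scan partition index into a pixel offset from the CTU origin.
// unitSize is ASSUMED to be 4 pixels per partition unit.
static PelOffset zscanToPel(uint32_t zIdx, uint32_t unitSize = 4)
{
    PelOffset p;
    p.x = compactEvenBits(zIdx) * unitSize;      // even bits carry x
    p.y = compactEvenBits(zIdx >> 1) * unitSize; // odd bits carry y
    return p;
}
```

With these assumptions, index 1 lands one unit to the right of the origin and index 2 one unit below it, which matches the (partUnitIdx & 1) / (partUnitIdx >> 1) arithmetic in the removed lines.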
diff -r a4859c266a59 -r 898a2546aff1 source/Lib/TLibCommon/TComDataCU.h
--- a/source/Lib/TLibCommon/TComDataCU.h Tue Sep 30 18:05:46 2014 -0500
+++ b/source/Lib/TLibCommon/TComDataCU.h Wed Oct 01 23:26:28 2014 -0500
@@ -276,7 +276,7 @@ public:
void initSubCU(TComDataCU* cu, CU* cuData, uint32_t partUnitIdx, uint32_t depth, int qp);
void loadCTUData(uint32_t maxCUSize);
- void copyToSubCU(TComDataCU* ctu, CU* cuData, uint32_t partUnitIdx, uint32_t depth);
+ void copyFromPic(TComDataCU* ctu, CU* cuData);
void copyPartFrom(TComDataCU* cu, CU* cuData, uint32_t partUnitIdx, uint32_t depth, bool isRDObasedAnalysis = true);
void copyToPic(uint32_t depth);
diff -r a4859c266a59 -r 898a2546aff1 source/common/slice.h
--- a/source/common/slice.h Tue Sep 30 18:05:46 2014 -0500
+++ b/source/common/slice.h Wed Oct 01 23:26:28 2014 -0500
@@ -95,13 +95,13 @@ namespace Level {
struct ProfileTierLevel
{
bool tierFlag;
- int profileIdc;
- bool profileCompatibilityFlag[32];
- int levelIdc;
bool progressiveSourceFlag;
bool interlacedSourceFlag;
bool nonPackedConstraintFlag;
bool frameOnlyConstraintFlag;
+ bool profileCompatibilityFlag[32];
+ int profileIdc;
+ int levelIdc;
uint32_t minCrForLevel;
uint32_t maxLumaSrForLevel;
};
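Note on the ProfileTierLevel hunk above: the change groups the bool members ahead of the int members. Whether a reordering like this actually shrinks a struct depends on the ABI, so it can be worth measuring. The sketch below copies the two layouts into stand-alone types (PtlBefore and PtlAfter are names made up for this note) and compares their sizes against the raw sum of the member sizes. On common ABIs where bool is one byte and int is 4-byte aligned, the two layouts may well come out the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout before the change (member order taken from the removed lines).
struct PtlBefore
{
    bool     tierFlag;
    int      profileIdc;
    bool     profileCompatibilityFlag[32];
    int      levelIdc;
    bool     progressiveSourceFlag;
    bool     interlacedSourceFlag;
    bool     nonPackedConstraintFlag;
    bool     frameOnlyConstraintFlag;
    uint32_t minCrForLevel;
    uint32_t maxLumaSrForLevel;
};

// Layout after the change (member order taken from the added lines).
struct PtlAfter
{
    bool     tierFlag;
    bool     progressiveSourceFlag;
    bool     interlacedSourceFlag;
    bool     nonPackedConstraintFlag;
    bool     frameOnlyConstraintFlag;
    bool     profileCompatibilityFlag[32];
    int      profileIdc;
    int      levelIdc;
    uint32_t minCrForLevel;
    uint32_t maxLumaSrForLevel;
};

// Sum of the member sizes with no padding at all; any struct layout must be
// at least this large.
static const size_t kMemberBytes =
    37 * sizeof(bool) + 2 * sizeof(int) + 2 * sizeof(uint32_t);

// Padding the compiler added to a layout of the given total size.
static size_t paddingBytes(size_t structSize)
{
    return structSize - kMemberBytes;
}
```

Calling paddingBytes(sizeof(PtlBefore)) and paddingBytes(sizeof(PtlAfter)) gives the padding in each layout for the compiler at hand.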
diff -r a4859c266a59 -r 898a2546aff1 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp Tue Sep 30 18:05:46 2014 -0500
+++ b/source/encoder/analysis.cpp Wed Oct 01 23:26:28 2014 -0500
@@ -21,16 +21,19 @@
* For more information, contact us at license @ x265.com.
*****************************************************************************/
+#include "common.h"
+#include "primitives.h"
+#include "threading.h"
+
#include "analysis.h"
-#include "primitives.h"
-#include "common.h"
#include "rdcost.h"
#include "encoder.h"
+
#include "PPA/ppa.h"
using namespace x265;
-Analysis::Analysis()
+Analysis::Analysis() : JobProvider(NULL)
{
m_bestPredYuv = NULL;
m_bestResiYuv = NULL;
@@ -43,12 +46,16 @@ Analysis::Analysis()
m_origYuv = NULL;
for (int i = 0; i < MAX_PRED_TYPES; i++)
m_modePredYuv[i] = NULL;
+ m_bJobsQueued = false;
+ m_totalNumME = m_numAcquiredME = m_numCompletedME = 0;
+ m_totalNumJobs = m_numAcquiredJobs = m_numCompletedJobs = 0;
}
-bool Analysis::create(uint32_t numCUDepth, uint32_t maxWidth)
+bool Analysis::create(uint32_t numCUDepth, uint32_t maxWidth, ThreadLocalData *tld)
{
X265_CHECK(numCUDepth <= NUM_CU_DEPTH, "invalid numCUDepth\n");
+ m_tld = tld;
m_bestPredYuv = new TComYuv*[numCUDepth];
m_bestResiYuv = new ShortYuv*[numCUDepth];
m_bestRecoYuv = new TComYuv*[numCUDepth];
@@ -227,6 +234,124 @@ void Analysis::destroy()
delete [] m_origYuv;
}
+bool Analysis::findJob(int threadId)
+{
+ /* try to acquire a CU mode to analyze */
+ if (m_totalNumJobs > m_numAcquiredJobs)
+ {
+ /* ATOMIC_INC returns the incremented value */
+ int id = ATOMIC_INC(&m_numAcquiredJobs);
+ if (m_totalNumJobs >= id)
+ {
+ parallelAnalysisJob(threadId, id - 1);
+ if (ATOMIC_INC(&m_numCompletedJobs) == m_totalNumJobs)
+ m_modeCompletionEvent.trigger();
+ return true;
+ }
+ }
+
+ /* else try to acquire a motion estimation task */
+ if (m_totalNumME > m_numAcquiredME)
+ {
+ int id = ATOMIC_INC(&m_numAcquiredME);
+ if (m_totalNumME >= id)
+ {
+ parallelME(threadId, id - 1);
+ if (ATOMIC_INC(&m_numCompletedME) == m_totalNumME)
+ m_meCompletionEvent.trigger();
+ return true;
+ }
+ }
+
+ return false;
+}
+
+void Analysis::parallelAnalysisJob(int threadId, int jobId)
+{
+ Analysis* slave;
+ int depth = m_curDepth;
+
+ if (threadId == -1)
+ slave = this;
+ else
+ {
+ TComDataCU *cu = m_interCU_2Nx2N[depth];
+ TComPicYuv* fenc = cu->m_pic->getPicYuvOrg();
+
+ slave = &m_tld[threadId].analysis;
+ slave->m_me.setSourcePlane(fenc->getLumaAddr(), fenc->getStride());
+ slave->m_log = &slave->m_sliceTypeLog[cu->m_slice->m_sliceType];
+ slave->m_rdEntropyCoders = this->m_rdEntropyCoders;
+ m_origYuv[0]->copyPartToYuv(slave->m_origYuv[depth], m_curCUData->encodeIdx);
+ slave->setQP(cu->m_slice, m_rdCost.m_qp);
+ slave->m_quant.setQPforQuant(cu);
+ slave->m_quant.m_nr = m_quant.m_nr;
+ }
+
+ if (m_param->rdLevel <= 4)
+ {
+ switch (jobId)
+ {
+ case 0:
+ slave->checkIntraInInter_rd0_4(m_intraInInterCU[depth], m_curCUData);
+ break;
+
+ case 1:
+ slave->checkInter_rd0_4(m_interCU_2Nx2N[depth], m_curCUData, m_modePredYuv[0][depth], SIZE_2Nx2N);
+ break;
+
+ case 2:
+ slave->checkInter_rd0_4(m_interCU_Nx2N[depth], m_curCUData, m_modePredYuv[1][depth], SIZE_Nx2N);
+ break;
+
+ case 3:
+ slave->checkInter_rd0_4(m_interCU_2NxN[depth], m_curCUData, m_modePredYuv[2][depth], SIZE_2NxN);
+ break;
+
+ default:
+ X265_CHECK(0, "invalid job ID for parallel mode analysis\n");
+ break;
+ }
+ }
+}
+
+void Analysis::parallelME(int threadId, int meId)
+{
+ Analysis* slave;
+ int depth = m_curDepth;
+ TComDataCU *cu = m_curMECu;
+ TComPicYuv* fenc = cu->m_pic->getPicYuvOrg();
+
+ if (threadId == -1)
+ slave = this;
+ else
+ {
+ slave = &m_tld[threadId].analysis;
+
+ slave->m_me.setSourcePlane(fenc->getLumaAddr(), fenc->getStride());
+ m_origYuv[0]->copyPartToYuv(slave->m_origYuv[depth], m_curCUData->encodeIdx);
+ slave->setQP(cu->m_slice, m_rdCost.m_qp);
+ }
+
+ uint32_t partAddr;
+ int roiWidth, roiHeight;
+ cu->getPartIndexAndSize(m_curPart, partAddr, roiWidth, roiHeight);
+
+ pixel* pu = fenc->getLumaAddr(cu->getAddr(), m_curCUData->encodeIdx + partAddr);
+ m_me.setSourcePU(pu - fenc->getLumaAddr(), roiWidth, roiHeight);
+
+ if (meId < cu->m_slice->m_numRefIdx[0])
+ {
+ /* perform list 0 motion search */
+ }
+ else
+ {
+ /* perform list 1 motion search */
+ }
+
+ /* TODO: acquire master output lock, if best cost save the info */
+}
+
void Analysis::compressCU(TComDataCU* cu)
{
Frame* pic = cu->m_pic;
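Note on findJob() in the hunk above: it hands out work by atomically incrementing an "acquired" counter, treating the incremented value as a 1-based ticket, and signalling completion when the "completed" counter reaches the total. The sketch below isolates that pattern. It is not the x265 code: JobPool is a hypothetical type, std::atomic stands in for the ATOMIC_INC macro, and a boolean flag stands in for the completion event.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-alone model of the ticket-counter scheme in findJob().
struct JobPool
{
    int               totalJobs;
    std::atomic<int>  acquired;
    std::atomic<int>  completed;
    std::atomic<bool> allDone;   // stand-in for a completion event
    std::vector<int>  runCount;  // how many times each job slot was run

    explicit JobPool(int n)
        : totalJobs(n), acquired(0), completed(0), allDone(false), runCount(n, 0)
    {
    }

    // Returns true if this call ran a job, false if none were left.
    bool findJob()
    {
        // Cheap pre-check so idle callers do not keep bumping the counter.
        if (totalJobs > acquired.load())
        {
            // fetch_add returns the old value; +1 gives the incremented
            // value, which is what the diff's comment says ATOMIC_INC returns.
            int id = acquired.fetch_add(1) + 1;
            if (totalJobs >= id)
            {
                runCount[id - 1]++; // each ticket maps to exactly one slot
                if (completed.fetch_add(1) + 1 == totalJobs)
                    allDone.store(true);
                return true;
            }
        }
        return false;
    }
};
```

The second comparison after the increment is what keeps the scheme safe when several threads pass the pre-check at once: a thread whose ticket exceeds totalJobs simply does no work. A caller can drain the pool with a loop such as `while (pool.findJob()) {}`, from one thread or from several.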
@@ -750,41 +875,83 @@ void Analysis::compressInterCU_rd0_4(TCo
m_bestMergeCU[depth]->initCU(pic, cuAddr);
}
- /* Compute Merge Cost */
- checkMerge2Nx2N_rd0_4(m_bestMergeCU[depth], m_mergeCU[depth], cu, m_modePredYuv[3][depth], m_bestMergeRecoYuv[depth]);
- bool earlyskip = false;
- if (m_param->rdLevel >= 1)
- earlyskip = (m_param->bEnableEarlySkip && m_bestMergeCU[depth]->isSkipped(0));
+ const int distributedAnalysis = 0 && m_param->rdLevel > 2; /* only RD 3, 4 supported here */
- if (!earlyskip)
+ if (distributedAnalysis)
{
- /* Compute 2Nx2N mode costs */
- checkInter_rd0_4(m_interCU_2Nx2N[depth], cu, m_modePredYuv[0][depth], SIZE_2Nx2N);
+ /* with distributed analysis, we perform more speculative work.
+ * We do not have early outs for when skips are found so we
+ * always evaluate intra and all inter and merge modes
+ *
+ * jobs are numbered as:
+ * 0 = intra
+ * 1 = inter 2Nx2N