<div>I have also tried this on inter CUs, but the speedup is very small because the skip is rarely triggered. I think we can also try a similar skip in the RQT: if the sum of the sub-TU costs is larger than the non-split TU cost, RQT processing of the split TU can be stopped.</div><div><br></div><div>Thanks!</div><div><div><br></div><div><br></div><div style="font-size: 12px;font-family: Arial Narrow;padding:2px 0 2px 0;">------------------ Original ------------------</div><div style="font-size: 12px;background:#efefef;padding:8px;"><div><b>From: </b> "Pradeep Ramachandran" &lt;pradeep@multicorewareinc.com&gt;;</div><div><b>Date: </b> Fri, Nov 3, 2017 07:25 PM</div><div><b>To: </b> "Development for x265" &lt;x265-devel@videolan.org&gt;;<wbr></div><div></div><div><b>Subject: </b> Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost</div></div><div><br></div><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 6, 2017 at 7:37 PM, Ximing Cheng <span dir="ltr"><<a href="mailto:chengximing1989@foxmail.com" target="_blank">chengximing1989@foxmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Inter CU analysis uses the analysis data of all 4 sub-CUs to derive the thresholds for rect and AMP PUs. If the skip leaves one or more of the four sub-CUs unanalyzed, the rect and AMP checks cannot get the 2Nx2N MV cost of those sub-CUs, and this breaks the threshold calculation for rect and AMP PUs at the current depth.</div><div>Do you have any other good solutions for this problem?</div></blockquote><div><br></div><div>One option is to skip the sub-part thresholds, and allow for changing outputs when this option is enabled for inter-analysis.</div><div>We tried implementing this but didn't see much performance improvement from it; it looks like the skip rarely happens. 
Have you had better luck?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Thanks!</div><div> </div><div><div style="font-size:12px;padding:2px 0">---Original---</div><div style="font-size:12px;background:#f0f0f0;color:#212121;padding:8px!important;border-radius:4px;line-height:1.5"><div><b>From:</b> "Pradeep Ramachandran"<<a href="mailto:pradeep@multicorewareinc.com" target="_blank">pradeep@<wbr>multicorewareinc.com</a>></div><div><b>Date:</b> 2017/8/18 18:49:51</div><div><b>To:</b> "Development for x265"<<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a>><wbr>;</div><div><b>Subject:</b> Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost</div></div><br><div dir="ltr"><div class="gmail_extra">Pushed to default branch. I agree that this looks like a bit-exact change, and it gives a nice perf boost.</div><div class="gmail_extra">Can this also be extended to inter analysis? The same logic should work there too, and we don't have an early out there.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks,</div><div class="gmail_extra">Pradeep.</div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Aug 12, 2017 at 11:13 PM, Tom Vaughan <span dir="ltr"><<a href="mailto:tom.vaughan@multicorewareinc.com" target="_blank">tom.vaughan@multicorewareinc.<wbr>com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 7.18515634536743px;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding:14px 16px 14px 8.9814453125px;color:rgb(149,149,149);background-color:rgb(245,246,250)"><div lang="EN-US" link="blue" vlink="purple"><div class="m_7320603520919666993m_739851056445745565WordSection1"><p class="MsoNormal">Thanks for this additional explanation, and thanks again for your contribution!</p><p class="MsoNormal"> </p><p 
class="MsoNormal"><b>From:</b> x265-devel [mailto:<a href="mailto:x265-devel-bounces@videolan.org" target="_blank">x265-devel-bounces@vid<wbr>eolan.org</a>] <b>On Behalf Of </b>Ximing Cheng<br><b>Sent:</b> Friday, August 11, 2017 12:32 PM<br><b>To:</b> Ximing Cheng<br><b>Subject:</b> Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost</p><div><div class="m_7320603520919666993h5"><p class="MsoNormal"> </p><div><p class="MsoNormal">In fact, this is not a heuristic fast-skip algorithm.</p></div><div><p class="MsoNormal">When the accumulated cost of the sub-CUs is already larger than the best cost of the non-split CU (both RD costs exclude the split flag cost), splitting into 4 parts at this depth is worse than not splitting. Analysis of the remaining quarter(s) of the CU is therefore useless.</p></div><div><p class="MsoNormal"> </p></div><div><pre>
.....................
.    A    .    B    .
.         .         .
.....................
.    C    .    D    .
.         .         .
.....................
</pre></div><div><p class="MsoNormal">(A, B, C and D are the 4 sub-CUs of a CU.)</p></div><div><p class="MsoNormal">Suppose that after A and B have been analyzed, the accumulated split cost (A_Cost + B_Cost) is already larger than the non-split cost (NSCost), i.e. NSCost &lt; A_Cost + B_Cost. If the remaining parts (C, D) still went through RD analysis, then: 
</p></div><div><p class="MsoNormal">C_Cost + D_Cost &gt;= 0 ---&gt;</p></div><div><p class="MsoNormal">NSCost &lt; A_Cost + B_Cost + C_Cost + D_Cost ---&gt; (which very likely implies)</p></div><div><p class="MsoNormal">NSCost + splitCost(splitflag = 0) &lt; A_Cost + B_Cost + C_Cost + D_Cost + splitCost(splitflag = 1) ---&gt; choose non-split</p></div><div><p class="MsoNormal"> </p></div><div><p class="MsoNormal">So the RD analysis of C and D can be skipped.</p></div><div><p class="MsoNormal">In my test cases, the MD5 checksum of the output bitstream with this skip is identical to the original.</p></div><div><p class="MsoNormal"> </p></div><div><div><p class="MsoNormal">------------------ Original --<wbr>----------------</p></div><div><div><p class="MsoNormal" style="background:#efefef"><b>From: </b> "Ximing Cheng";<<a href="mailto:chengximing1989@foxmail.com" target="_blank">chengximing1989@foxmai<wbr>l.com</a>>;</p></div><div><p class="MsoNormal" style="background:#efefef"><b>Send time:</b> Friday, Aug 4, 2017 1:56 AM</p></div><div><p class="MsoNormal" style="background:#efefef"><b>To:</b> "x265-devel"<<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@vi<wbr>deolan.org</a>>; </p></div><div><p class="MsoNormal" style="background:#efefef"><b>Subject: </b> [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost</p></div></div><div><p class="MsoNormal"> </p></div><p class="MsoNormal"># HG changeset patch<br># User Ximing Cheng <<a href="mailto:ximingcheng@tencent.com" target="_blank">ximingcheng@tencent.com</a>><br># Date 1501782508 -28800<br># Fri Aug 04 01:48:28 2017 +0800<br># Node ID 5943a1f73d5814a3a723f814a4dd06<wbr>35b1fe2b35<br># Parent d11482e5fedbcdaf62ee3c6872f438<wbr>27d99ad181<br>intra: skip RD analysis when sum of sub CUsplitcost bigger than non-split cost<br><br>diff -r d11482e5fedb -r 5943a1f73d58 source/CMakeLists.txt<br>--- a/source/CMakeLists.txt Mon Jul 24 11:15:38 2017 +0530<br>+++ b/source/CMakeLists.txt Fri Aug 04 
01:48:28 2017 +0800<br>@@ -29,7 +29,7 @@<br> option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)<br> mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)<br> # X265_BUILD must be incremented each time the public API is changed<br>-set(X265_BUILD 131)<br>+set(X265_BUILD 132)<br> configure_file("${PROJECT_SOU<wbr>RCE_DIR}/<a href="http://x265.def.in" target="_blank">x265.def.in</a>"<br> "${PROJECT_BINARY_DIR}/<a href="http://x265.de">x265.de</a><wbr>f")<br> configure_file("${PROJECT_SOU<wbr>RCE_DIR}/<a href="http://x265_config.h.in" target="_blank">x265_config.h.in</a>"<br>diff -r d11482e5fedb -r 5943a1f73d58 source/common/param.cpp<br>--- a/source/common/param.cpp Mon Jul 24 11:15:38 2017 +0530<br>+++ b/source/common/param.cpp Fri Aug 04 01:48:28 2017 +0800<br>@@ -157,6 +157,7 @@<br> param->bEnableConstrainedIntra = 0;<br> param->bEnableStrongIntraSmoot<wbr>hing = 1;<br> param->bEnableFastIntra = 0;<br>+ param->bEnableSplitRdSkip = 0;<br> <br> /* Inter Coding tools */<br> param->searchMethod = X265_HEX_SEARCH;<br>@@ -975,6 +976,7 @@<br> OPT("refine-inter")p->interRef<wbr>ine = atobool(value);<br> OPT("refine-mv")p->mvRefine = atobool(value);<br> OPT("force-flush")p->forceFlus<wbr>h = atoi(value);<br>+ OPT("splitrd-skip") p->bEnableSplitRdSkip = atobool(value);<br> else<br> return X265_PARAM_BAD_NAME;<br> }<br>@@ -1431,6 +1433,7 @@<br> TOOLOPT(param->bEnableRdRefine<wbr>, "rd-refine");<br> TOOLOPT(param->bEnableEarlySki<wbr>p, "early-skip");<br> TOOLOPT(param->bEnableRecursio<wbr>nSkip, "rskip");<br>+ TOOLOPT(param->bEnableSplitRdS<wbr>kip, "splitrd-skip");<br> TOOLVAL(param->noiseReductionI<wbr>ntra, "nr-intra=%d");<br> TOOLVAL(param->noiseReductionI<wbr>nter, "nr-inter=%d");<br> TOOLOPT(param->bEnableTSkipFas<wbr>t, "tskip-fast");<br>@@ -1560,6 +1563,7 @@<br> BOOL(p->bEnableTSkipFast, "tskip-fast");<br> BOOL(p->bCULossless, "cu-lossless");<br> BOOL(p->bIntraInBFrames, "b-intra");<br>+ BOOL(p->bEnableSplitRdSkip, "splitrd-skip");<br> 
s += sprintf(s, " rdpenalty=%d", p->rdPenalty);<br> s += sprintf(s, " psy-rd=%.2f", p->psyRd);<br> s += sprintf(s, " psy-rdoq=%.2f", p->psyRdoq);<br>diff -r d11482e5fedb -r 5943a1f73d58 source/encoder/analysis.cpp<br>--- a/source/encoder/analysis.cpp Mon Jul 24 11:15:38 2017 +0530<br>+++ b/source/encoder/analysis.cpp Fri Aug 04 01:48:28 2017 +0800<br>@@ -485,7 +485,7 @@<br> md.bestMode->reconYuv.copyToPi<wbr>cYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);<br> }<br> <br>-void Analysis::compressIntraCU(cons<wbr>t CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)<br>+uint64_t Analysis::compressIntraCU(cons<wbr>t CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)<br> {<br> uint32_t depth = cuGeom.depth;<br> ModeDepth& md = m_modeDepth[depth];<br>@@ -560,6 +560,8 @@<br> invalidateContexts(nextDepth);<br> Entropy* nextContext = &m_rqt[depth].cur;<br> int32_t nextQP = qp;<br>+ uint64_t curCost = 0;<br>+ int skipSplitCheck = 0;<br> <br> for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)<br> {<br>@@ -572,7 +574,17 @@<br> if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)<br> nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU<wbr>, childGeom));<br> <br>- compressIntraCU(parentCTU, childGeom, nextQP);<br>+ if (m_param->bEnableSplitRdSkip)<br>+ {<br>+ curCost += compressIntraCU(parentCTU, childGeom, nextQP);<br>+ if (m_modeDepth[depth].bestMode && curCost > m_modeDepth[depth].bestMode->r<wbr>dCost)<br>+ {<br>+ skipSplitCheck = 1;<br>+ break;<br>+ }<br>+ }<br>+ else<br>+ compressIntraCU(parentCTU, childGeom, nextQP);<br> <br> // Save best CU and pred data for this sub CU<br> splitCU->copyPartFrom(nd.bestM<wbr>ode->cu, childGeom, subPartIdx);<br>@@ -590,14 +602,17 @@<br> memset(parentCTU.m_cuDepth + childGeom.absPartIdx, 0, childGeom.numPartitions);<br> }<br> }<br>- nextContext->store(splitPred-><wbr>contexts);<br>- if (mightNotSplit)<br>- addSplitFlagCost(*splitPred, cuGeom.depth);<br>- else<br>- 
updateModeCost(*splitPred);<br>-<br>- checkDQPForSplitPred(*splitPre<wbr>d, cuGeom);<br>- checkBestMode(*splitPred, depth);<br>+ if (!skipSplitCheck)<br>+ {<br>+ nextContext->store(splitPred-><wbr>contexts);<br>+ if (mightNotSplit)<br>+ addSplitFlagCost(*splitPred, cuGeom.depth);<br>+ else<br>+ updateModeCost(*splitPred);<br>+<br>+ checkDQPForSplitPred(*splitPre<wbr>d, cuGeom);<br>+ checkBestMode(*splitPred, depth);<br>+ }<br> }<br> <br> if (m_param->bEnableRdRefine && depth <= m_slice->m_pps->maxCuDQPDepth)<br>@@ -620,6 +635,8 @@<br> md.bestMode->cu.copyToPic(dept<wbr>h);<br> if (md.bestMode != &md.pred[PRED_SPLIT])<br> md.bestMode->reconYuv.copyToPi<wbr>cYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);<br>+<br>+ return md.bestMode->rdCost;<br> }<br> <br> void Analysis::PMODE::processTasks(<wbr>int workerThreadId)<br>diff -r d11482e5fedb -r 5943a1f73d58 source/encoder/analysis.h<br>--- a/source/encoder/analysis.h Mon Jul 24 11:15:38 2017 +0530<br>+++ b/source/encoder/analysis.h Fri Aug 04 01:48:28 2017 +0800<br>@@ -145,7 +145,7 @@<br> void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);<br> <br> /* full analysis for an I-slice CU */<br>- void compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);<br>+ uint64_t compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);<br> <br> /* full analysis for a P or B slice CU */<br> uint32_t compressInterCU_dist(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);<br>diff -r d11482e5fedb -r 5943a1f73d58 source/x265.h<br>--- a/source/x265.h Mon Jul 24 11:15:38 2017 +0530<br>+++ b/source/x265.h Fri Aug 04 01:48:28 2017 +0800<br>@@ -1482,6 +1482,9 @@<br> <br> /* Force flushing the frames from encoder */<br> int forceFlush;<br>+<br>+ /* Enable skipping split RD analysis when sum of split CU rdCost larger than none split CU rdCost for Intra CU */<br>+ int bEnableSplitRdSkip;<br> } x265_param;<br> <br> /* x265_param_alloc:<br>diff -r 
d11482e5fedb -r 5943a1f73d58 source/x265cli.h<br>--- a/source/x265cli.h Mon Jul 24 11:15:38 2017 +0530<br>+++ b/source/x265cli.h Fri Aug 04 01:48:28 2017 +0800<br>@@ -281,6 +281,8 @@<br> { "refine-mv", no_argument, NULL, 0 },<br> { "no-refine-mv", no_argument, NULL, 0 },<br> { "force-flush", required_argument, NULL, 0 },<br>+ { "splitrd-skip", no_argument, NULL, 0 },<br>+ { "no-splitrd-skip", no_argument, NULL, 0 },<br> { 0, 0, 0, 0 },<br> { 0, 0, 0, 0 },<br> { 0, 0, 0, 0 },<br>@@ -375,6 +377,7 @@<br> H0(" --[no-]early-skip Enable early SKIP detection. Default %s\n", OPT(param->bEnableEarlySkip));<br> H0(" --[no-]rskip Enable early exit from recursion. Default %s\n", OPT(param->bEnableRecursionSki<wbr>p));<br> H1(" --[no-]tskip-fast Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));<br>+ H1(" --[no-]splitrd-skip Enable skipping split RD analysis when sum of split CU rdCost larger than none split CU rdCost for Intra CU. Default %s\n", OPT(param->bEnableSplitRdSkip)<wbr>);<br> H1(" --nr-intra <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");<br> H1(" --nr-inter <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");<br> H0(" --ctu-info <integer> Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"<br><br><br>______________________________<wbr>_________________<br>x265-devel mailing list<br><a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br><a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/l<wbr>istinfo/x265-devel</a></p></div></div></div></div></div>
<br></blockquote></div><br></div></div>
</div>
<br></blockquote></div><br></div></div></div>
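<div>The early-out argued for in this thread can be illustrated with a minimal, standalone sketch. It is not x265 code: <code>Node</code>, <code>analyse</code> and the <code>visited</code> counter are hypothetical names, the costs are made-up inputs, and the split flag cost is ignored, so in this simplified model the skip can never change the result (in the encoder the split flag cost makes that only very likely, as noted above).</div>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one block of the quadtree; not an x265 type.
struct Node {
    uint64_t nonSplitCost;       // RD cost of coding the block unsplit (split flag cost ignored)
    std::vector<Node> children;  // empty at maximum depth, otherwise the 4 sub-blocks
};

// Returns the best RD cost for `node` and counts every analysed block in `visited`.
// With enableSkip set, the loop over sub-blocks stops as soon as the accumulated
// sub-block cost exceeds the non-split cost, because costs are non-negative and
// the remaining sub-blocks can only make the split more expensive.
uint64_t analyse(const Node& node, bool enableSkip, int& visited)
{
    ++visited;
    uint64_t best = node.nonSplitCost;
    if (node.children.empty())
        return best;

    uint64_t splitSum = 0;
    bool aborted = false;
    for (const Node& child : node.children) {
        splitSum += analyse(child, enableSkip, visited);
        if (enableSkip && splitSum > best) {
            aborted = true;  // split already lost; skip the remaining sub-blocks
            break;
        }
    }

    // Only a fully analysed split may compete with the non-split mode.
    if (!aborted && splitSum < best)
        best = splitSum;
    return best;
}
```

<div>For a block with a non-split cost of 100 and sub-block costs 60, 50, 10, 10, the loop stops after the second sub-block (60 + 50 &gt; 100), so the last two sub-blocks are never analysed and the chosen cost is the same as without the skip.</div>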