[x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost
Pradeep Ramachandran
pradeep at multicorewareinc.com
Fri Nov 3 11:55:39 CET 2017
On Wed, Sep 6, 2017 at 7:37 PM, Ximing Cheng <chengximing1989 at foxmail.com>
wrote:
> Inter CU analysis uses the analysis data of all 4 sub-CUs to compute the
> thresholds for the rect and AMP PUs. If one or more of the four sub-CUs are
> skipped, the rect and AMP PUs cannot get the 2Nx2N mv cost of those sub-CUs,
> and this breaks the threshold calculation for the rect and AMP PUs at the
> current depth.
> Do you have any other good solutions for this problem?
>
One option is to skip the sub-part thresholds, and allow for changing
outputs when this option is enabled for inter-analysis.
We tried implementing this but don't see much performance improvement from
it; looks like the chances of the skip happening are rare. Have you had
better luck?
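
As a rough illustration of that option (a sketch only; SubCuInfo and
averageSubCuMvCost are hypothetical names, not x265 code, and the real
threshold logic in the inter paths is more involved): the sub-part
threshold is only computed when all four sub-CUs were actually analysed,
and the caller evaluates rect/AMP without a threshold otherwise. That
fallback is what makes the output differ from a run without the skip.

```cpp
#include <array>
#include <cstdint>

// Hypothetical per-sub-CU record; the encoder's real split data has more fields.
struct SubCuInfo
{
    bool     analyzed; // false if this sub-CU was skipped by the early-out
    uint64_t mvCost;   // 2Nx2N motion cost, meaningful only when analyzed is true
};

// Writes the average 2Nx2N mv cost of the four sub-CUs to avg and returns true.
// Returns false (avg untouched) if any sub-CU was skipped, in which case the
// caller should not apply a sub-part threshold at this depth.
bool averageSubCuMvCost(const std::array<SubCuInfo, 4>& subCus, uint64_t& avg)
{
    uint64_t sum = 0;
    for (const SubCuInfo& s : subCus)
    {
        if (!s.analyzed)
            return false; // incomplete data: skip the threshold
        sum += s.mvCost;
    }
    avg = sum / 4;
    return true;
}
```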
> Thanks!
> ---Original---
> *From:* "Pradeep Ramachandran"<pradeep at multicorewareinc.com>
> *Date:* 2017/8/18 18:49:51
> *To:* "Development for x265"<x265-devel at videolan.org>;
> *Subject:* Re: [x265] [PATCH] intra: skip RD analysis when sum of
> sub CU split cost bigger than non-split cost
>
> Pushed to default branch. I agree that this looks like a bit-exact change,
> and gives a nice perf boost.
> Can this also be extended to inter analysis as the same logic should work
> there too, and we don't have an early out there?
>
> Thanks,
> Pradeep.
>
> On Sat, Aug 12, 2017 at 11:13 PM, Tom Vaughan <
> tom.vaughan at multicorewareinc.com> wrote:
>
>> Thanks for this additional explanation, and thanks again for your
>> contribution!
>>
>>
>>
>> *From:* x265-devel [mailto:x265-devel-bounces at videolan.org] *On Behalf
>> Of *Ximing Cheng
>> *Sent:* Friday, August 11, 2017 12:32 PM
>> *To:* Ximing Cheng
>> *Subject:* Re: [x265] [PATCH] intra: skip RD analysis when sum of sub
>> CU split cost bigger than non-split cost
>>
>>
>>
>> In fact, this skip is not a fast-skip heuristic.
>>
>> If the sum of the split costs is larger than the non-split CU's best cost
>> (both the sub-CU rdcost and the non-split CU rdcost exclude the split flag
>> cost), then splitting the CU into 4 parts at this depth is worse than not
>> splitting it. The analysis of the remaining quarter parts of the CU is
>> therefore useless.
>>
>>
>>
>> .....................
>> .    A    .    B    .
>> .         .         .
>> .....................
>> .    C    .    D    .
>> .         .         .
>> .....................  (A, B, C and D are the 4 parts of a CU)
>>
>> Suppose the partial sum of sub-CU split costs (A_Cost + B_Cost) is already
>> larger than the non-split cost (NSCost), i.e. NSCost < A_Cost + B_Cost, and
>> suppose the remaining parts (C, D) still went through RD analysis. Then:
>>
>> C_Cost + D_Cost >= 0 --->
>>
>> NSCost < A_Cost + B_Cost + C_Cost + D_Cost ---> (likely that)
>>
>> NSCost + splitCost(splitflag = 0) < A_Cost + B_Cost + C_Cost + D_Cost +
>> splitCost(splitflag = 1) ---> choose non-split
>>
>>
>>
>> So, C and D rd analysis can be skipped.
>>
>> In my test cases, the MD5 checksum of the output bitstream is the same as
>> the original with this skip enabled.
>>
>>
>>
>> ------------------ Original ------------------
>>
>> *From: * "Ximing Cheng";<chengximing1989 at foxmail.com>;
>>
>> *Send time:* Friday, Aug 4, 2017 1:56 AM
>>
>> *To:* "x265-devel"<x265-devel at videolan.org>;
>>
>> *Subject: * [x265] [PATCH] intra: skip RD analysis when sum of sub
>> CU split cost bigger than non-split cost
>>
>>
>>
>> # HG changeset patch
>> # User Ximing Cheng <ximingcheng at tencent.com>
>> # Date 1501782508 -28800
>> # Fri Aug 04 01:48:28 2017 +0800
>> # Node ID 5943a1f73d5814a3a723f814a4dd0635b1fe2b35
>> # Parent d11482e5fedbcdaf62ee3c6872f43827d99ad181
>> intra: skip RD analysis when sum of sub CUsplitcost bigger than non-split
>> cost
>>
>> diff -r d11482e5fedb -r 5943a1f73d58 source/CMakeLists.txt
>> --- a/source/CMakeLists.txt Mon Jul 24 11:15:38 2017 +0530
>> +++ b/source/CMakeLists.txt Fri Aug 04 01:48:28 2017 +0800
>> @@ -29,7 +29,7 @@
>> option(STATIC_LINK_CRT "Statically link C runtime for release builds"
>> OFF)
>> mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
>> # X265_BUILD must be incremented each time the public API is changed
>> -set(X265_BUILD 131)
>> +set(X265_BUILD 132)
>> configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
>> "${PROJECT_BINARY_DIR}/x265.def")
>> configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
>> diff -r d11482e5fedb -r 5943a1f73d58 source/common/param.cpp
>> --- a/source/common/param.cpp Mon Jul 24 11:15:38 2017 +0530
>> +++ b/source/common/param.cpp Fri Aug 04 01:48:28 2017 +0800
>> @@ -157,6 +157,7 @@
>> param->bEnableConstrainedIntra = 0;
>> param->bEnableStrongIntraSmoothing = 1;
>> param->bEnableFastIntra = 0;
>> + param->bEnableSplitRdSkip = 0;
>>
>> /* Inter Coding tools */
>> param->searchMethod = X265_HEX_SEARCH;
>> @@ -975,6 +976,7 @@
>> OPT("refine-inter")p->interRefine = atobool(value);
>> OPT("refine-mv")p->mvRefine = atobool(value);
>> OPT("force-flush")p->forceFlush = atoi(value);
>> + OPT("splitrd-skip") p->bEnableSplitRdSkip = atobool(value);
>> else
>> return X265_PARAM_BAD_NAME;
>> }
>> @@ -1431,6 +1433,7 @@
>> TOOLOPT(param->bEnableRdRefine, "rd-refine");
>> TOOLOPT(param->bEnableEarlySkip, "early-skip");
>> TOOLOPT(param->bEnableRecursionSkip, "rskip");
>> + TOOLOPT(param->bEnableSplitRdSkip, "splitrd-skip");
>> TOOLVAL(param->noiseReductionIntra, "nr-intra=%d");
>> TOOLVAL(param->noiseReductionInter, "nr-inter=%d");
>> TOOLOPT(param->bEnableTSkipFast, "tskip-fast");
>> @@ -1560,6 +1563,7 @@
>> BOOL(p->bEnableTSkipFast, "tskip-fast");
>> BOOL(p->bCULossless, "cu-lossless");
>> BOOL(p->bIntraInBFrames, "b-intra");
>> + BOOL(p->bEnableSplitRdSkip, "splitrd-skip");
>> s += sprintf(s, " rdpenalty=%d", p->rdPenalty);
>> s += sprintf(s, " psy-rd=%.2f", p->psyRd);
>> s += sprintf(s, " psy-rdoq=%.2f", p->psyRdoq);
>> diff -r d11482e5fedb -r 5943a1f73d58 source/encoder/analysis.cpp
>> --- a/source/encoder/analysis.cpp Mon Jul 24 11:15:38 2017 +0530
>> +++ b/source/encoder/analysis.cpp Fri Aug 04 01:48:28 2017 +0800
>> @@ -485,7 +485,7 @@
>> md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic,
>> parentCTU.m_cuAddr, cuGeom.absPartIdx);
>> }
>>
>> -void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom&
>> cuGeom, int32_t qp)
>> +uint64_t Analysis::compressIntraCU(const CUData& parentCTU, const
>> CUGeom& cuGeom, int32_t qp)
>> {
>> uint32_t depth = cuGeom.depth;
>> ModeDepth& md = m_modeDepth[depth];
>> @@ -560,6 +560,8 @@
>> invalidateContexts(nextDepth);
>> Entropy* nextContext = &m_rqt[depth].cur;
>> int32_t nextQP = qp;
>> + uint64_t curCost = 0;
>> + int skipSplitCheck = 0;
>>
>> for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
>> {
>> @@ -572,7 +574,17 @@
>> if (m_slice->m_pps->bUseDQP && nextDepth <=
>> m_slice->m_pps->maxCuDQPDepth)
>> nextQP = setLambdaFromQP(parentCTU,
>> calculateQpforCuSize(parentCTU, childGeom));
>>
>> - compressIntraCU(parentCTU, childGeom, nextQP);
>> + if (m_param->bEnableSplitRdSkip)
>> + {
>> + curCost += compressIntraCU(parentCTU, childGeom,
>> nextQP);
>> + if (m_modeDepth[depth].bestMode && curCost >
>> m_modeDepth[depth].bestMode->rdCost)
>> + {
>> + skipSplitCheck = 1;
>> + break;
>> + }
>> + }
>> + else
>> + compressIntraCU(parentCTU, childGeom, nextQP);
>>
>> // Save best CU and pred data for this sub CU
>> splitCU->copyPartFrom(nd.bestMode->cu, childGeom,
>> subPartIdx);
>> @@ -590,14 +602,17 @@
>> memset(parentCTU.m_cuDepth + childGeom.absPartIdx,
>> 0, childGeom.numPartitions);
>> }
>> }
>> - nextContext->store(splitPred->contexts);
>> - if (mightNotSplit)
>> - addSplitFlagCost(*splitPred, cuGeom.depth);
>> - else
>> - updateModeCost(*splitPred);
>> -
>> - checkDQPForSplitPred(*splitPred, cuGeom);
>> - checkBestMode(*splitPred, depth);
>> + if (!skipSplitCheck)
>> + {
>> + nextContext->store(splitPred->contexts);
>> + if (mightNotSplit)
>> + addSplitFlagCost(*splitPred, cuGeom.depth);
>> + else
>> + updateModeCost(*splitPred);
>> +
>> + checkDQPForSplitPred(*splitPred, cuGeom);
>> + checkBestMode(*splitPred, depth);
>> + }
>> }
>>
>> if (m_param->bEnableRdRefine && depth <=
>> m_slice->m_pps->maxCuDQPDepth)
>> @@ -620,6 +635,8 @@
>> md.bestMode->cu.copyToPic(depth);
>> if (md.bestMode != &md.pred[PRED_SPLIT])
>> md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic,
>> parentCTU.m_cuAddr, cuGeom.absPartIdx);
>> +
>> + return md.bestMode->rdCost;
>> }
>>
>> void Analysis::PMODE::processTasks(int workerThreadId)
>> diff -r d11482e5fedb -r 5943a1f73d58 source/encoder/analysis.h
>> --- a/source/encoder/analysis.h Mon Jul 24 11:15:38 2017 +0530
>> +++ b/source/encoder/analysis.h Fri Aug 04 01:48:28 2017 +0800
>> @@ -145,7 +145,7 @@
>> void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom,
>> int32_t qp, int32_t lqp);
>>
>> /* full analysis for an I-slice CU */
>> - void compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom,
>> int32_t qp);
>> + uint64_t compressIntraCU(const CUData& parentCTU, const CUGeom&
>> cuGeom, int32_t qp);
>>
>> /* full analysis for a P or B slice CU */
>> uint32_t compressInterCU_dist(const CUData& parentCTU, const CUGeom&
>> cuGeom, int32_t qp);
>> diff -r d11482e5fedb -r 5943a1f73d58 source/x265.h
>> --- a/source/x265.h Mon Jul 24 11:15:38 2017 +0530
>> +++ b/source/x265.h Fri Aug 04 01:48:28 2017 +0800
>> @@ -1482,6 +1482,9 @@
>>
>> /* Force flushing the frames from encoder */
>> int forceFlush;
>> +
>> + /* Enable skipping split RD analysis when sum of split CU rdCost
>> larger than none split CU rdCost for Intra CU */
>> + int bEnableSplitRdSkip;
>> } x265_param;
>>
>> /* x265_param_alloc:
>> diff -r d11482e5fedb -r 5943a1f73d58 source/x265cli.h
>> --- a/source/x265cli.h Mon Jul 24 11:15:38 2017 +0530
>> +++ b/source/x265cli.h Fri Aug 04 01:48:28 2017 +0800
>> @@ -281,6 +281,8 @@
>> { "refine-mv", no_argument, NULL, 0 },
>> { "no-refine-mv", no_argument, NULL, 0 },
>> { "force-flush", required_argument, NULL, 0 },
>> + { "splitrd-skip", no_argument, NULL, 0 },
>> + { "no-splitrd-skip", no_argument, NULL, 0 },
>> { 0, 0, 0, 0 },
>> { 0, 0, 0, 0 },
>> { 0, 0, 0, 0 },
>> @@ -375,6 +377,7 @@
>> H0(" --[no-]early-skip Enable early SKIP detection.
>> Default %s\n", OPT(param->bEnableEarlySkip));
>> H0(" --[no-]rskip Enable early exit from
>> recursion. Default %s\n", OPT(param->bEnableRecursionSkip));
>> H1(" --[no-]tskip-fast Enable fast intra transform
>> skipping. Default %s\n", OPT(param->bEnableTSkipFast));
>> + H1(" --[no-]splitrd-skip Enable skipping split RD
>> analysis when sum of split CU rdCost larger than none split CU rdCost for
>> Intra CU. Default %s\n", OPT(param->bEnableSplitRdSkip));
>> H1(" --nr-intra <integer> An integer value in range of 0
>> to 2000, which denotes strength of noise reduction in intra CUs. Default
>> 0\n");
>> H1(" --nr-inter <integer> An integer value in range of 0
>> to 2000, which denotes strength of noise reduction in inter CUs. Default
>> 0\n");
>> H0(" --ctu-info <integer> Enable receiving ctu
>> information asynchronously and determine reaction to the CTU information
>> (0, 1, 2, 4, 6) Default 0\n"
>>
>>
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel