[x265] [PATCH] no-rdo: refactor enodeResandCalcRDInterCU function

Steve Borho steve at borho.org
Fri Nov 8 22:06:30 CET 2013


On Fri, Nov 8, 2013 at 3:20 AM, <deepthidevaki at multicorewareinc.com> wrote:

> # HG changeset patch
> # User Deepthi Devaki <deepthidevaki at multicorewareinc.com>
> # Date 1383894227 -19800
> # Node ID a73bc98e632c668c9ebd5a1a9ed40557cb44d00c
> # Parent  fef74c2e329dc24d9e93624de217babc2d6fa77f
> no-rdo: refactor enodeResandCalcRDInterCU function
>
> Divide estimateBits and modeDecision inside the function. EstimateBits
> uses a pseudo encode. Bitstream changes with this patch for --rd 1.
>
> diff -r fef74c2e329d -r a73bc98e632c source/Lib/TLibEncoder/TEncSearch.cpp
> --- a/source/Lib/TLibEncoder/TEncSearch.cpp     Fri Nov 08 02:57:47 2013
> -0600
> +++ b/source/Lib/TLibEncoder/TEncSearch.cpp     Fri Nov 08 12:33:47 2013
> +0530
> @@ -2941,6 +2941,144 @@
>      cu->setQPSubParts(qpBest, 0, cu->getDepth(0));
>  }
>
> +void TEncSearch::estimateRDInterCU(TComDataCU* cu, TComYuv* fencYuv,
> TComYuv* predYuv, TShortYUV* outResiYuv,
> +                                   TShortYUV* outBestResiYuv, TComYuv*
> outReconYuv, bool /*bSkipRes*/, bool curUseRDOQ)
> +{
> +    uint32_t width  = cu->getWidth(0);
> +    uint32_t height = cu->getHeight(0);
> +
> +    outResiYuv->subtract(fencYuv, predYuv, 0, width);
> +
> +    uint32_t zerobits = estimateZerobits(cu);
> +    uint32_t zerodistortion = estimateZeroDist(cu, fencYuv, predYuv);
> +    uint64_t zerocost = m_rdCost->calcRdCost(zerodistortion, zerobits);
> +
> +    uint32_t distortion = 0;
> +    uint32_t bits = 0;
> +    estimateBitsDist(cu, outResiYuv, bits, distortion, curUseRDOQ);
> +    uint64_t cost = m_rdCost->calcRdCost(distortion, bits);
> +
> +    if (cu->isLosslessCoded(0))
> +    {
> +        zerocost = cost + 1;
> +    }
> +
> +    if (zerocost < cost)
> +    {
> +        const uint32_t qpartnum = cu->getPic()->getNumPartInCU() >>
> (cu->getDepth(0) << 1);
> +        ::memset(cu->getTransformIdx(), 0, qpartnum * sizeof(UChar));
> +        ::memset(cu->getCbf(TEXT_LUMA), 0, qpartnum * sizeof(UChar));
> +        ::memset(cu->getCbf(TEXT_CHROMA_U), 0, qpartnum * sizeof(UChar));
> +        ::memset(cu->getCbf(TEXT_CHROMA_V), 0, qpartnum * sizeof(UChar));
> +        ::memset(cu->getCoeffY(), 0, width * height * sizeof(TCoeff));
> +        ::memset(cu->getCoeffCb(), 0, width * height * sizeof(TCoeff) >>
> 2);
> +        ::memset(cu->getCoeffCr(), 0, width * height * sizeof(TCoeff) >>
> 2);
> +        cu->setTransformSkipSubParts(0, 0, 0, 0, cu->getDepth(0));
>

The general effect of this patch is ok. I expected performance to improve,
since it avoids RDO measurements of signal bit costs, but instead the patch
makes the ultrafast preset slower.

And this makes me ask: which of these memset calls are actually necessary?

The CBF fields I can imagine acting as bool flags and thus setting them all
to zero would make sense, but we should probably try to do that as one
memset instead of three by mallocing them together and making a single cu
method to reset them.
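
A minimal sketch of the single-allocation idea follows. CbfStore, plane()
and clearAll() are hypothetical names, not existing TComDataCU members, and
the sketch assumes the CU object owns exactly the partitions being reset, so
that clearing the whole backing buffer is equivalent to the three per-plane
memsets in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the three CBF arrays of a CU.
// All three planes live in one contiguous allocation.
struct CbfStore
{
    enum Plane { LUMA = 0, CHROMA_U = 1, CHROMA_V = 2, NUM_PLANES = 3 };

    explicit CbfStore(std::size_t numPartitions)
        : m_numPartitions(numPartitions)
        , m_flags(numPartitions * NUM_PLANES, 0)
    {}

    // Per-plane view into the shared buffer, one flag byte per partition.
    std::uint8_t* plane(Plane p)
    {
        return m_flags.data() + static_cast<std::size_t>(p) * m_numPartitions;
    }

    // Reset every flag of every plane with a single memset.
    void clearAll()
    {
        if (!m_flags.empty())
            std::memset(m_flags.data(), 0, m_flags.size());
    }

private:
    std::size_t m_numPartitions;
    std::vector<std::uint8_t> m_flags;
};
```

If a CU only resets a sub-range of partitions, this plane-by-plane layout
would still need one memset per plane; interleaving the three flags per
partition would be one way around that, at the cost of changing every
accessor.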

Can't the rest of the fields be implied from CBF?  If CBF is zero, the
transform IDX and coefficient arrays should be ignored; why bother zeroing
them?  Is a coeff of 0 any more valid than what was in those buffers
before? The coeff matrices in particular can be quite large.
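
The invariant this relies on can be sketched as below. BlockState,
markNoResidual() and sumAbsCoeff() are made-up names for illustration, and
Coeff only stands in for TCoeff. Whether every consumer in the encoder really
checks CBF before reading coefficients is exactly the open question, so this
shows what would have to hold, not what the code does today.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::int32_t Coeff; // stand-in for TCoeff (assumption)

// Hypothetical per-block state: one coded-block flag plus its coefficients.
struct BlockState
{
    std::uint8_t cbf;
    std::vector<Coeff> coeff;
};

// Mark the block as having no residual. The coefficient memory is
// deliberately left untouched: no memset over the (large) array.
inline void markNoResidual(BlockState& b)
{
    b.cbf = 0;
}

// Example consumer: it consults CBF first, so stale coefficients
// are never read when the flag is zero.
inline std::int64_t sumAbsCoeff(const BlockState& b)
{
    if (!b.cbf)
        return 0;

    std::int64_t sum = 0;
    for (Coeff c : b.coeff)
        sum += (c < 0) ? -static_cast<std::int64_t>(c) : static_cast<std::int64_t>(c);
    return sum;
}
```

With readers written this way, a coefficient value of 0 is no more valid than
whatever was left in the buffer, and the three coefficient memsets could be
dropped.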

It's this type of hard analysis that I think is most necessary for the mode
decision code we've taken from the HM.  How much of this init/clear() code
is actually necessary and how much of it is just lazy HM coding?  I see a
lot of zeroing of pixel and coeff buffers that are almost certainly
complete wastes of time.

There is a lot of code in TEncCu.cpp, compress.cpp, and TEncSearch.cpp that
needs to be reviewed for these types of implications.


> +        if (cu->getMergeFlag(0) && cu->getPartitionSize(0) == SIZE_2Nx2N)
> +        {
> +            cu->setSkipFlagSubParts(true, 0, cu->getDepth(0));
> +        }
> +        bits = zerobits;
> +        outBestResiYuv->clear();
>

Is it really necessary to memset the residual YUV short buffer?
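
For reference, the skip path of generateRecon() in this patch only copies the
prediction and never reads the residual. A simplified sketch of that shape on
flat 8-bit buffers is below; reconstruct() and its signature are hypothetical,
and it says nothing about whether later callers read outBestResiYuv, which
would still need checking before the clear() is removed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint8_t Pixel; // assumes 8-bit samples

// Hypothetical flat-buffer analogue of generateRecon().
// When skipRes is true the residual buffer is never read,
// so its contents (zeroed or stale) cannot affect the result.
inline void reconstruct(const std::vector<Pixel>& pred,
                        const std::vector<std::int16_t>& resi,
                        std::vector<Pixel>& recon,
                        bool skipRes)
{
    recon.resize(pred.size());
    if (skipRes)
    {
        std::copy(pred.begin(), pred.end(), recon.begin());
        return;
    }

    assert(resi.size() >= pred.size());
    for (std::size_t i = 0; i < pred.size(); i++)
    {
        // Add residual and clip to the 8-bit sample range.
        int v = static_cast<int>(pred[i]) + static_cast<int>(resi[i]);
        recon[i] = static_cast<Pixel>(std::min(255, std::max(0, v)));
    }
}
```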


> +        generateRecon(cu, predYuv, outBestResiYuv, outReconYuv, true);
> +    }
> +    else
> +    {
> +        xSetResidualQTData(cu, 0, 0, outBestResiYuv, cu->getDepth(0),
> true);
> +        generateRecon(cu, predYuv, outBestResiYuv, outReconYuv, false);
> +    }
> +
> +    int part = partitionFromSizes(width, height);
> +    distortion = primitives.sse_pp[part](fencYuv->getLumaAddr(),
> fencYuv->getStride(), outReconYuv->getLumaAddr(), outReconYuv->getStride());
> +    part = partitionFromSizes(width >> 1, height >> 1);
> +    distortion +=
> m_rdCost->scaleChromaDistCb(primitives.sse_pp[part](fencYuv->getCbAddr(),
> fencYuv->getCStride(), outReconYuv->getCbAddr(),
> outReconYuv->getCStride()));
> +    distortion +=
> m_rdCost->scaleChromaDistCr(primitives.sse_pp[part](fencYuv->getCrAddr(),
> fencYuv->getCStride(), outReconYuv->getCrAddr(),
> outReconYuv->getCStride()));
> +
> +    cu->m_totalBits       = bits;
> +    cu->m_totalDistortion = distortion;
> +    cu->m_totalCost       = m_rdCost->calcRdCost(distortion, bits);
> +}
> +
> +uint32_t TEncSearch::estimateZerobits(TComDataCU* cu)
> +{
> +    if (cu->isIntra(0))
> +    {
> +        return 0;
> +    }
> +
> +    uint32_t zeroResiBits = 0;
> +
> +    uint32_t width  = cu->getWidth(0);
> +    uint32_t height = cu->getHeight(0);
> +
> +    const uint32_t qpartnum = cu->getPic()->getNumPartInCU() >>
> (cu->getDepth(0) << 1);
> +    ::memset(cu->getTransformIdx(), 0, qpartnum * sizeof(UChar));
> +    ::memset(cu->getCbf(TEXT_LUMA), 0, qpartnum * sizeof(UChar));
> +    ::memset(cu->getCbf(TEXT_CHROMA_U), 0, qpartnum * sizeof(UChar));
> +    ::memset(cu->getCbf(TEXT_CHROMA_V), 0, qpartnum * sizeof(UChar));
> +    ::memset(cu->getCoeffY(), 0, width * height * sizeof(TCoeff));
> +    ::memset(cu->getCoeffCb(), 0, width * height * sizeof(TCoeff) >> 2);
> +    ::memset(cu->getCoeffCr(), 0, width * height * sizeof(TCoeff) >> 2);
> +    cu->setTransformSkipSubParts(0, 0, 0, 0, cu->getDepth(0));
> +
> +
>  m_rdGoOnSbacCoder->load(m_rdSbacCoders[cu->getDepth(0)][CI_CURR_BEST]);
> +    zeroResiBits = xSymbolBitsInter(cu);
> +    // Reset skipflags to false which would have set to true by
> xSymbolBitsInter if merge-skip
> +    cu->setSkipFlagSubParts(false, 0, cu->getDepth(0));
> +    return zeroResiBits;
> +}
> +
> +uint32_t TEncSearch::estimateZeroDist(TComDataCU* cu, TComYuv* fencYuv,
> TComYuv* predYuv)
> +{
> +    uint32_t distortion = 0;
> +
> +    uint32_t width  = cu->getWidth(0);
> +    uint32_t height = cu->getHeight(0);
> +
> +    int part = partitionFromSizes(width, height);
> +
> +    distortion = primitives.sse_pp[part](fencYuv->getLumaAddr(),
> fencYuv->getStride(), predYuv->getLumaAddr(), predYuv->getStride());
> +    part = partitionFromSizes(width >> 1, height >> 1);
> +    distortion +=
> m_rdCost->scaleChromaDistCb(primitives.sse_pp[part](fencYuv->getCbAddr(),
> fencYuv->getCStride(), predYuv->getCbAddr(), predYuv->getCStride()));
> +    distortion +=
> m_rdCost->scaleChromaDistCr(primitives.sse_pp[part](fencYuv->getCrAddr(),
> fencYuv->getCStride(), predYuv->getCrAddr(), predYuv->getCStride()));
> +    return distortion;
> +}
> +
> +void TEncSearch::generateRecon(TComDataCU* cu, TComYuv* predYuv,
> TShortYUV* resiYuv, TComYuv* reconYuv, bool skipRes)
> +{
> +    if (skipRes)
> +    {
> +        predYuv->copyToPartYuv(reconYuv, 0);
> +        return;
> +    }
> +    else
> +    {
> +        uint32_t width  = cu->getWidth(0);
> +        xSetResidualQTData(cu, 0, 0, resiYuv, cu->getDepth(0), true);
> +        reconYuv->addClip(predYuv, resiYuv, 0, width);
> +    }
> +}
> +
> +void TEncSearch::estimateBitsDist(TComDataCU* cu, TShortYUV* resiYuv,
> uint32_t& bits, uint32_t& distortion, bool curUseRDOQ)
> +{
> +    if (cu->isIntra(0))
> +    {
> +        return;
> +    }
> +
> +    bits = 0;
> +    distortion = 0;
> +    uint64_t cost = 0;
> +    uint32_t zeroDistortion = 0;
> +
>  m_rdGoOnSbacCoder->load(m_rdSbacCoders[cu->getDepth(0)][CI_CURR_BEST]);
> +    xEstimateResidualQT(cu, 0, 0, resiYuv, cu->getDepth(0), cost, bits,
> distortion, &zeroDistortion, curUseRDOQ);
> +
> +    xSetResidualQTData(cu, 0, 0, NULL, cu->getDepth(0), false);
> +
>  m_rdGoOnSbacCoder->load(m_rdSbacCoders[cu->getDepth(0)][CI_CURR_BEST]);
> +    bits = xSymbolBitsInter(cu);
> +
>  m_rdGoOnSbacCoder->store(m_rdSbacCoders[cu->getDepth(0)][CI_TEMP_BEST]);
> +}
> +
>  #if _MSC_VER
>  #pragma warning(disable: 4701) // potentially uninitialized local variable
>  #endif
> diff -r fef74c2e329d -r a73bc98e632c source/Lib/TLibEncoder/TEncSearch.h
> --- a/source/Lib/TLibEncoder/TEncSearch.h       Fri Nov 08 02:57:47 2013
> -0600
> +++ b/source/Lib/TLibEncoder/TEncSearch.h       Fri Nov 08 12:33:47 2013
> +0530
> @@ -153,6 +153,17 @@
>      void encodeResAndCalcRdInterCU(TComDataCU* cu, TComYuv* fencYuv,
> TComYuv* predYuv, TShortYUV* resiYuv, TShortYUV* bestResiYuv,
>                                     TComYuv* reconYuv, bool bSkipRes, bool
> curUseRDOQ = true);
>
> +    void estimateRDInterCU(TComDataCU* cu, TComYuv* fencYuv, TComYuv*
> predYuv, TShortYUV* resiYuv, TShortYUV* bestResiYuv,
> +                           TComYuv* reconYuv, bool bSkipRes, bool
> curUseRDOQ = true);
> +
> +    uint32_t estimateZerobits(TComDataCU* cu);
> +
> +    uint32_t estimateZeroDist(TComDataCU* cu, TComYuv* fencYuv, TComYuv*
> predYuv);
> +
> +    void generateRecon(TComDataCU* cu, TComYuv* predYuv, TShortYUV*
> resiYuv, TComYuv* reconYuv, bool skipRes);
> +
> +    void estimateBitsDist(TComDataCU* cu, TShortYUV* resiYuv, uint32_t&
> bits, uint32_t& distortion, bool curUseRDOQ);
> +
>      /// set ME search range
>      void setAdaptiveSearchRange(int dir, int refIdx, int merange) {
> m_adaptiveRange[dir][refIdx] = merange; }
>
> diff -r fef74c2e329d -r a73bc98e632c source/encoder/compress.cpp
> --- a/source/encoder/compress.cpp       Fri Nov 08 02:57:47 2013 -0600
> +++ b/source/encoder/compress.cpp       Fri Nov 08 12:33:47 2013 +0530
> @@ -319,7 +319,7 @@
>      m_tmpRecoYuv[depth] = yuv;
>
>      //Encode with residue
> -    m_search->encodeResAndCalcRdInterCU(outTempCU, m_origYuv[depth],
> bestPredYuv, m_tmpResiYuv[depth], m_bestResiYuv[depth],
> m_tmpRecoYuv[depth], false);
> +    m_search->estimateRDInterCU(outTempCU, m_origYuv[depth], bestPredYuv,
> m_tmpResiYuv[depth], m_bestResiYuv[depth], m_tmpRecoYuv[depth], false);
>
>      if (outTempCU->m_totalCost < outBestCU->m_totalCost)    //Choose best
> from no-residue mode and residue mode
>      {
> @@ -476,8 +476,9 @@
>                  m_search->motionCompensation(outBestCU,
> m_bestPredYuv[depth], REF_PIC_LIST_X, partIdx, false, true);
>              }
>
> -            m_search->encodeResAndCalcRdInterCU(outBestCU,
> m_origYuv[depth], m_bestPredYuv[depth], m_tmpResiYuv[depth],
> -                                                m_bestResiYuv[depth],
> m_bestRecoYuv[depth], false);
> +            m_search->estimateRDInterCU(outBestCU, m_origYuv[depth],
> m_bestPredYuv[depth], m_tmpResiYuv[depth],
> +                                        m_bestResiYuv[depth],
> m_bestRecoYuv[depth], false);
> +
>  #if CU_STAT_LOGFILE
>              fprintf(fp1, "\n N : %d ,  Best Inter : %d , ",
> outBestCU->getWidth(0) / 2, outBestCU->m_totalCost);
>  #endif
> _______________________________________________
> x265-devel mailing list
> x265-devel at videolan.org
> https://mailman.videolan.org/listinfo/x265-devel
>



-- 
Steve Borho