[x265] [PATCH] analysis: re-order RD 5/6 analysis to do splits before ME or intra

Ashok Kumar Mishra ashok at multicorewareinc.com
Tue May 26 16:40:13 CEST 2015


I did subjective testing on one clip (ducks_take_off) and found no such
blurring. We are testing more clips.
Below are the performance results with the depth-search patches applied on
the latest tip (*10523
(a7bf7a150a70) asm: avx2 code for satd for all chroma i420*):

*Before applying patch*
D:\ashok>x265.exe C:\testsequences\KristenAndSara_1280x720_60.y4m -f 600 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
encoded 600 frames in 154.19s (3.89 fps), 377.80 kb/s, Global PSNR: 40.776,
SSIM Mean Y: 0.9588345 (13.855 dB)

D:\ashok>x265.exe C:\testsequences\KristenAndSara_1280x720_60.y4m -f 600 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 0
encoded 600 frames in 131.39s (4.57 fps), 376.71 kb/s, Global PSNR: 40.757,
SSIM Mean Y: 0.9588142 (13.853 dB)

D:\ashok>x265.exe C:\testsequences\KristenAndSara_1280x720_60.y4m -f 600 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 1
encoded 600 frames in 115.51s (5.19 fps), 377.51 kb/s, Global PSNR: 40.751,
SSIM Mean Y: 0.9587068 (13.841 dB)

D:\ashok>x265.exe C:\testsequences\KristenAndSara_1280x720_60.y4m -f 600 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 2
encoded 600 frames in 95.47s (6.28 fps), 376.87 kb/s, Global PSNR: 40.731,
SSIM Mean Y: 0.9586194 (13.832 dB)

D:\ashok>x265.exe C:\testsequences\KristenAndSara_1280x720_60.y4m -f 600 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 3
encoded 600 frames in 90.95s (6.60 fps), 377.12 kb/s, Global PSNR: 40.722,
SSIM Mean Y: 0.9585990 (13.830 dB)
.......................................................................................................................................

*Before applying patch*
D:\ashok>x265.exe C:\testsequences\parkrun_ter_720p50.y4m -f 504 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
encoded 504 frames in 419.45s (1.20 fps), 6324.32 kb/s, Global PSNR:
32.433, SSIM Mean Y: 0.9029918 (10.132 dB)

D:\ashok>x265.exe C:\testsequences\parkrun_ter_720p50.y4m -f 504 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 0
encoded 504 frames in 351.93s (1.43 fps), 6315.64 kb/s, Global PSNR:
32.424, SSIM Mean Y: 0.9027574 (10.121 dB)

D:\ashok>x265.exe C:\testsequences\parkrun_ter_720p50.y4m -f 504 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 1
encoded 504 frames in 319.82s (1.58 fps), 6324.93 kb/s, Global PSNR:
32.422, SSIM Mean Y: 0.9027779 (10.122 dB)

D:\ashok>x265.exe C:\testsequences\parkrun_ter_720p50.y4m -f 504 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 2
encoded 504 frames in 288.01s (1.75 fps), 6344.59 kb/s, Global PSNR:
32.409, SSIM Mean Y: 0.9025627 (10.113 dB)

D:\ashok>x265.exe C:\testsequences\parkrun_ter_720p50.y4m -f 504 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 3
encoded 504 frames in 280.10s (1.80 fps), 6350.03 kb/s, Global PSNR:
32.409, SSIM Mean Y: 0.9025044 (10.110 dB)
............................................................................................................................................

*Before applying patch*
D:\ashok>x265.exe C:\testsequences\ducks_take_off_420_720p50.y4m -f 500 -o
test_original.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr
--tune=ssim
encoded 500 frames in 551.02s (0.91 fps), 7218.04 kb/s, Global PSNR:
31.158, SSIM Mean Y: 0.8811472 ( 9.250 dB)

D:\ashok>x265.exe C:\testsequences\ducks_take_off_420_720p50.y4m -f 500 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 0
encoded 500 frames in 503.35s (0.99 fps), 7209.20 kb/s, Global PSNR:
31.150, SSIM Mean Y: 0.8808186 ( 9.238 dB)

D:\ashok>x265.exe C:\testsequences\ducks_take_off_420_720p50.y4m -f 500 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 1
encoded 500 frames in 439.16s (1.14 fps), 7210.96 kb/s, Global PSNR:
31.150, SSIM Mean Y: 0.8808297 ( 9.238 dB)

D:\ashok>x265.exe C:\testsequences\ducks_take_off_420_720p50.y4m -f 500 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 2
encoded 500 frames in 393.70s (1.27 fps), 7217.97 kb/s, Global PSNR:
31.142, SSIM Mean Y: 0.8805879 ( 9.230 dB)

D:\ashok>x265.exe C:\testsequences\ducks_take_off_420_720p50.y4m -f 500 -o
test.hevc -r recon.y4m --hash 1 -p veryslow --ssim --psnr --tune=ssim
--limit-refs 3
encoded 500 frames in 379.67s (1.32 fps), 7222.50 kb/s, Global PSNR:
31.144, SSIM Mean Y: 0.8806669 ( 9.232 dB)
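
To put the timings above in perspective, the speedup of each --limit-refs 3
run over the unpatched run is simply the ratio of the two encode times. A
minimal sketch of that arithmetic follows; the helper names `speedup` and
`timeSavedPercent` are illustrative only (they are not part of x265), and
the numbers are copied from the logs above.

```cpp
// Illustrative helpers (not part of x265): relate two encode times from the logs.

// Speedup factor of the patched run relative to the unpatched run.
constexpr double speedup(double secondsBefore, double secondsAfter)
{
    return secondsBefore / secondsAfter;
}

// Encode time saved, as a percentage of the unpatched time.
constexpr double timeSavedPercent(double secondsBefore, double secondsAfter)
{
    return 100.0 * (secondsBefore - secondsAfter) / secondsBefore;
}

// Unpatched time versus --limit-refs 3 time, taken from the runs above.
constexpr double kristenAndSara = speedup(154.19, 90.95);   // roughly 1.70x
constexpr double parkrun        = speedup(419.45, 280.10);  // roughly 1.50x
constexpr double ducksTakeOff   = speedup(551.02, 379.67);  // roughly 1.45x
```

Over the same pairs of runs, SSIM Mean Y changes only in the fourth decimal
place for all three clips.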

On Tue, May 26, 2015 at 5:16 PM, Steve Borho <steve at borho.org> wrote:

> On 05/26, Deepthi Nandakumar wrote:
> > On Mon, May 25, 2015 at 8:31 PM, <ashok at multicorewareinc.com> wrote:
> >
> > > # HG changeset patch
> > > # User Ashok Kumar Mishra<ashok at multicorewareinc.com>
> > > # Date 1432215988 -19800
> > > #      Thu May 21 19:16:28 2015 +0530
> > > # Node ID b11c2f1f8425425cfe190a45c710b65304d07db1
> > > # Parent  a7bf7a150a705489cb63d0454c59ec599bad8c93
> > > analysis: re-order RD 5/6 analysis to do splits before ME or intra
> > >
> > > This commit changes outputs because splits used to be avoided when an
> > > inter or intra mode was chosen without residual coding. This recursion
> > > early-out is no longer possible. Only merge without residual (aka skip)
> > > can abort recursion.
> > >
> > > This commit changes the order of analysis such that the four split
> > > blocks are analyzed prior to attempting any ME or intra modes. Future
> > > commits will use the knowledge learned during split analysis to avoid
> > > unlikely work at the current depth (reducing motion references and
> > > avoiding unlikely intra, rectangular, asymmetric, and lossless modes).
> >
> > OK, I've edited this commit message, because it gives the impression
> > that the new patch introduces fewer early-outs, whereas the new patch
> > actually makes early-outs more likely.
> > Earlier   : Abort recursion if Best(Merge, Skip, All Inter, Intra) is
> >             a skip mode.
> > New patch : Abort recursion if Best(Merge, Skip) is a skip mode.
> >
> > We should do some subjective quality testing before we push this in. Is
> > this making skips more likely and blurring the video?
>
> FWIW: skips are normally not blurry, but I agree with subjective testing
> of the changes.
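
The difference between the two rules quoted above can be seen with made-up
RD costs. A minimal sketch, assuming a simplified `Candidate` type and helper
names (`best`, `abortsEarlier`, `abortsNewPatch`) that are hypothetical and
not x265 code:

```cpp
// Hypothetical, simplified stand-ins for illustration; not x265 types.
struct Candidate
{
    double rdCost;  // made-up RD cost, lower is better
    bool   isSkip;  // true for merge without residual (skip)
};

// Lower RD cost wins; ties keep the first argument.
constexpr Candidate best(Candidate a, Candidate b)
{
    return b.rdCost < a.rdCost ? b : a;
}

// Earlier rule: abort recursion if Best(Merge, Skip, All Inter, Intra) is a skip mode.
constexpr bool abortsEarlier(Candidate skip, Candidate merge, Candidate inter, Candidate intra)
{
    return best(best(skip, merge), best(inter, intra)).isSkip;
}

// New rule: abort recursion if Best(Merge, Skip) is a skip mode.
constexpr bool abortsNewPatch(Candidate skip, Candidate merge)
{
    return best(skip, merge).isSkip;
}

// Made-up costs: skip beats merge, but inter 2Nx2N beats both.
constexpr Candidate kSkip{100.0, true};
constexpr Candidate kMerge{120.0, false};
constexpr Candidate kInter{90.0, false};
constexpr Candidate kIntra{150.0, false};
```

With these costs the earlier rule keeps recursing (inter wins overall) while
the new rule aborts, which is consistent with the observation that the patch
makes early-outs more likely.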
>
> > > diff -r a7bf7a150a70 -r b11c2f1f8425 source/encoder/analysis.cpp
> > > --- a/source/encoder/analysis.cpp       Fri May 22 14:29:35 2015 +0530
> > > +++ b/source/encoder/analysis.cpp       Thu May 21 19:16:28 2015 +0530
> > > @@ -1170,14 +1170,72 @@
> > >          }
> > >      }
> > >
> > > +    bool foundSkip = false;
> > > +    /* Step 1. Evaluate Merge/Skip candidates for likely early-outs */
> > >      if (mightNotSplit)
> > >      {
> > >          md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
> > >          md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
> > >          checkMerge2Nx2N_rd5_6(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom, false);
> > > -        bool earlySkip = m_param->bEnableEarlySkip && md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
> > > +        foundSkip = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
> > > +     }
> > >
> > > -        if (!earlySkip)
> > > +    // estimate split cost
> > > +    /* Step 2. Evaluate each of the 4 split sub-blocks in series */
> > > +    if (mightSplit && !foundSkip)
> > > +    {
> > > +        Mode* splitPred = &md.pred[PRED_SPLIT];
> > > +        splitPred->initCosts();
> > > +        CUData* splitCU = &splitPred->cu;
> > > +        splitCU->initSubCU(parentCTU, cuGeom, qp);
> > > +
> > > +        uint32_t nextDepth = depth + 1;
> > > +        ModeDepth& nd = m_modeDepth[nextDepth];
> > > +        invalidateContexts(nextDepth);
> > > +        Entropy* nextContext = &m_rqt[depth].cur;
> > > +        int nextQP = qp;
> > > +
> > > +        for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
> > > +        {
> > > +            const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);
> > > +            if (childGeom.flags & CUGeom::PRESENT)
> > > +            {
> > > +                m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);
> > > +                m_rqt[nextDepth].cur.load(*nextContext);
> > > +
> > > +                if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
> > > +                    nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
> > > +
> > > +                compressInterCU_rd5_6(parentCTU, childGeom, zOrder, nextQP);
> > > +
> > > +                // Save best CU and pred data for this sub CU
> > > +                splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
> > > +                splitPred->addSubCosts(*nd.bestMode);
> > > +                nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);
> > > +                nextContext = &nd.bestMode->contexts;
> > > +            }
> > > +            else
> > > +            {
> > > +                splitCU->setEmptyPart(childGeom, subPartIdx);
> > > +                zOrder += g_depthInc[g_maxCUDepth - 1][nextDepth];
> > > +            }
> > > +        }
> > > +        nextContext->store(splitPred->contexts);
> > > +        if (mightNotSplit)
> > > +            addSplitFlagCost(*splitPred, cuGeom.depth);
> > > +        else
> > > +            updateModeCost(*splitPred);
> > > +
> > > +        checkDQPForSplitPred(*splitPred, cuGeom);
> > > +    }
> > > +
> > > +    /* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */
> > > +    if (mightNotSplit)
> > > +    {
> > > +        if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)
> > > +            setLambdaFromQP(parentCTU, qp);
> > > +
> > > +        if (!(foundSkip && m_param->bEnableEarlySkip))
> > >          {
> > >              md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
> > >              checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N);
> > > @@ -1263,59 +1321,13 @@
> > >              addSplitFlagCost(*md.bestMode, cuGeom.depth);
> > >      }
> > >
> > > -    // estimate split cost
> > > -    if (mightSplit && (!md.bestMode || !md.bestMode->cu.isSkipped(0)))
> > > -    {
> > > -        Mode* splitPred = &md.pred[PRED_SPLIT];
> > > -        splitPred->initCosts();
> > > -        CUData* splitCU = &splitPred->cu;
> > > -        splitCU->initSubCU(parentCTU, cuGeom, qp);
> > > -
> > > -        uint32_t nextDepth = depth + 1;
> > > -        ModeDepth& nd = m_modeDepth[nextDepth];
> > > -        invalidateContexts(nextDepth);
> > > -        Entropy* nextContext = &m_rqt[depth].cur;
> > > -        int nextQP = qp;
> > > -
> > > -        for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
> > > -        {
> > > -            const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);
> > > -            if (childGeom.flags & CUGeom::PRESENT)
> > > -            {
> > > -                m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);
> > > -                m_rqt[nextDepth].cur.load(*nextContext);
> > > -
> > > -                if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
> > > -                    nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
> > > -
> > > -                compressInterCU_rd5_6(parentCTU, childGeom, zOrder, nextQP);
> > > -
> > > -                // Save best CU and pred data for this sub CU
> > > -                splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
> > > -                splitPred->addSubCosts(*nd.bestMode);
> > > -                nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);
> > > -                nextContext = &nd.bestMode->contexts;
> > > -            }
> > > -            else
> > > -            {
> > > -                splitCU->setEmptyPart(childGeom, subPartIdx);
> > > -                zOrder += g_depthInc[g_maxCUDepth - 1][nextDepth];
> > > -            }
> > > -        }
> > > -        nextContext->store(splitPred->contexts);
> > > -        if (mightNotSplit)
> > > -            addSplitFlagCost(*splitPred, cuGeom.depth);
> > > -        else
> > > -            updateModeCost(*splitPred);
> > > -
> > > -        checkDQPForSplitPred(*splitPred, cuGeom);
> > > -        checkBestMode(*splitPred, depth);
> > > -    }
> > > +    /* compare split RD cost against best cost */
> > > +    if (mightSplit && !foundSkip)
> > > +        checkBestMode(md.pred[PRED_SPLIT], depth);
> > >
> > >      /* Copy best data to encData CTU and recon */
> > >      md.bestMode->cu.copyToPic(depth);
> > > -    if (md.bestMode != &md.pred[PRED_SPLIT])
> > > -        md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);
> > > +    md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);
> > >  }
> > >
> > >  /* sets md.bestMode if a valid merge candidate is found, else leaves it NULL */
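
The gating of the three steps in the diff above can be summarized as a small
truth table. This is only a sketch: `AnalysisPlan` and `planAnalysis` are
hypothetical names, `skipWins` stands in for
`md.bestMode && !md.bestMode->cu.getQtRootCbf(0)`, and `earlySkip` stands in
for `m_param->bEnableEarlySkip`.

```cpp
// Hypothetical summary of which steps run for one CU; not x265 code.
struct AnalysisPlan
{
    bool mergeSkip;   // Step 1: merge/skip candidates
    bool split;       // Step 2: recurse into the four sub-blocks
    bool interIntra;  // Step 3: the block guarded by the early-skip check
};

constexpr AnalysisPlan planAnalysis(bool mightSplit, bool mightNotSplit,
                                    bool skipWins, bool earlySkip)
{
    // foundSkip can only be set when Step 1 ran, i.e. when mightNotSplit is true.
    return AnalysisPlan{
        mightNotSplit,
        mightSplit && !(mightNotSplit && skipWins),
        mightNotSplit && !(mightNotSplit && skipWins && earlySkip)
    };
}
```

Read this way, a winning skip always suppresses the split recursion, while
the Step 3 block is suppressed only when early skip is also enabled.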
> > > _______________________________________________
> > > x265-devel mailing list
> > > x265-devel at videolan.org
> > > https://mailman.videolan.org/listinfo/x265-devel
> > >
>
> --
> Steve Borho