[x265] [PATCH] analysis: avoid redundant rect/amp mode analysis based on split block rdCost and mvCost for rd-5/6

Ashok Kumar Mishra ashok at multicorewareinc.com
Thu Oct 15 17:04:25 CEST 2015


Below are performance test results on Haswell, with and without limiting
rect/amp mode analysis, using the veryslow preset.

*Before*
D:\ashok>x265_b.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test_b.hevc
encoded 504 frames in 223.08s (2.26 fps), 3596.14 kb/s, Avg QP:37.29,
Global PSNR: 30.707, SSIM Mean Y: 0.8688587 ( 8.823 dB)

*After*
D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --limit-rect-amp 1
encoded 504 frames in 186.35s (2.70 fps), 3610.14 kb/s, Avg QP:37.35,
Global PSNR: 30.692, SSIM Mean Y: 0.8687821 ( 8.820 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1
encoded 504 frames in 188.32s (2.68 fps), 3604.27 kb/s, Avg QP:37.31,
Global PSNR: 30.712, SSIM Mean Y: 0.8689656 ( 8.826 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1
--limit-rect-amp 1
encoded 504 frames in 165.63s (3.04 fps), 3610.51 kb/s, Avg QP:37.34,
Global PSNR: 30.691, SSIM Mean Y: 0.8686912 ( 8.817 dB)

----------------------------------------------------------------------------------------------------------------------------------------------
*Before*
D:\ashok>x265_b.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test_b.hevc
encoded 500 frames in 795.40s (0.63 fps), 9513.58 kb/s, Avg QP:37.92,
Global PSNR: 30.459, SSIM Mean Y: 0.8214006 ( 7.481 dB)

*After*
D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --limit-rect-amp 1
encoded 500 frames in 556.86s (0.90 fps), 9553.70 kb/s, Avg QP:37.92,
Global PSNR: 30.458, SSIM Mean Y: 0.8214283 ( 7.482 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1
encoded 500 frames in 625.53s (0.80 fps), 9518.09 kb/s, Avg QP:37.91,
Global PSNR: 30.457, SSIM Mean Y: 0.8213568 ( 7.480 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1
--limit-rect-amp 1
encoded 500 frames in 513.12s (0.97 fps), 9564.23 kb/s, Avg QP:37.92,
Global PSNR: 30.457, SSIM Mean Y: 0.8213727 ( 7.481 dB)

---------------------------------------------------------------------------------------------------------------------------------------------------
*Before*
D:\ashok>x265_b.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test_b.hevc --bitrate 6000
encoded 504 frames in 273.06s (1.85 fps), 5097.33 kb/s, Avg QP:35.53,
Global PSNR: 31.691, SSIM Mean Y: 0.8935050 ( 9.727 dB)

*After*
D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-refs 1
encoded 504 frames in 231.13s (2.18 fps), 5094.89 kb/s, Avg QP:35.54,
Global PSNR: 31.687, SSIM Mean Y: 0.8933111 ( 9.719 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000
--limit-rect-amp 1
encoded 504 frames in 228.21s (2.21 fps), 5099.24 kb/s, Avg QP:35.60,
Global PSNR: 31.671, SSIM Mean Y: 0.8932938 ( 9.718 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000
--limit-rect-amp 1 --limit-refs 1
encoded 504 frames in 199.34s (2.53 fps), 5098.16 kb/s, Avg QP:35.61,
Global PSNR: 31.667, SSIM Mean Y: 0.8931659 ( 9.713 dB)


----------------------------------------------------------------------------------------------------------------------------------------------------
*Before*
D:\ashok>x265_b.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test_b.hevc --bitrate 6000
encoded 500 frames in 659.57s (0.76 fps), 6054.60 kb/s, Avg QP:40.14,
Global PSNR: 29.542, SSIM Mean Y: 0.7802421 ( 6.581 dB)

*After*
D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-refs 1
encoded 500 frames in 524.01s (0.95 fps), 6053.17 kb/s, Avg QP:40.15,
Global PSNR: 29.537, SSIM Mean Y: 0.7800589 ( 6.577 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000
--limit-rect-amp 1
encoded 500 frames in 469.41s (1.07 fps), 6056.95 kb/s, Avg QP:40.18,
Global PSNR: 29.535, SSIM Mean Y: 0.7798592 ( 6.573 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000
--limit-rect-amp 1 --limit-refs 1
encoded 500 frames in 433.19s (1.15 fps), 6058.14 kb/s, Avg QP:40.19,
Global PSNR: 29.529, SSIM Mean Y: 0.7796667 ( 6.569 dB)

D:\ashok>x265.exe --input
\\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow
--hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000
--limit-rect-amp 1 --limit-refs 3
encoded 500 frames in 340.68s (1.47 fps), 6057.80 kb/s, Avg QP:40.19,
Global PSNR: 29.519, SSIM Mean Y: 0.7789705 ( 6.555 dB)
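
To make the logs above easier to compare, here is a small sketch of the
arithmetic involved. The helper names are made up for this note and are not
part of x265; the inputs are just the times and bitrates printed by the
encoder. For the two CRF runs with --limit-rect-amp 1 this works out to
roughly 16% (parkrun) and 30% (ducks_take_off) less encode time, for a
bitrate increase of about 0.4% in both cases.

```cpp
#include <cassert>

// Percentage reduction in wall-clock encode time relative to the baseline run.
// Hypothetical helper for reading the logs, not an x265 function.
double timeSavedPercent(double baselineSec, double testSec)
{
    assert(baselineSec > 0.0);
    return 100.0 * (baselineSec - testSec) / baselineSec;
}

// Percentage change in bitrate relative to the baseline run
// (a positive value means the test run produced a larger stream).
double bitrateChangePercent(double baselineKbps, double testKbps)
{
    assert(baselineKbps > 0.0);
    return 100.0 * (testKbps - baselineKbps) / baselineKbps;
}
```

For example, timeSavedPercent(223.08, 186.35) gives the parkrun CRF saving
and bitrateChangePercent(3596.14, 3610.14) the matching bitrate change.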

On Thu, Oct 15, 2015 at 8:31 PM, <ashok at multicorewareinc.com> wrote:

> # HG changeset patch
> # User Ashok Kumar Mishra<ashok at multicorewareinc.com>
> # Date 1444824873 -19800
> #      Wed Oct 14 17:44:33 2015 +0530
> # Node ID f3963e7e75b8dcb599250c082357e08fd32191a5
> # Parent  b6156a08b1def3584647f26096866c1a0c11e54a
> analysis: avoid redundant rect/amp mode analysis based on split block
> rdCost and mvCost for rd-5/6
> The analysis order for rect modes (previously Nx2N first, then 2NxN) is
> now chosen based on the rd cost of the split blocks, to get better PSNR
> and SSIM.
>
> diff -r b6156a08b1de -r f3963e7e75b8 source/common/param.cpp
> --- a/source/common/param.cpp   Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/common/param.cpp   Wed Oct 14 17:44:33 2015 +0530
> @@ -160,6 +160,7 @@
>      param->searchRange = 57;
>      param->maxNumMergeCand = 2;
>      param->limitReferences = 0;
> +    param->limitRectAmp = 0;
>      param->bEnableWeightedPred = 1;
>      param->bEnableWeightedBiPred = 0;
>      param->bEnableEarlySkip = 0;
> @@ -648,6 +649,7 @@
>      }
>      OPT("ref") p->maxNumReferences = atoi(value);
>      OPT("limit-refs") p->limitReferences = atoi(value);
> +    OPT("limit-rect-amp") p->limitRectAmp = atoi(value);
>      OPT("weightp") p->bEnableWeightedPred = atobool(value);
>      OPT("weightb") p->bEnableWeightedBiPred = atobool(value);
>      OPT("cbqpoffs") p->cbQpOffset = atoi(value);
> @@ -1041,6 +1043,8 @@
>            "subme must be greater than or equal to 0");
>      CHECK(param->limitReferences > 3,
>            "limitReferences must be 0, 1, 2 or 3");
> +    CHECK(param->limitRectAmp > 1,
> +          "limitRectAmp must be 0 or 1");
>      CHECK(param->frameNumThreads < 0 || param->frameNumThreads >
> X265_MAX_FRAME_THREADS,
>            "frameNumThreads (--frame-threads) must be [0 ..
> X265_MAX_FRAME_THREADS)");
>      CHECK(param->cbQpOffset < -12, "Min. Chroma Cb QP Offset is -12");
> @@ -1434,6 +1438,7 @@
>      s += sprintf(s, " b-adapt=%d", p->bFrameAdaptive);
>      s += sprintf(s, " ref=%d", p->maxNumReferences);
>      s += sprintf(s, " limit-refs=%d", p->limitReferences);
> +    s += sprintf(s, " limit-rect-amp=%d", p->limitRectAmp);
>      BOOL(p->bEnableWeightedPred, "weightp");
>      BOOL(p->bEnableWeightedBiPred, "weightb");
>      s += sprintf(s, " aq-mode=%d", p->rc.aqMode);
> diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/analysis.cpp
> --- a/source/encoder/analysis.cpp       Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/encoder/analysis.cpp       Wed Oct 14 17:44:33 2015 +0530
> @@ -1172,7 +1172,7 @@
>      return refMask;
>  }
>
> -uint32_t Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const
> CUGeom& cuGeom, uint32_t &zOrder, int32_t qp)
> +SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const
> CUGeom& cuGeom, uint32_t &zOrder, int32_t qp)
>  {
>      uint32_t depth = cuGeom.depth;
>      ModeDepth& md = m_modeDepth[depth];
> @@ -1207,7 +1207,13 @@
>
>      bool foundSkip = false;
>      bool splitIntra = true;
> -    uint32_t splitRefs[4] = { 0, 0, 0, 0 };
> +
> +    SplitData splitData[4];
> +    splitData[0].initSplitCUData();
> +    splitData[1].initSplitCUData();
> +    splitData[2].initSplitCUData();
> +    splitData[3].initSplitCUData();
> +
>      /* Step 1. Evaluate Merge/Skip candidates for likely early-outs */
>      if (mightNotSplit)
>      {
> @@ -1244,7 +1250,7 @@
>                  if (m_slice->m_pps->bUseDQP && nextDepth <=
> m_slice->m_pps->maxCuDQPDepth)
>                      nextQP = setLambdaFromQP(parentCTU,
> calculateQpforCuSize(parentCTU, childGeom));
>
> -                splitRefs[subPartIdx] = compressInterCU_rd5_6(parentCTU,
> childGeom, zOrder, nextQP);
> +                splitData[subPartIdx] = compressInterCU_rd5_6(parentCTU,
> childGeom, zOrder, nextQP);
>
>                  // Save best CU and pred data for this sub CU
>                  splitIntra |= nd.bestMode->cu.isIntra(0);
> @@ -1271,7 +1277,7 @@
>      /* Split CUs
>       *   0  1
>       *   2  3 */
> -    uint32_t allSplitRefs = splitRefs[0] | splitRefs[1] | splitRefs[2] |
> splitRefs[3];
> +    uint32_t allSplitRefs = splitData[0].splitRefs |
> splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
>      /* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current
> depth */
>      if (mightNotSplit)
>      {
> @@ -1290,7 +1296,7 @@
>              {
>                  CUData& cu = md.pred[PRED_2Nx2N].cu;
>                  uint32_t refMask = cu.getBestRefIdx(0);
> -                allSplitRefs = splitRefs[0] = splitRefs[1] = splitRefs[2]
> = splitRefs[3] = refMask;
> +                allSplitRefs = splitData[0].splitRefs =
> splitData[1].splitRefs = splitData[2].splitRefs = splitData[3].splitRefs =
> refMask;
>              }
>
>              if (m_slice->m_sliceType == B_SLICE)
> @@ -1306,22 +1312,80 @@
>
>              if (m_param->bEnableRectInter)
>              {
> -                refMasks[0] = splitRefs[0] | splitRefs[2]; /* left */
> -                refMasks[1] = splitRefs[1] | splitRefs[3]; /* right */
> -                md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
> -                checkInter_rd5_6(md.pred[PRED_Nx2N], cuGeom, SIZE_Nx2N,
> refMasks);
> -                checkBestMode(md.pred[PRED_Nx2N], cuGeom.depth);
> +                uint64_t splitCost = splitData[0].rdCost +
> splitData[1].rdCost + splitData[2].rdCost + splitData[3].rdCost;
> +                ModeDepth& md = m_modeDepth[depth];
> +                uint32_t threshold_2NxN, threshold_Nx2N;
>
> -                refMasks[0] = splitRefs[0] | splitRefs[1]; /* top */
> -                refMasks[1] = splitRefs[2] | splitRefs[3]; /* bot */
> -                md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);
> -                checkInter_rd5_6(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN,
> refMasks);
> -                checkBestMode(md.pred[PRED_2NxN], cuGeom.depth);
> +                if (m_slice->m_sliceType == P_SLICE)
> +                {
> +                    threshold_2NxN = splitData[0].mvCost[0] +
> splitData[1].mvCost[0];
> +                    threshold_Nx2N = splitData[0].mvCost[0] +
> splitData[2].mvCost[0];
> +                }
> +                else
> +                {
> +                    threshold_2NxN = (splitData[0].mvCost[0] +
> splitData[1].mvCost[0]
> +                                    + splitData[0].mvCost[1] +
> splitData[1].mvCost[1] + 1) >> 1;
> +                    threshold_Nx2N = (splitData[0].mvCost[0] +
> splitData[2].mvCost[0]
> +                                    + splitData[0].mvCost[1] +
> splitData[2].mvCost[1] + 1) >> 1;
> +                }
> +
> +                int try_2NxN_first = threshold_2NxN < threshold_Nx2N;
> +                if (try_2NxN_first && splitCost < md.bestMode->rdCost +
> threshold_2NxN)
> +                {
> +                    refMasks[0] = splitData[0].splitRefs |
> splitData[1].splitRefs; /* top */
> +                    refMasks[1] = splitData[2].splitRefs |
> splitData[3].splitRefs; /* bot */
> +                    md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom,
> qp);
> +                    checkInter_rd5_6(md.pred[PRED_2NxN], cuGeom,
> SIZE_2NxN, refMasks);
> +                    checkBestMode(md.pred[PRED_2NxN], cuGeom.depth);
> +                }
> +
> +                if (splitCost < md.bestMode->rdCost + threshold_Nx2N)
> +                {
> +                    refMasks[0] = splitData[0].splitRefs |
> splitData[2].splitRefs; /* left */
> +                    refMasks[1] = splitData[1].splitRefs |
> splitData[3].splitRefs; /* right */
> +                    md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom,
> qp);
> +                    checkInter_rd5_6(md.pred[PRED_Nx2N], cuGeom,
> SIZE_Nx2N, refMasks);
> +                    checkBestMode(md.pred[PRED_Nx2N], cuGeom.depth);
> +                }
> +
> +                if (!try_2NxN_first && splitCost < md.bestMode->rdCost +
> threshold_2NxN)
> +                {
> +                    refMasks[0] = splitData[0].splitRefs |
> splitData[1].splitRefs; /* top */
> +                    refMasks[1] = splitData[2].splitRefs |
> splitData[3].splitRefs; /* bot */
> +                    md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom,
> qp);
> +                    checkInter_rd5_6(md.pred[PRED_2NxN], cuGeom,
> SIZE_2NxN, refMasks);
> +                    checkBestMode(md.pred[PRED_2NxN], cuGeom.depth);
> +                }
>              }
>
>              // Try AMP (SIZE_2NxnU, SIZE_2NxnD, SIZE_nLx2N, SIZE_nRx2N)
>              if (m_slice->m_sps->maxAMPDepth > depth)
>              {
> +                uint64_t splitCost = splitData[0].rdCost +
> splitData[1].rdCost + splitData[2].rdCost + splitData[3].rdCost;
> +                ModeDepth& md = m_modeDepth[depth];
> +                uint32_t threshold_2NxnU, threshold_2NxnD,
> threshold_nLx2N, threshold_nRx2N;
> +
> +                if (m_slice->m_sliceType == P_SLICE)
> +                {
> +                    threshold_2NxnU = splitData[0].mvCost[0] +
> splitData[1].mvCost[0];
> +                    threshold_2NxnD = splitData[2].mvCost[0] +
> splitData[3].mvCost[0];
> +
> +                    threshold_nLx2N = splitData[0].mvCost[0] +
> splitData[2].mvCost[0];
> +                    threshold_nRx2N = splitData[1].mvCost[0] +
> splitData[3].mvCost[0];
> +                }
> +                else
> +                {
> +                    threshold_2NxnU = (splitData[0].mvCost[0] +
> splitData[1].mvCost[0]
> +                                       + splitData[0].mvCost[1] +
> splitData[1].mvCost[1] + 1) >> 1;
> +                    threshold_2NxnD = (splitData[2].mvCost[0] +
> splitData[3].mvCost[0]
> +                                       + splitData[2].mvCost[1] +
> splitData[3].mvCost[1] + 1) >> 1;
> +
> +                    threshold_nLx2N = (splitData[0].mvCost[0] +
> splitData[2].mvCost[0]
> +                                       + splitData[0].mvCost[1] +
> splitData[2].mvCost[1] + 1) >> 1;
> +                    threshold_nRx2N = (splitData[1].mvCost[0] +
> splitData[3].mvCost[0]
> +                                       + splitData[1].mvCost[1] +
> splitData[3].mvCost[1] + 1) >> 1;
> +                }
> +
>                  bool bHor = false, bVer = false;
>                  if (md.bestMode->cu.m_partSize[0] == SIZE_2NxN)
>                      bHor = true;
> @@ -1335,31 +1399,64 @@
>
>                  if (bHor)
>                  {
> -                    refMasks[0] = splitRefs[0] | splitRefs[1]; /* 25% top
> */
> -                    refMasks[1] = allSplitRefs;                /* 75% bot
> */
> -                    md.pred[PRED_2NxnU].cu.initSubCU(parentCTU, cuGeom,
> qp);
> -                    checkInter_rd5_6(md.pred[PRED_2NxnU], cuGeom,
> SIZE_2NxnU, refMasks);
> -                    checkBestMode(md.pred[PRED_2NxnU], cuGeom.depth);
> +                    int try_2NxnD_first = threshold_2NxnD <
> threshold_2NxnU;
> +                    if (try_2NxnD_first && splitCost <
> md.bestMode->rdCost + threshold_2NxnD)
> +                    {
> +                        refMasks[0] = allSplitRefs;
>               /* 75% top */
> +                        refMasks[1] = splitData[2].splitRefs |
> splitData[3].splitRefs; /* 25% bot */
> +                        md.pred[PRED_2NxnD].cu.initSubCU(parentCTU,
> cuGeom, qp);
> +                        checkInter_rd5_6(md.pred[PRED_2NxnD], cuGeom,
> SIZE_2NxnD, refMasks);
> +                        checkBestMode(md.pred[PRED_2NxnD], cuGeom.depth);
> +                    }
>
> -                    refMasks[0] = allSplitRefs;                /* 75% top
> */
> -                    refMasks[1] = splitRefs[2] | splitRefs[3]; /* 25% bot
> */
> -                    md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom,
> qp);
> -                    checkInter_rd5_6(md.pred[PRED_2NxnD], cuGeom,
> SIZE_2NxnD, refMasks);
> -                    checkBestMode(md.pred[PRED_2NxnD], cuGeom.depth);
> +                    if (splitCost < md.bestMode->rdCost + threshold_2NxnU)
> +                    {
> +                        refMasks[0] = splitData[0].splitRefs |
> splitData[1].splitRefs; /* 25% top */
> +                        refMasks[1] = allSplitRefs;
>               /* 75% bot */
> +                        md.pred[PRED_2NxnU].cu.initSubCU(parentCTU,
> cuGeom, qp);
> +                        checkInter_rd5_6(md.pred[PRED_2NxnU], cuGeom,
> SIZE_2NxnU, refMasks);
> +                        checkBestMode(md.pred[PRED_2NxnU], cuGeom.depth);
> +                    }
> +
> +                    if (!try_2NxnD_first && splitCost <
> md.bestMode->rdCost + threshold_2NxnD)
> +                    {
> +                        refMasks[0] = allSplitRefs;
>               /* 75% top */
> +                        refMasks[1] = splitData[2].splitRefs |
> splitData[3].splitRefs; /* 25% bot */
> +                        md.pred[PRED_2NxnD].cu.initSubCU(parentCTU,
> cuGeom, qp);
> +                        checkInter_rd5_6(md.pred[PRED_2NxnD], cuGeom,
> SIZE_2NxnD, refMasks);
> +                        checkBestMode(md.pred[PRED_2NxnD], cuGeom.depth);
> +                    }
>                  }
> +
>                  if (bVer)
>                  {
> -                    refMasks[0] = splitRefs[0] | splitRefs[2]; /* 25%
> left */
> -                    refMasks[1] = allSplitRefs;                /* 75%
> right */
> -                    md.pred[PRED_nLx2N].cu.initSubCU(parentCTU, cuGeom,
> qp);
> -                    checkInter_rd5_6(md.pred[PRED_nLx2N], cuGeom,
> SIZE_nLx2N, refMasks);
> -                    checkBestMode(md.pred[PRED_nLx2N], cuGeom.depth);
> +                    int try_nRx2N_first = threshold_nRx2N <
> threshold_nLx2N;
> +                    if (try_nRx2N_first && splitCost <
> md.bestMode->rdCost + threshold_nRx2N)
> +                    {
> +                        refMasks[0] = allSplitRefs;
>               /* 75% left  */
> +                        refMasks[1] = splitData[1].splitRefs |
> splitData[3].splitRefs; /* 25% right */
> +                        md.pred[PRED_nRx2N].cu.initSubCU(parentCTU,
> cuGeom, qp);
> +                        checkInter_rd5_6(md.pred[PRED_nRx2N], cuGeom,
> SIZE_nRx2N, refMasks);
> +                        checkBestMode(md.pred[PRED_nRx2N], cuGeom.depth);
> +                    }
>
> -                    refMasks[0] = allSplitRefs;                /* 75%
> left */
> -                    refMasks[1] = splitRefs[1] | splitRefs[3]; /* 25%
> right */
> -                    md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom,
> qp);
> -                    checkInter_rd5_6(md.pred[PRED_nRx2N], cuGeom,
> SIZE_nRx2N, refMasks);
> -                    checkBestMode(md.pred[PRED_nRx2N], cuGeom.depth);
> +                    if (splitCost < md.bestMode->rdCost + threshold_nLx2N)
> +                    {
> +                        refMasks[0] = splitData[0].splitRefs |
> splitData[2].splitRefs; /* 25% left  */
> +                        refMasks[1] = allSplitRefs;
>               /* 75% right */
> +                        md.pred[PRED_nLx2N].cu.initSubCU(parentCTU,
> cuGeom, qp);
> +                        checkInter_rd5_6(md.pred[PRED_nLx2N], cuGeom,
> SIZE_nLx2N, refMasks);
> +                        checkBestMode(md.pred[PRED_nLx2N], cuGeom.depth);
> +                    }
> +
> +                    if (!try_nRx2N_first && splitCost <
> md.bestMode->rdCost + threshold_nRx2N)
> +                    {
> +                        refMasks[0] = allSplitRefs;
>               /* 75% left  */
> +                        refMasks[1] = splitData[1].splitRefs |
> splitData[3].splitRefs; /* 25% right */
> +                        md.pred[PRED_nRx2N].cu.initSubCU(parentCTU,
> cuGeom, qp);
> +                        checkInter_rd5_6(md.pred[PRED_nRx2N], cuGeom,
> SIZE_nRx2N, refMasks);
> +                        checkBestMode(md.pred[PRED_nRx2N], cuGeom.depth);
> +                    }
>                  }
>              }
>
> @@ -1398,26 +1495,39 @@
>          checkBestMode(md.pred[PRED_SPLIT], depth);
>
>         /* determine which motion references the parent CU should search */
> -    uint32_t refMask;
> +    SplitData splitCUData;
>      if (!(m_param->limitReferences & X265_REF_LIMIT_DEPTH))
> -        refMask = 0;
> +        splitCUData.splitRefs = 0;
>      else if (md.bestMode == &md.pred[PRED_SPLIT])
> -        refMask = allSplitRefs;
> +        splitCUData.splitRefs = allSplitRefs;
>      else
>      {
>          /* use best merge/inter mode, in case of intra use 2Nx2N inter
> references */
>          CUData& cu = md.bestMode->cu.isIntra(0) ? md.pred[PRED_2Nx2N].cu
> : md.bestMode->cu;
>          uint32_t numPU = cu.getNumPartInter(0);
> -        refMask = 0;
> +        splitCUData.splitRefs = 0;
>          for (uint32_t puIdx = 0, subPartIdx = 0; puIdx < numPU; puIdx++,
> subPartIdx += cu.getPUOffset(puIdx, 0))
> -            refMask |= cu.getBestRefIdx(subPartIdx);
> +            splitCUData.splitRefs |= cu.getBestRefIdx(subPartIdx);
> +    }
> +
> +    if (!m_param->limitRectAmp)
> +    {
> +        splitCUData.mvCost[0] = 0; // L0
> +        splitCUData.mvCost[1] = 0; // L1
> +        splitCUData.rdCost    = 0;
> +    }
> +    else
> +    {
> +        splitCUData.mvCost[0] = md.pred[PRED_2Nx2N].bestME[0][0].mvCost;
> // L0
> +        splitCUData.mvCost[1] = md.pred[PRED_2Nx2N].bestME[0][1].mvCost;
> // L1
> +        splitCUData.rdCost    = md.pred[PRED_2Nx2N].rdCost;
>      }
>
>      /* Copy best data to encData CTU and recon */
>      md.bestMode->cu.copyToPic(depth);
>      md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic,
> parentCTU.m_cuAddr, cuGeom.absPartIdx);
>
> -    return refMask;
> +    return splitCUData;
>  }
>
>  /* sets md.bestMode if a valid merge candidate is found, else leaves it
> NULL */
> diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/analysis.h
> --- a/source/encoder/analysis.h Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/encoder/analysis.h Wed Oct 14 17:44:33 2015 +0530
> @@ -41,6 +41,21 @@
>
>  class Entropy;
>
> +struct SplitData
> +{
> +    uint32_t splitRefs;
> +    uint32_t mvCost[2];
> +    uint64_t rdCost;
> +
> +    void initSplitCUData()
> +    {
> +        splitRefs = 0;
> +        mvCost[0] = 0; // L0
> +        mvCost[1] = 0; // L1
> +        rdCost    = 0;
> +    }
> +};
> +
>  class Analysis : public Search
>  {
>  public:
> @@ -117,7 +132,7 @@
>      /* full analysis for a P or B slice CU */
>      uint32_t compressInterCU_dist(const CUData& parentCTU, const CUGeom&
> cuGeom, int32_t qp);
>      uint32_t compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
> cuGeom, int32_t qp);
> -    uint32_t compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
> cuGeom, uint32_t &zOrder, int32_t qp);
> +    SplitData compressInterCU_rd5_6(const CUData& parentCTU, const
> CUGeom& cuGeom, uint32_t &zOrder, int32_t qp);
>
>      /* measure merge and skip */
>      void checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom&
> cuGeom);
> diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/search.cpp
> --- a/source/encoder/search.cpp Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/encoder/search.cpp Wed Oct 14 17:44:33 2015 +0530
> @@ -2186,19 +2186,21 @@
>
>                      /* Get total cost of partition, but only include MV
> bit cost once */
>                      bits += m_me.bitcost(outmv);
> -                    uint32_t cost = (satdCost - m_me.mvcost(outmv)) +
> m_rdCost.getCost(bits);
> +                    uint32_t mvCost = m_me.mvcost(outmv);
> +                    uint32_t cost = (satdCost - mvCost) +
> m_rdCost.getCost(bits);
>
>                      /* Refine MVP selection, updates: mvpIdx, bits, cost
> */
>                      mvp = checkBestMVP(amvp, outmv, mvpIdx, bits, cost);
>
>                      if (cost < bestME[list].cost)
>                      {
> -                        bestME[list].mv = outmv;
> -                        bestME[list].mvp = mvp;
> -                        bestME[list].mvpIdx = mvpIdx;
> -                        bestME[list].ref = ref;
> -                        bestME[list].cost = cost;
> -                        bestME[list].bits = bits;
> +                        bestME[list].mv      = outmv;
> +                        bestME[list].mvp     = mvp;
> +                        bestME[list].mvpIdx  = mvpIdx;
> +                        bestME[list].ref     = ref;
> +                        bestME[list].cost    = cost;
> +                        bestME[list].bits    = bits;
> +                        bestME[list].mvCost  = mvCost;
>                      }
>                  }
>                  /* the second list ref bits start at bit 16 */
> diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/search.h
> --- a/source/encoder/search.h   Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/encoder/search.h   Wed Oct 14 17:44:33 2015 +0530
> @@ -85,8 +85,9 @@
>      MV       mvp;
>      int      mvpIdx;
>      int      ref;
> +    int      bits;
> +    uint32_t mvCost;
>      uint32_t cost;
> -    int      bits;
>  };
>
>  struct Mode
> diff -r b6156a08b1de -r f3963e7e75b8 source/x265.h
> --- a/source/x265.h     Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/x265.h     Wed Oct 14 17:44:33 2015 +0530
> @@ -822,6 +822,10 @@
>       * 4 split CUs at the next lower CU depth.  The two flags may be
> combined */
>      uint32_t  limitReferences;
>
> +    /* Limit rectangular and asymmetric motion partitions based on rdCost
> and mvCost
> +     * of the 4 split CUs at the next lower CU depth */
> +    uint32_t limitRectAmp;
> +
>      /* ME search method (DIA, HEX, UMH, STAR, FULL). The search patterns
>       * (methods) are sorted in increasing complexity, with diamond being
> the
>       * simplest and fastest and full being the slowest.  DIA, HEX, and
> UMH were
> diff -r b6156a08b1de -r f3963e7e75b8 source/x265cli.h
> --- a/source/x265cli.h  Fri Oct 09 20:45:59 2015 +0530
> +++ b/source/x265cli.h  Wed Oct 14 17:44:33 2015 +0530
> @@ -126,6 +126,7 @@
>      { "b-pyramid",            no_argument, NULL, 0 },
>      { "ref",            required_argument, NULL, 0 },
>      { "limit-refs",     required_argument, NULL, 0 },
> +    { "limit-rect-amp", required_argument, NULL, 0 },
>      { "no-weightp",           no_argument, NULL, 0 },
>      { "weightp",              no_argument, NULL, 'w' },
>      { "no-weightb",           no_argument, NULL, 0 },
> @@ -310,12 +311,13 @@
>      H0("\nTemporal / motion search options:\n");
>      H0("   --max-merge <1..5>            Maximum number of merge
> candidates. Default %d\n", param->maxNumMergeCand);
>      H0("   --ref <integer>               max number of L0 references to
> be allowed (1 .. 16) Default %d\n", param->maxNumReferences);
> -    H0("   --limit-refs <0|1|2|3>        limit references per depth (1)
> or CU (2) or both (3). Default %d\n", param->limitReferences);
> +    H0("   --limit-refs <0|1|2|3>        Limit references per depth (1)
> or CU (2) or both (3). Default %d\n", param->limitReferences);
>      H0("   --me <string>                 Motion search method dia hex umh
> star full. Default %d\n", param->searchMethod);
>      H0("-m/--subme <integer>             Amount of subpel refinement to
> perform (0:least .. 7:most). Default %d \n", param->subpelRefine);
>      H0("   --merange <integer>           Motion search range. Default
> %d\n", param->searchRange);
>      H0("   --[no-]rect                   Enable rectangular motion
> partitions Nx2N and 2NxN. Default %s\n", OPT(param->bEnableRectInter));
>      H0("   --[no-]amp                    Enable asymmetric motion
> partitions, requires --rect. Default %s\n", OPT(param->bEnableAMP));
> +    H0("   --limit-rect-amp <0|1>        Limit rectangular and asymmetric
> motion partitions. Default %d\n", param->limitRectAmp);
>      H1("   --[no-]temporal-mvp           Enable temporal MV predictors.
> Default %s\n", OPT(param->bEnableTemporalMvp));
>      H0("\nSpatial / intra options:\n");
>      H0("   --[no-]strong-intra-smoothing Enable strong intra smoothing
> for 32x32 blocks. Default %s\n", OPT(param->bEnableStrongIntraSmoothing));
>
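
For readers who want the gating rule without the surrounding encoder code,
the following is a stand-alone sketch of the test the patch applies before
each rect/AMP evaluation. It is a simplification, not the code in
analysis.cpp: SplitData is re-declared locally, and mvThreshold, splitCost,
shouldEvaluate and try2NxNFirst are names invented for this note. In the
patch itself, md.bestMode can change after each checkBestMode() call, so the
best rdCost passed to the test may differ between the first and second mode
tried.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the SplitData struct the patch adds to analysis.h.
struct SplitData
{
    uint32_t splitRefs = 0;
    uint32_t mvCost[2] = { 0, 0 }; // L0, L1
    uint64_t rdCost = 0;
};

// MV-cost threshold for a partition covering sub-CUs a and b:
// L0 costs only for P slices, rounded average of L0 and L1 for B slices.
uint32_t mvThreshold(const SplitData& a, const SplitData& b, bool isPSlice)
{
    if (isPSlice)
        return a.mvCost[0] + b.mvCost[0];
    return (a.mvCost[0] + b.mvCost[0] + a.mvCost[1] + b.mvCost[1] + 1) >> 1;
}

// Total RD cost reported by the four split sub-CUs (layout: 0 1 / 2 3).
uint64_t splitCost(const SplitData s[4])
{
    return s[0].rdCost + s[1].rdCost + s[2].rdCost + s[3].rdCost;
}

// A rect/AMP mode is analysed only if the split cost is below the current
// best RD cost plus that mode's MV-cost threshold.
bool shouldEvaluate(uint64_t splitRdCost, uint64_t bestRdCost, uint32_t threshold)
{
    return splitRdCost < bestRdCost + threshold;
}

// 2NxN uses the top pair (0, 1), Nx2N the left pair (0, 2); the mode with
// the smaller threshold is tried first.
bool try2NxNFirst(const SplitData s[4], bool isPSlice)
{
    assert(s != nullptr);
    return mvThreshold(s[0], s[1], isPSlice) < mvThreshold(s[0], s[2], isPSlice);
}
```

Note that with --limit-rect-amp 0 the function returns zero costs, so
splitCost and every threshold are 0 and the test passes whenever the current
best rdCost is non-zero, which keeps the previous behaviour of analysing
every enabled mode.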