<div dir="ltr"><div>Below are the performance test results on Haswell, with and without depth-search applied to pmode.</div><div><br></div><div><b>preset VERYSLOW</b></div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_b.hevc --pmode</div><div>encoded 500 frames in 901.20s (0.55 fps), 8813.94 kb/s, Avg QP:37.97, Global PSNR: 30.369, SSIM Mean Y: 0.8159502 ( 7.351 dB)</div><div><br></div><div><b>After</b></div><div>D:\ashok>x265_a.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_a.hevc --pmode --limit-refs 1</div><div>encoded 500 frames in 632.76s (0.79 fps), 8666.79 kb/s, Avg QP:37.90, Global PSNR: 30.311, SSIM Mean Y: 0.8134978 ( 7.293 dB)</div><div><br></div><div><b>preset SLOWER</b></div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset slower --hash=1 --no-info --psnr --ssim -o test_b.hevc --pmode</div><div>encoded 500 frames in 585.06s (0.85 fps), 8796.19 kb/s, Avg QP:38.03, Global PSNR: 30.356, SSIM Mean Y: 0.8153866 ( 7.337 dB)</div><div><br></div><div><b>After</b><br></div><div>D:\ashok>x265_a.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset slower --hash=1 --no-info --psnr --ssim -o test_a.hevc --pmode --limit-refs 1</div><div>encoded 500 frames in 439.82s (1.14 fps), 8740.25 kb/s, Avg QP:37.99, Global PSNR: 30.334, SSIM Mean Y: 0.8144196 ( 7.315 dB)</div><div><br></div><div><b>preset SLOW</b></div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset slow --hash=1 --no-info --psnr --ssim -o test_b.hevc --pmode</div><div>encoded 500 frames in 148.11s (3.38 fps), 8764.71 kb/s, Avg QP:38.08, Global PSNR: 30.282, SSIM Mean Y: 0.8124724 ( 7.269 
dB)</div><div><br></div><div><b>After</b><br></div><div>D:\ashok>x265_a.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset slow --hash=1 --no-info --psnr --ssim -o test_a.hevc --pmode --limit-refs 1</div><div>encoded 500 frames in 110.45s (4.53 fps), 8712.04 kb/s, Avg QP:38.04, Global PSNR: 30.265, SSIM Mean Y: 0.8117029 ( 7.252 dB)</div><div><br></div><div><b>preset MEDIUM</b></div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset medium --hash=1 --no-info --psnr --ssim -o test_b.hevc --pmode</div><div>encoded 500 frames in 67.01s (7.46 fps), 8975.61 kb/s, Avg QP:37.97, Global PSNR: 30.076, SSIM Mean Y: 0.8040911 ( 7.079 dB)</div><div><br></div><div><b>After</b><br></div><div>D:\ashok>x265_a.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset medium --hash=1 --no-info --psnr --ssim -o test_a.hevc --pmode --limit-refs 1</div><div>encoded 500 frames in 37.56s (13.31 fps), 8954.61 kb/s, Avg QP:37.96, Global PSNR: 30.070, SSIM Mean Y: 0.8038041 ( 7.073 dB)</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 14, 2015 at 4:27 PM, Ashok Kumar Mishra <span dir="ltr"><<a href="mailto:ashok@multicorewareinc.com" target="_blank">ashok@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Yes, performance has improved; I will send the numbers for all presets after some time.</div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 14, 2015 at 4:20 PM, Steve Borho <span dir="ltr"><<a href="mailto:steve@borho.org" target="_blank">steve@borho.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 08/14, <a href="mailto:ashok@multicorewareinc.com" target="_blank">ashok@multicorewareinc.com</a> wrote:<br>
> # HG changeset patch<br>
> # User Ashok Kumar Mishra<<a href="mailto:ashok@multicorewareinc.com" target="_blank">ashok@multicorewareinc.com</a>><br>
> # Date 1439540228 -19800<br>
> # Fri Aug 14 13:47:08 2015 +0530<br>
> # Node ID 9e26bef14543025908ed979b3d217417baf1ac8f<br>
> # Parent d56b2466c04459205287e1581d8a36eebf372ba6<br>
> analysis: re-order analysis to do splits before ME or intra for pmode<br>
<br>
</span>I'm happy to see these patches. They look good, do you have any example<br>
before/after performance and compression numbers?<br>
<div><div><br>
> diff -r d56b2466c044 -r 9e26bef14543 source/encoder/analysis.cpp<br>
> --- a/source/encoder/analysis.cpp Wed Aug 12 18:12:20 2015 +0530<br>
> +++ b/source/encoder/analysis.cpp Fri Aug 14 13:47:08 2015 +0530<br>
> @@ -505,16 +505,82 @@<br>
><br>
> X265_CHECK(m_param->rdLevel >= 2, "compressInterCU_dist does not support RD 0 or 1\n");<br>
><br>
> + PMODE pmode(*this, cuGeom);<br>
> +<br>
> + if (mightNotSplit && depth >= minDepth)<br>
> + {<br>
> + /* Initialize all prediction CUs based on parentCTU */<br>
> + md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);<br>
> + md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);<br>
> +<br>
> + if (m_param->rdLevel <= 4)<br>
> + checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);<br>
> + else<br>
> + checkMerge2Nx2N_rd5_6(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom, false);<br>
> + }<br>
> +<br>
> + bool bNoSplit = false;<br>
> + if (md.bestMode)<br>
> + {<br>
> + bNoSplit = md.bestMode->cu.isSkipped(0);<br>
> + if (mightSplit && depth && depth >= minDepth && !bNoSplit && m_param->rdLevel <= 4)<br>
> + bNoSplit = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);<br>
> + }<br>
> +<br>
> + if (mightSplit && !bNoSplit)<br>
> + {<br>
> + Mode* splitPred = &md.pred[PRED_SPLIT];<br>
> + splitPred->initCosts();<br>
> + CUData* splitCU = &splitPred->cu;<br>
> + splitCU->initSubCU(parentCTU, cuGeom, qp);<br>
> +<br>
> + uint32_t nextDepth = depth + 1;<br>
> + ModeDepth& nd = m_modeDepth[nextDepth];<br>
> + invalidateContexts(nextDepth);<br>
> + Entropy* nextContext = &m_rqt[depth].cur;<br>
> + int nextQP = qp;<br>
> +<br>
> + for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)<br>
> + {<br>
> + const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);<br>
> + if (childGeom.flags & CUGeom::PRESENT)<br>
> + {<br>
> + m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);<br>
> + m_rqt[nextDepth].cur.load(*nextContext);<br>
> +<br>
> + if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)<br>
> + nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));<br>
> +<br>
> + compressInterCU_dist(parentCTU, childGeom, nextQP);<br>
> +<br>
> + // Save best CU and pred data for this sub CU<br>
> + splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);<br>
> + splitPred->addSubCosts(*nd.bestMode);<br>
> +<br>
> + nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);<br>
> + nextContext = &nd.bestMode->contexts;<br>
> + }<br>
> + else<br>
> + splitCU->setEmptyPart(childGeom, subPartIdx);<br>
> + }<br>
> + nextContext->store(splitPred->contexts);<br>
> +<br>
> + if (mightNotSplit)<br>
> + addSplitFlagCost(*splitPred, cuGeom.depth);<br>
> + else<br>
> + updateModeCost(*splitPred);<br>
> +<br>
> + checkDQPForSplitPred(*splitPred, cuGeom);<br>
> + }<br>
> +<br>
> if (mightNotSplit && depth >= minDepth)<br>
> {<br>
> int bTryAmp = m_slice->m_sps->maxAMPDepth > depth;<br>
> int bTryIntra = m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames;<br>
><br>
> - PMODE pmode(*this, cuGeom);<br>
> + if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)<br>
> + setLambdaFromQP(parentCTU, qp);<br>
><br>
> - /* Initialize all prediction CUs based on parentCTU */<br>
> - md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);<br>
> - md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);<br>
> if (bTryIntra)<br>
> {<br>
> md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);<br>
> @@ -548,8 +614,6 @@<br>
><br>
> if (m_param->rdLevel <= 4)<br>
> {<br>
> - checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);<br>
> -<br>
> {<br>
> ProfileCUScope(parentCTU, pmodeBlockTime, countPModeMasters);<br>
> pmode.waitForExit();<br>
> @@ -632,14 +696,13 @@<br>
> }<br>
> else<br>
> {<br>
> - checkMerge2Nx2N_rd5_6(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom, false);<br>
> {<br>
> ProfileCUScope(parentCTU, pmodeBlockTime, countPModeMasters);<br>
> pmode.waitForExit();<br>
> }<br>
><br>
> checkBestMode(md.pred[PRED_2Nx2N], depth);<br>
> - if (m_slice->m_sliceType == B_SLICE)<br>
> + if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost < MAX_INT64)<br>
> checkBestMode(md.pred[PRED_BIDIR], depth);<br>
><br>
> if (m_param->bEnableRectInter)<br>
> @@ -664,14 +727,6 @@<br>
> }<br>
> }<br>
><br>
> - if (md.bestMode->rdCost == MAX_INT64 && !bTryIntra)<br>
> - {<br>
> - md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);<br>
> - checkIntraInInter(md.pred[PRED_INTRA], cuGeom);<br>
> - encodeIntraInInter(md.pred[PRED_INTRA], cuGeom);<br>
> - checkBestMode(md.pred[PRED_INTRA], depth);<br>
> - }<br>
> -<br>
> if (m_bTryLossless)<br>
> tryLossless(cuGeom);<br>
><br>
> @@ -679,60 +734,9 @@<br>
> addSplitFlagCost(*md.bestMode, cuGeom.depth);<br>
> }<br>
><br>
> - bool bNoSplit = false;<br>
> - if (md.bestMode)<br>
> - {<br>
> - bNoSplit = md.bestMode->cu.isSkipped(0);<br>
> - if (mightSplit && depth && depth >= minDepth && !bNoSplit && m_param->rdLevel <= 4)<br>
> - bNoSplit = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);<br>
> - }<br>
> -<br>
> - if (mightSplit && !bNoSplit)<br>
> - {<br>
> - Mode* splitPred = &md.pred[PRED_SPLIT];<br>
> - splitPred->initCosts();<br>
> - CUData* splitCU = &splitPred->cu;<br>
> - splitCU->initSubCU(parentCTU, cuGeom, qp);<br>
> -<br>
> - uint32_t nextDepth = depth + 1;<br>
> - ModeDepth& nd = m_modeDepth[nextDepth];<br>
> - invalidateContexts(nextDepth);<br>
> - Entropy* nextContext = &m_rqt[depth].cur;<br>
> - int nextQP = qp;<br>
> -<br>
> - for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)<br>
> - {<br>
> - const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);<br>
> - if (childGeom.flags & CUGeom::PRESENT)<br>
> - {<br>
> - m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);<br>
> - m_rqt[nextDepth].cur.load(*nextContext);<br>
> -<br>
> - if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)<br>
> - nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));<br>
> -<br>
> - compressInterCU_dist(parentCTU, childGeom, nextQP);<br>
> -<br>
> - // Save best CU and pred data for this sub CU<br>
> - splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);<br>
> - splitPred->addSubCosts(*nd.bestMode);<br>
> -<br>
> - nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);<br>
> - nextContext = &nd.bestMode->contexts;<br>
> - }<br>
> - else<br>
> - splitCU->setEmptyPart(childGeom, subPartIdx);<br>
> - }<br>
> - nextContext->store(splitPred->contexts);<br>
> -<br>
> - if (mightNotSplit)<br>
> - addSplitFlagCost(*splitPred, cuGeom.depth);<br>
> - else<br>
> - updateModeCost(*splitPred);<br>
> -<br>
> - checkDQPForSplitPred(*splitPred, cuGeom);<br>
> - checkBestMode(*splitPred, depth);<br>
> - }<br>
> + /* compare split RD cost against best cost */<br>
> + if (mightSplit && !bNoSplit)<br>
> + checkBestMode(md.pred[PRED_SPLIT], depth);<br>
><br>
> if (mightNotSplit)<br>
> {<br>
> @@ -746,8 +750,7 @@<br>
><br>
> /* Copy best data to encData CTU and recon */<br>
> md.bestMode->cu.copyToPic(depth);<br>
> - if (md.bestMode != &md.pred[PRED_SPLIT])<br>
> - md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, cuAddr, cuGeom.absPartIdx);<br>
> + md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, cuAddr, cuGeom.absPartIdx);<br>
> }<br>
><br>
> uint32_t Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)<br>
</div></div>> _______________________________________________<br>
> x265-devel mailing list<br>
> <a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
> <a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
<span><font color="#888888"><br>
--<br>
Steve Borho<br>
</font></span></blockquote></div><br></div>
</div></div></blockquote></div><br></div>