<div dir="ltr"><div>Below are the performance results on Haswell for the veryslow preset, with and without limiting rect/amp analysis (--limit-rect-amp), alone and combined with --limit-refs.</div><div><br></div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_b.hevc</div><div>encoded 504 frames in 223.08s (2.26 fps), 3596.14 kb/s, Avg QP:37.29, Global PSNR: 30.707, SSIM Mean Y: 0.8688587 ( 8.823 dB)</div><div><br></div><div><b>After</b></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --limit-rect-amp 1</div><div>encoded 504 frames in 186.35s (2.70 fps), 3610.14 kb/s, Avg QP:37.35, Global PSNR: 30.692, SSIM Mean Y: 0.8687821 ( 8.820 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1</div><div>encoded 504 frames in 188.32s (2.68 fps), 3604.27 kb/s, Avg QP:37.31, Global PSNR: 30.712, SSIM Mean Y: 0.8689656 ( 8.826 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1 --limit-rect-amp 1</div><div>encoded 504 frames in 165.63s (3.04 fps), 3610.51 kb/s, Avg QP:37.34, Global PSNR: 30.691, SSIM Mean Y: 0.8686912 ( 8.817 dB)</div><div><br></div><div>----------------------------------------------------------------------------------------------------------------------------------------------</div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_b.hevc</div><div>encoded 500 frames in 795.40s (0.63 fps), 9513.58 kb/s, Avg QP:37.92, Global PSNR: 30.459, SSIM Mean Y: 0.8214006 ( 7.481 
dB)</div><div><br></div><div><b>After</b></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --limit-rect-amp 1</div><div>encoded 500 frames in 556.86s (0.90 fps), 9553.70 kb/s, Avg QP:37.92, Global PSNR: 30.458, SSIM Mean Y: 0.8214283 ( 7.482 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1</div><div>encoded 500 frames in 625.53s (0.80 fps), 9518.09 kb/s, Avg QP:37.91, Global PSNR: 30.457, SSIM Mean Y: 0.8213568 ( 7.480 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --limit-refs 1 --limit-rect-amp 1</div><div>encoded 500 frames in 513.12s (0.97 fps), 9564.23 kb/s, Avg QP:37.92, Global PSNR: 30.457, SSIM Mean Y: 0.8213727 ( 7.481 dB)</div><div><br></div><div>---------------------------------------------------------------------------------------------------------------------------------------------------</div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_b.hevc --bitrate 6000</div><div>encoded 504 frames in 273.06s (1.85 fps), 5097.33 kb/s, Avg QP:35.53, Global PSNR: 31.691, SSIM Mean Y: 0.8935050 ( 9.727 dB)</div><div><br></div><div><b>After</b></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-refs 1</div><div>encoded 504 frames in 231.13s (2.18 fps), 5094.89 kb/s, Avg QP:35.54, Global PSNR: 31.687, SSIM Mean Y: 0.8933111 ( 9.719 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 
--no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-rect-amp 1</div><div>encoded 504 frames in 228.21s (2.21 fps), 5099.24 kb/s, Avg QP:35.60, Global PSNR: 31.671, SSIM Mean Y: 0.8932938 ( 9.718 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\parkrun_ter_720p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-rect-amp 1 --limit-refs 1</div><div>encoded 504 frames in 199.34s (2.53 fps), 5098.16 kb/s, Avg QP:35.61, Global PSNR: 31.667, SSIM Mean Y: 0.8931659 ( 9.713 dB)</div><div><br></div><div><br></div><div>----------------------------------------------------------------------------------------------------------------------------------------------------</div><div><b>Before</b></div><div>D:\ashok>x265_b.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test_b.hevc --bitrate 6000</div><div>encoded 500 frames in 659.57s (0.76 fps), 6054.60 kb/s, Avg QP:40.14, Global PSNR: 29.542, SSIM Mean Y: 0.7802421 ( 6.581 dB)</div><div><br></div><div><b>After</b></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-refs 1</div><div>encoded 500 frames in 524.01s (0.95 fps), 6053.17 kb/s, Avg QP:40.15, Global PSNR: 29.537, SSIM Mean Y: 0.7800589 ( 6.577 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-rect-amp 1</div><div>encoded 500 frames in 469.41s (1.07 fps), 6056.95 kb/s, Avg QP:40.18, Global PSNR: 29.535, SSIM Mean Y: 0.7798592 ( 6.573 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-rect-amp 1 --limit-refs 
1</div><div>encoded 500 frames in 433.19s (1.15 fps), 6058.14 kb/s, Avg QP:40.19, Global PSNR: 29.529, SSIM Mean Y: 0.7796667 ( 6.569 dB)</div><div><br></div><div>D:\ashok>x265.exe --input \\HEVC-TEST-2\testsequences\ducks_take_off_1080p50.y4m --preset veryslow --hash=1 --no-info --psnr --ssim -o test.hevc --bitrate 6000 --limit-rect-amp 1 --limit-refs 3</div><div>encoded 500 frames in 340.68s (1.47 fps), 6057.80 kb/s, Avg QP:40.19, Global PSNR: 29.519, SSIM Mean Y: 0.7789705 ( 6.555 dB)</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 15, 2015 at 8:31 PM, <span dir="ltr"><<a href="mailto:ashok@multicorewareinc.com" target="_blank">ashok@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>
# User Ashok Kumar Mishra<<a href="mailto:ashok@multicorewareinc.com">ashok@multicorewareinc.com</a>><br>
# Date 1444824873 -19800<br>
# Wed Oct 14 17:44:33 2015 +0530<br>
# Node ID f3963e7e75b8dcb599250c082357e08fd32191a5<br>
# Parent b6156a08b1def3584647f26096866c1a0c11e54a<br>
analysis: avoid redundant rect/amp mode analysis based on split block rdCost and mvCost for rd-5/6<br>
The evaluation order of the rect modes (previously always Nx2N, then 2NxN), and likewise of the AMP pairs, is now chosen<br>
from the mvCost of the split blocks, to get better PSNR and SSIM.<br>
<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/common/param.cpp<br>
--- a/source/common/param.cpp Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/common/param.cpp Wed Oct 14 17:44:33 2015 +0530<br>
@@ -160,6 +160,7 @@<br>
param->searchRange = 57;<br>
param->maxNumMergeCand = 2;<br>
param->limitReferences = 0;<br>
+ param->limitRectAmp = 0;<br>
param->bEnableWeightedPred = 1;<br>
param->bEnableWeightedBiPred = 0;<br>
param->bEnableEarlySkip = 0;<br>
@@ -648,6 +649,7 @@<br>
}<br>
OPT("ref") p->maxNumReferences = atoi(value);<br>
OPT("limit-refs") p->limitReferences = atoi(value);<br>
+ OPT("limit-rect-amp") p->limitRectAmp = atoi(value);<br>
OPT("weightp") p->bEnableWeightedPred = atobool(value);<br>
OPT("weightb") p->bEnableWeightedBiPred = atobool(value);<br>
OPT("cbqpoffs") p->cbQpOffset = atoi(value);<br>
@@ -1041,6 +1043,8 @@<br>
"subme must be greater than or equal to 0");<br>
CHECK(param->limitReferences > 3,<br>
"limitReferences must be 0, 1, 2 or 3");<br>
+ CHECK(param->limitRectAmp > 1,<br>
+ "limitRectAmp must be 0 or 1");<br>
CHECK(param->frameNumThreads < 0 || param->frameNumThreads > X265_MAX_FRAME_THREADS,<br>
"frameNumThreads (--frame-threads) must be [0 .. X265_MAX_FRAME_THREADS)");<br>
CHECK(param->cbQpOffset < -12, "Min. Chroma Cb QP Offset is -12");<br>
@@ -1434,6 +1438,7 @@<br>
s += sprintf(s, " b-adapt=%d", p->bFrameAdaptive);<br>
s += sprintf(s, " ref=%d", p->maxNumReferences);<br>
s += sprintf(s, " limit-refs=%d", p->limitReferences);<br>
+ s += sprintf(s, " limit-rect-amp=%d", p->limitRectAmp);<br>
BOOL(p->bEnableWeightedPred, "weightp");<br>
BOOL(p->bEnableWeightedBiPred, "weightb");<br>
s += sprintf(s, " aq-mode=%d", p->rc.aqMode);<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/analysis.cpp<br>
--- a/source/encoder/analysis.cpp Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/encoder/analysis.cpp Wed Oct 14 17:44:33 2015 +0530<br>
@@ -1172,7 +1172,7 @@<br>
return refMask;<br>
}<br>
<br>
-uint32_t Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder, int32_t qp)<br>
+SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder, int32_t qp)<br>
{<br>
uint32_t depth = cuGeom.depth;<br>
ModeDepth& md = m_modeDepth[depth];<br>
@@ -1207,7 +1207,13 @@<br>
<br>
bool foundSkip = false;<br>
bool splitIntra = true;<br>
- uint32_t splitRefs[4] = { 0, 0, 0, 0 };<br>
+<br>
+ SplitData splitData[4];<br>
+ splitData[0].initSplitCUData();<br>
+ splitData[1].initSplitCUData();<br>
+ splitData[2].initSplitCUData();<br>
+ splitData[3].initSplitCUData();<br>
+<br>
/* Step 1. Evaluate Merge/Skip candidates for likely early-outs */<br>
if (mightNotSplit)<br>
{<br>
@@ -1244,7 +1250,7 @@<br>
if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)<br>
nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));<br>
<br>
- splitRefs[subPartIdx] = compressInterCU_rd5_6(parentCTU, childGeom, zOrder, nextQP);<br>
+ splitData[subPartIdx] = compressInterCU_rd5_6(parentCTU, childGeom, zOrder, nextQP);<br>
<br>
// Save best CU and pred data for this sub CU<br>
splitIntra |= nd.bestMode->cu.isIntra(0);<br>
@@ -1271,7 +1277,7 @@<br>
/* Split CUs<br>
* 0 1<br>
* 2 3 */<br>
- uint32_t allSplitRefs = splitRefs[0] | splitRefs[1] | splitRefs[2] | splitRefs[3];<br>
+ uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;<br>
/* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */<br>
if (mightNotSplit)<br>
{<br>
@@ -1290,7 +1296,7 @@<br>
{<br>
CUData& cu = md.pred[PRED_2Nx2N].cu;<br>
uint32_t refMask = cu.getBestRefIdx(0);<br>
- allSplitRefs = splitRefs[0] = splitRefs[1] = splitRefs[2] = splitRefs[3] = refMask;<br>
+ allSplitRefs = splitData[0].splitRefs = splitData[1].splitRefs = splitData[2].splitRefs = splitData[3].splitRefs = refMask;<br>
}<br>
<br>
if (m_slice->m_sliceType == B_SLICE)<br>
@@ -1306,22 +1312,80 @@<br>
<br>
if (m_param->bEnableRectInter)<br>
{<br>
- refMasks[0] = splitRefs[0] | splitRefs[2]; /* left */<br>
- refMasks[1] = splitRefs[1] | splitRefs[3]; /* right */<br>
- md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
- checkInter_rd5_6(md.pred[PRED_Nx2N], cuGeom, SIZE_Nx2N, refMasks);<br>
- checkBestMode(md.pred[PRED_Nx2N], cuGeom.depth);<br>
+ uint64_t splitCost = splitData[0].rdCost + splitData[1].rdCost + splitData[2].rdCost + splitData[3].rdCost;<br>
+ ModeDepth& md = m_modeDepth[depth];<br>
+ uint32_t threshold_2NxN, threshold_Nx2N;<br>
<br>
- refMasks[0] = splitRefs[0] | splitRefs[1]; /* top */<br>
- refMasks[1] = splitRefs[2] | splitRefs[3]; /* bot */<br>
- md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);<br>
- checkInter_rd5_6(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);<br>
- checkBestMode(md.pred[PRED_2NxN], cuGeom.depth);<br>
+ if (m_slice->m_sliceType == P_SLICE)<br>
+ {<br>
+ threshold_2NxN = splitData[0].mvCost[0] + splitData[1].mvCost[0];<br>
+ threshold_Nx2N = splitData[0].mvCost[0] + splitData[2].mvCost[0];<br>
+ }<br>
+ else<br>
+ {<br>
+ threshold_2NxN = (splitData[0].mvCost[0] + splitData[1].mvCost[0]<br>
+ + splitData[0].mvCost[1] + splitData[1].mvCost[1] + 1) >> 1;<br>
+ threshold_Nx2N = (splitData[0].mvCost[0] + splitData[2].mvCost[0]<br>
+ + splitData[0].mvCost[1] + splitData[2].mvCost[1] + 1) >> 1;<br>
+ }<br>
+<br>
+ int try_2NxN_first = threshold_2NxN < threshold_Nx2N;<br>
+ if (try_2NxN_first && splitCost < md.bestMode->rdCost + threshold_2NxN)<br>
+ {<br>
+ refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* top */<br>
+ refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* bot */<br>
+ md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);<br>
+ checkBestMode(md.pred[PRED_2NxN], cuGeom.depth);<br>
+ }<br>
+<br>
+ if (splitCost < md.bestMode->rdCost + threshold_Nx2N)<br>
+ {<br>
+ refMasks[0] = splitData[0].splitRefs | splitData[2].splitRefs; /* left */<br>
+ refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* right */<br>
+ md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_Nx2N], cuGeom, SIZE_Nx2N, refMasks);<br>
+ checkBestMode(md.pred[PRED_Nx2N], cuGeom.depth);<br>
+ }<br>
+<br>
+ if (!try_2NxN_first && splitCost < md.bestMode->rdCost + threshold_2NxN)<br>
+ {<br>
+ refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* top */<br>
+ refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* bot */<br>
+ md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);<br>
+ checkBestMode(md.pred[PRED_2NxN], cuGeom.depth);<br>
+ }<br>
}<br>
<br>
// Try AMP (SIZE_2NxnU, SIZE_2NxnD, SIZE_nLx2N, SIZE_nRx2N)<br>
if (m_slice->m_sps->maxAMPDepth > depth)<br>
{<br>
+ uint64_t splitCost = splitData[0].rdCost + splitData[1].rdCost + splitData[2].rdCost + splitData[3].rdCost;<br>
+ ModeDepth& md = m_modeDepth[depth];<br>
+ uint32_t threshold_2NxnU, threshold_2NxnD, threshold_nLx2N, threshold_nRx2N;<br>
+<br>
+ if (m_slice->m_sliceType == P_SLICE)<br>
+ {<br>
+ threshold_2NxnU = splitData[0].mvCost[0] + splitData[1].mvCost[0];<br>
+ threshold_2NxnD = splitData[2].mvCost[0] + splitData[3].mvCost[0];<br>
+<br>
+ threshold_nLx2N = splitData[0].mvCost[0] + splitData[2].mvCost[0];<br>
+ threshold_nRx2N = splitData[1].mvCost[0] + splitData[3].mvCost[0];<br>
+ }<br>
+ else<br>
+ {<br>
+ threshold_2NxnU = (splitData[0].mvCost[0] + splitData[1].mvCost[0]<br>
+ + splitData[0].mvCost[1] + splitData[1].mvCost[1] + 1) >> 1;<br>
+ threshold_2NxnD = (splitData[2].mvCost[0] + splitData[3].mvCost[0]<br>
+ + splitData[2].mvCost[1] + splitData[3].mvCost[1] + 1) >> 1;<br>
+<br>
+ threshold_nLx2N = (splitData[0].mvCost[0] + splitData[2].mvCost[0]<br>
+ + splitData[0].mvCost[1] + splitData[2].mvCost[1] + 1) >> 1;<br>
+ threshold_nRx2N = (splitData[1].mvCost[0] + splitData[3].mvCost[0]<br>
+ + splitData[1].mvCost[1] + splitData[3].mvCost[1] + 1) >> 1;<br>
+ }<br>
+<br>
bool bHor = false, bVer = false;<br>
if (md.bestMode->cu.m_partSize[0] == SIZE_2NxN)<br>
bHor = true;<br>
@@ -1335,31 +1399,64 @@<br>
<br>
if (bHor)<br>
{<br>
- refMasks[0] = splitRefs[0] | splitRefs[1]; /* 25% top */<br>
- refMasks[1] = allSplitRefs; /* 75% bot */<br>
- md.pred[PRED_2NxnU].cu.initSubCU(parentCTU, cuGeom, qp);<br>
- checkInter_rd5_6(md.pred[PRED_2NxnU], cuGeom, SIZE_2NxnU, refMasks);<br>
- checkBestMode(md.pred[PRED_2NxnU], cuGeom.depth);<br>
+ int try_2NxnD_first = threshold_2NxnD < threshold_2NxnU;<br>
+ if (try_2NxnD_first && splitCost < md.bestMode->rdCost + threshold_2NxnD)<br>
+ {<br>
+ refMasks[0] = allSplitRefs; /* 75% top */<br>
+ refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* 25% bot */<br>
+ md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);<br>
+ checkBestMode(md.pred[PRED_2NxnD], cuGeom.depth);<br>
+ }<br>
<br>
- refMasks[0] = allSplitRefs; /* 75% top */<br>
- refMasks[1] = splitRefs[2] | splitRefs[3]; /* 25% bot */<br>
- md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);<br>
- checkInter_rd5_6(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);<br>
- checkBestMode(md.pred[PRED_2NxnD], cuGeom.depth);<br>
+ if (splitCost < md.bestMode->rdCost + threshold_2NxnU)<br>
+ {<br>
+ refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* 25% top */<br>
+ refMasks[1] = allSplitRefs; /* 75% bot */<br>
+ md.pred[PRED_2NxnU].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_2NxnU], cuGeom, SIZE_2NxnU, refMasks);<br>
+ checkBestMode(md.pred[PRED_2NxnU], cuGeom.depth);<br>
+ }<br>
+<br>
+ if (!try_2NxnD_first && splitCost < md.bestMode->rdCost + threshold_2NxnD)<br>
+ {<br>
+ refMasks[0] = allSplitRefs; /* 75% top */<br>
+ refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* 25% bot */<br>
+ md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);<br>
+ checkBestMode(md.pred[PRED_2NxnD], cuGeom.depth);<br>
+ }<br>
}<br>
+<br>
if (bVer)<br>
{<br>
- refMasks[0] = splitRefs[0] | splitRefs[2]; /* 25% left */<br>
- refMasks[1] = allSplitRefs; /* 75% right */<br>
- md.pred[PRED_nLx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
- checkInter_rd5_6(md.pred[PRED_nLx2N], cuGeom, SIZE_nLx2N, refMasks);<br>
- checkBestMode(md.pred[PRED_nLx2N], cuGeom.depth);<br>
+ int try_nRx2N_first = threshold_nRx2N < threshold_nLx2N;<br>
+ if (try_nRx2N_first && splitCost < md.bestMode->rdCost + threshold_nRx2N)<br>
+ {<br>
+ refMasks[0] = allSplitRefs; /* 75% left */<br>
+ refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* 25% right */<br>
+ md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);<br>
+ checkBestMode(md.pred[PRED_nRx2N], cuGeom.depth);<br>
+ }<br>
<br>
- refMasks[0] = allSplitRefs; /* 75% left */<br>
- refMasks[1] = splitRefs[1] | splitRefs[3]; /* 25% right */<br>
- md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
- checkInter_rd5_6(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);<br>
- checkBestMode(md.pred[PRED_nRx2N], cuGeom.depth);<br>
+ if (splitCost < md.bestMode->rdCost + threshold_nLx2N)<br>
+ {<br>
+ refMasks[0] = splitData[0].splitRefs | splitData[2].splitRefs; /* 25% left */<br>
+ refMasks[1] = allSplitRefs; /* 75% right */<br>
+ md.pred[PRED_nLx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_nLx2N], cuGeom, SIZE_nLx2N, refMasks);<br>
+ checkBestMode(md.pred[PRED_nLx2N], cuGeom.depth);<br>
+ }<br>
+<br>
+ if (!try_nRx2N_first && splitCost < md.bestMode->rdCost + threshold_nRx2N)<br>
+ {<br>
+ refMasks[0] = allSplitRefs; /* 75% left */<br>
+ refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* 25% right */<br>
+ md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);<br>
+ checkInter_rd5_6(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);<br>
+ checkBestMode(md.pred[PRED_nRx2N], cuGeom.depth);<br>
+ }<br>
}<br>
}<br>
<br>
@@ -1398,26 +1495,39 @@<br>
checkBestMode(md.pred[PRED_SPLIT], depth);<br>
<br>
/* determine which motion references the parent CU should search */<br>
- uint32_t refMask;<br>
+ SplitData splitCUData;<br>
if (!(m_param->limitReferences & X265_REF_LIMIT_DEPTH))<br>
- refMask = 0;<br>
+ splitCUData.splitRefs = 0;<br>
else if (md.bestMode == &md.pred[PRED_SPLIT])<br>
- refMask = allSplitRefs;<br>
+ splitCUData.splitRefs = allSplitRefs;<br>
else<br>
{<br>
/* use best merge/inter mode, in case of intra use 2Nx2N inter references */<br>
CUData& cu = md.bestMode->cu.isIntra(0) ? md.pred[PRED_2Nx2N].cu : md.bestMode->cu;<br>
uint32_t numPU = cu.getNumPartInter(0);<br>
- refMask = 0;<br>
+ splitCUData.splitRefs = 0;<br>
for (uint32_t puIdx = 0, subPartIdx = 0; puIdx < numPU; puIdx++, subPartIdx += cu.getPUOffset(puIdx, 0))<br>
- refMask |= cu.getBestRefIdx(subPartIdx);<br>
+ splitCUData.splitRefs |= cu.getBestRefIdx(subPartIdx);<br>
+ }<br>
+<br>
+ if (!m_param->limitRectAmp)<br>
+ {<br>
+ splitCUData.mvCost[0] = 0; // L0<br>
+ splitCUData.mvCost[1] = 0; // L1<br>
+ splitCUData.rdCost = 0;<br>
+ }<br>
+ else<br>
+ {<br>
+ splitCUData.mvCost[0] = md.pred[PRED_2Nx2N].bestME[0][0].mvCost; // L0<br>
+ splitCUData.mvCost[1] = md.pred[PRED_2Nx2N].bestME[0][1].mvCost; // L1<br>
+ splitCUData.rdCost = md.pred[PRED_2Nx2N].rdCost;<br>
}<br>
<br>
/* Copy best data to encData CTU and recon */<br>
md.bestMode->cu.copyToPic(depth);<br>
md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);<br>
<br>
- return refMask;<br>
+ return splitCUData;<br>
}<br>
<br>
/* sets md.bestMode if a valid merge candidate is found, else leaves it NULL */<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/analysis.h<br>
--- a/source/encoder/analysis.h Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/encoder/analysis.h Wed Oct 14 17:44:33 2015 +0530<br>
@@ -41,6 +41,21 @@<br>
<br>
class Entropy;<br>
<br>
+struct SplitData<br>
+{<br>
+ uint32_t splitRefs;<br>
+ uint32_t mvCost[2];<br>
+ uint64_t rdCost;<br>
+<br>
+ void initSplitCUData()<br>
+ {<br>
+ splitRefs = 0;<br>
+ mvCost[0] = 0; // L0<br>
+ mvCost[1] = 0; // L1<br>
+ rdCost = 0;<br>
+ }<br>
+};<br>
+<br>
class Analysis : public Search<br>
{<br>
public:<br>
@@ -117,7 +132,7 @@<br>
/* full analysis for a P or B slice CU */<br>
uint32_t compressInterCU_dist(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);<br>
uint32_t compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);<br>
- uint32_t compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder, int32_t qp);<br>
+ SplitData compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder, int32_t qp);<br>
<br>
/* measure merge and skip */<br>
void checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom& cuGeom);<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/search.cpp<br>
--- a/source/encoder/search.cpp Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/encoder/search.cpp Wed Oct 14 17:44:33 2015 +0530<br>
@@ -2186,19 +2186,21 @@<br>
<br>
/* Get total cost of partition, but only include MV bit cost once */<br>
bits += m_me.bitcost(outmv);<br>
- uint32_t cost = (satdCost - m_me.mvcost(outmv)) + m_rdCost.getCost(bits);<br>
+ uint32_t mvCost = m_me.mvcost(outmv);<br>
+ uint32_t cost = (satdCost - mvCost) + m_rdCost.getCost(bits);<br>
<br>
/* Refine MVP selection, updates: mvpIdx, bits, cost */<br>
mvp = checkBestMVP(amvp, outmv, mvpIdx, bits, cost);<br>
<br>
if (cost < bestME[list].cost)<br>
{<br>
- bestME[list].mv = outmv;<br>
- bestME[list].mvp = mvp;<br>
- bestME[list].mvpIdx = mvpIdx;<br>
- bestME[list].ref = ref;<br>
- bestME[list].cost = cost;<br>
- bestME[list].bits = bits;<br>
+ bestME[list].mv = outmv;<br>
+ bestME[list].mvp = mvp;<br>
+ bestME[list].mvpIdx = mvpIdx;<br>
+ bestME[list].ref = ref;<br>
+ bestME[list].cost = cost;<br>
+ bestME[list].bits = bits;<br>
+ bestME[list].mvCost = mvCost;<br>
}<br>
}<br>
/* the second list ref bits start at bit 16 */<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/encoder/search.h<br>
--- a/source/encoder/search.h Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/encoder/search.h Wed Oct 14 17:44:33 2015 +0530<br>
@@ -85,8 +85,9 @@<br>
MV mvp;<br>
int mvpIdx;<br>
int ref;<br>
+ int bits;<br>
+ uint32_t mvCost;<br>
uint32_t cost;<br>
- int bits;<br>
};<br>
<br>
struct Mode<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/x265.h<br>
--- a/source/x265.h Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/x265.h Wed Oct 14 17:44:33 2015 +0530<br>
@@ -822,6 +822,10 @@<br>
* 4 split CUs at the next lower CU depth. The two flags may be combined */<br>
uint32_t limitReferences;<br>
<br>
+ /* Limit rectangular and asymmetric motion partitions based on the rdCost and mvCost<br>
+ * of the 4 split CUs at the next lower CU depth */<br>
+ uint32_t limitRectAmp;<br>
+<br>
/* ME search method (DIA, HEX, UMH, STAR, FULL). The search patterns<br>
* (methods) are sorted in increasing complexity, with diamond being the<br>
* simplest and fastest and full being the slowest. DIA, HEX, and UMH were<br>
diff -r b6156a08b1de -r f3963e7e75b8 source/x265cli.h<br>
--- a/source/x265cli.h Fri Oct 09 20:45:59 2015 +0530<br>
+++ b/source/x265cli.h Wed Oct 14 17:44:33 2015 +0530<br>
@@ -126,6 +126,7 @@<br>
{ "b-pyramid", no_argument, NULL, 0 },<br>
{ "ref", required_argument, NULL, 0 },<br>
{ "limit-refs", required_argument, NULL, 0 },<br>
+ { "limit-rect-amp", required_argument, NULL, 0 },<br>
{ "no-weightp", no_argument, NULL, 0 },<br>
{ "weightp", no_argument, NULL, 'w' },<br>
{ "no-weightb", no_argument, NULL, 0 },<br>
@@ -310,12 +311,13 @@<br>
H0("\nTemporal / motion search options:\n");<br>
H0(" --max-merge <1..5> Maximum number of merge candidates. Default %d\n", param->maxNumMergeCand);<br>
H0(" --ref <integer> max number of L0 references to be allowed (1 .. 16) Default %d\n", param->maxNumReferences);<br>
- H0(" --limit-refs <0|1|2|3> limit references per depth (1) or CU (2) or both (3). Default %d\n", param->limitReferences);<br>
+ H0(" --limit-refs <0|1|2|3> Limit references per depth (1) or CU (2) or both (3). Default %d\n", param->limitReferences);<br>
H0(" --me <string> Motion search method dia hex umh star full. Default %d\n", param->searchMethod);<br>
H0("-m/--subme <integer> Amount of subpel refinement to perform (0:least .. 7:most). Default %d \n", param->subpelRefine);<br>
H0(" --merange <integer> Motion search range. Default %d\n", param->searchRange);<br>
H0(" --[no-]rect Enable rectangular motion partitions Nx2N and 2NxN. Default %s\n", OPT(param->bEnableRectInter));<br>
H0(" --[no-]amp Enable asymmetric motion partitions, requires --rect. Default %s\n", OPT(param->bEnableAMP));<br>
+ H0(" --limit-rect-amp <0|1> Limit rectangular and asymmetric motion partitions. Default %d\n", param->limitRectAmp);<br>
H1(" --[no-]temporal-mvp Enable temporal MV predictors. Default %s\n", OPT(param->bEnableTemporalMvp));<br>
H0("\nSpatial / intra options:\n");<br>
H0(" --[no-]strong-intra-smoothing Enable strong intra smoothing for 32x32 blocks. Default %s\n", OPT(param->bEnableStrongIntraSmoothing));<br>
</blockquote></div><br></div>
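A minimal C++ sketch of the rect-mode gate that the patch adds to compressInterCU_rd5_6(), for checking the logic outside the encoder. It copies the threshold formulas from the diff (P slices sum the L0 mvCost of two children, B slices average the L0 and L1 sums with rounding) and the rule that a partition is searched only while splitCost < md.bestMode->rdCost + threshold. RectPart, RectResult, partThreshold() and evalRect() are hypothetical helpers, and the candidate rd costs are passed in by the caller instead of coming from checkInter_rd5_6(); the AMP pairs follow the same pattern and are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the patch's SplitData: the 2Nx2N costs of one split child.
struct SplitData
{
    uint32_t splitRefs;
    uint32_t mvCost[2]; // [0] = L0, [1] = L1
    uint64_t rdCost;
};

enum RectPart { PART_2NxN = 0, PART_Nx2N = 1 };

// Threshold for a partition keyed on children a and b, as in the patch:
// P slices sum the L0 mvCosts; B slices average the L0 and L1 sums (rounded).
static uint32_t partThreshold(const SplitData& a, const SplitData& b, bool isPSlice)
{
    if (isPSlice)
        return a.mvCost[0] + b.mvCost[0];
    return (a.mvCost[0] + b.mvCost[0] + a.mvCost[1] + b.mvCost[1] + 1) >> 1;
}

struct RectResult
{
    int tried[2];  // partitions in the order they were evaluated
    int count;     // how many partitions passed the gate
    uint64_t best; // best rd cost after evaluation
};

// Simulates the rect-mode gate. cost2NxN / costNx2N are hypothetical stand-ins
// for the rd costs checkInter_rd5_6() would produce; best tracks checkBestMode().
static RectResult evalRect(const SplitData sd[4], bool isPSlice, uint64_t bestCost,
                           uint64_t cost2NxN, uint64_t costNx2N)
{
    uint64_t splitCost = sd[0].rdCost + sd[1].rdCost + sd[2].rdCost + sd[3].rdCost;
    uint32_t th2NxN = partThreshold(sd[0], sd[1], isPSlice); // top pair
    uint32_t thNx2N = partThreshold(sd[0], sd[2], isPSlice); // left pair
    bool try2NxNFirst = th2NxN < thNx2N;

    RectResult r = { { -1, -1 }, 0, bestCost };
    auto check = [&r](int part, uint64_t cost) {
        assert(r.count < 2); // each rect partition is evaluated at most once
        r.tried[r.count++] = part;
        if (cost < r.best)
            r.best = cost;
    };

    if (try2NxNFirst && splitCost < r.best + th2NxN)
        check(PART_2NxN, cost2NxN);
    if (splitCost < r.best + thNx2N)
        check(PART_Nx2N, costNx2N);
    if (!try2NxNFirst && splitCost < r.best + th2NxN)
        check(PART_2NxN, cost2NxN);
    return r;
}
```

For example, with four children of rdCost 100 and L0 mvCosts of 10, 5, 20 and 0 in a P slice, the thresholds are 15 (2NxN) and 30 (Nx2N), so 2NxN is tried first; if the current best cost is 300, both checks (400 < 315, 400 < 330) fail and neither rectangular mode is searched. With --limit-rect-amp 0 every SplitData field is zero, so the gate passes whenever the best cost is non-zero and the original Nx2N-then-2NxN order is kept.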