<div dir="ltr">I tested this patch with few clips with below cli<div>I checked with differnt values of psy-rd 0.5, 1, 1.5, 2<br><div><br><div>x265.exe 720p50_parkrun_ter.y4m -o 720p50_parkrun_ter_medium_psy-rd.hevc --bitrate 3000 --psy-rd 0.5<br><div><br><div>1. NebutaFestival -- I don't see any difference in visual quality with and without this patch</div><div>2. Duckstakeoff -- the old psy-rd implementation is better than new implementation</div><div>3. parkrun -- in some places the old implementation is visually better and in some places new implementation is better</div></div></div></div></div><div><br></div><div>Regards</div><div>Sumalatha</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 20, 2014 at 11:07 PM, Deepthi Nandakumar <span dir="ltr"><<a href="mailto:deepthi@multicorewareinc.com" target="_blank">deepthi@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Thanks Steve - will clean it up before pushing.<br><br></div>I wanted some feedback on the visual quality - I couldnt see any improvement, and sometimes felt the existing psy-rd implementation was marginally better. Would be good if Sumalatha/anyone else could test, and see if it improves psy-rd.<br></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 20, 2014 at 10:54 PM, Steve Borho <span dir="ltr"><<a href="mailto:steve@borho.org" target="_blank">steve@borho.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 10/20, <a href="mailto:deepthi@multicorewareinc.com" target="_blank">deepthi@multicorewareinc.com</a> wrote:<br>
> # HG changeset patch<br>
> # User Deepthi Nandakumar <<a href="mailto:deepthi@multicorewareinc.com" target="_blank">deepthi@multicorewareinc.com</a>><br>
> # Date 1413805233 -19800<br>
> # Mon Oct 20 17:10:33 2014 +0530<br>
> # Node ID c9b80a61687aea02108cb41f26f19f1408b00d2f<br>
> # Parent 7eab67ffff81a44cc67c388dc4fcae2468979fae<br>
> search: change psy-rd energy and cost measurement.<br>
><br>
> Psy-rd now calculates distortion and psyenergy as a function of source and recon,<br>
> as opposed to original residual and reconstructed residual.<br>
<br>
</span>the logic looks sound, but I have some implementation nits<br>
<span><br>
> diff -r 7eab67ffff81 -r c9b80a61687a source/encoder/search.cpp<br>
> --- a/source/encoder/search.cpp Mon Oct 20 15:37:50 2014 +0530<br>
> +++ b/source/encoder/search.cpp Mon Oct 20 17:10:33 2014 +0530<br>
> @@ -2719,6 +2719,7 @@<br>
> {<br>
> TComDataCU* cu = &<a href="http://mode.cu" target="_blank">mode.cu</a>;<br>
> const Yuv* fencYuv = mode.fencYuv;<br>
> + const Yuv* predYuv = &mode.predYuv;<br>
><br>
> X265_CHECK(cu->m_depth[0] == cu->m_depth[absPartIdx], "depth not matching\n");<br>
> const uint32_t trMode = depth - cu->m_depth[0];<br>
> @@ -2796,6 +2797,7 @@<br>
> m_entropyCoder.estBit(m_entropyCoder.m_estBitsSbac, log2TrSize, true);<br>
><br>
> pixel *fenc = const_cast<pixel*>(fencYuv->getLumaAddr(absPartIdx));<br>
> + pixel *pred = const_cast<pixel*>(predYuv->getLumaAddr(absPartIdx));<br>
<br>
</span>predYuv isn't const like fencYuv, so you don't need the casts.<br>
<div><div><br>
> int16_t *resi = resiYuv->getLumaAddr(absPartIdx);<br>
> numSigY = m_quant.transformNxN(cu, fenc, fencYuv->m_size, resi, resiYuv->m_size, coeffCurY, log2TrSize, TEXT_LUMA, absPartIdx, false);<br>
><br>
> @@ -2863,10 +2865,18 @@<br>
> }<br>
><br>
> X265_CHECK(log2TrSize <= 5, "log2TrSize is too large\n");<br>
> - uint32_t distY = primitives.ssd_s[partSize](resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size);<br>
> + uint32_t distY;<br>
> uint32_t psyEnergyY = 0;<br>
> + /* When psy-rd is enabled, distortion and psyEnergy are measured against source, recon */<br>
> if (m_rdCost.m_psyRd)<br>
> - psyEnergyY = m_rdCost.psyCost(partSize, resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, (int16_t*)zeroShort, 0);<br>
> + {<br>
> + fenc = const_cast<pixel*>(fencYuv->getLumaAddr(absPartIdx));<br>
> + pred = const_cast<pixel*>(predYuv->getLumaAddr(absPartIdx));<br>
> + distY = primitives.sse_pp[partSize](fenc, fencYuv->m_size, pred, predYuv->m_size);<br>
> + psyEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, pred, predYuv->m_size);<br>
> + }<br>
> + else<br>
> + distY = primitives.ssd_s[partSize](resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size);<br>
><br>
> int16_t *curResiY = m_qtTempShortYuv[qtLayer].getLumaAddr(absPartIdx);<br>
> X265_CHECK(m_qtTempShortYuv[qtLayer].m_size == MAX_CU_SIZE, "width not full CU\n");<br>
> @@ -2880,10 +2890,21 @@<br>
> {<br>
> m_quant.invtransformNxN(cu->m_cuTransquantBypass[absPartIdx], curResiY, strideResiY, coeffCurY, log2TrSize, TEXT_LUMA, false, false, numSigY); //this is for inter mode only<br>
><br>
> - const uint32_t nonZeroDistY = primitives.sse_ss[partSize](resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, curResiY, strideResiY);<br>
> + uint32_t nonZeroDistY;<br>
> uint32_t nonZeroPsyEnergyY = 0;<br>
> if (m_rdCost.m_psyRd)<br>
> - nonZeroPsyEnergyY = m_rdCost.psyCost(partSize, resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, curResiY, strideResiY);<br>
> + {<br>
> + ALIGN_VAR_32(pixel, tmpRecon[MAX_CU_SIZE * MAX_CU_SIZE]);<br>
<br>
</div></div>tmpRecon probably wants to be a per-depth Yuv, but otherwise the logic<br>
looks sound.<br>
<span><br>
> + uint32_t strideRecon = MAX_CU_SIZE;<br>
> + //===== reconstruction =====<br>
<br>
</span>I've been removing this style of comment. both for the = abuse and the<br>
not very helpful content. if it said something like "measure distortion<br>
and psy-energy with reconstructed pixels", I might be inclined to keep<br>
it<br>
<span><br>
> + fenc = const_cast<pixel*>(fencYuv->getLumaAddr(absPartIdx));<br>
> + pred = const_cast<pixel*>(predYuv->getLumaAddr(absPartIdx));<br>
> + primitives.luma_add_ps[partSize](tmpRecon, strideRecon, pred, curResiY, predYuv->m_size, strideResiY);<br>
> + nonZeroDistY = primitives.sse_pp[partSize](fenc, fencYuv->m_size, tmpRecon, strideRecon);<br>
> + nonZeroPsyEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, tmpRecon, strideRecon);<br>
> + }<br>
> + else<br>
> + nonZeroDistY = primitives.sse_ss[partSize](resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, curResiY, strideResiY);<br>
><br>
> if (cu->m_cuTransquantBypass[0])<br>
> {<br>
> @@ -2956,19 +2977,41 @@<br>
> int16_t *curResiU = m_qtTempShortYuv[qtLayer].getCbAddr(absPartIdxC);<br>
> int16_t *curResiV = m_qtTempShortYuv[qtLayer].getCrAddr(absPartIdxC);<br>
><br>
> - distU = m_rdCost.scaleChromaDistCb(primitives.ssd_s[log2TrSizeC - 2](resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize));<br>
> - if (outZeroDist)<br>
> + if(m_rdCost.m_psyRd)<br>
> + {<br>
> + fenc = const_cast<pixel*>(fencYuv->getCbAddr(absPartIdxC));<br>
> + pred = const_cast<pixel*>(predYuv->getCbAddr(absPartIdxC));<br>
> + distU = m_rdCost.scaleChromaDistCb(primitives.sse_pp[partSizeC](fenc, fencYuv->m_csize,pred, predYuv->m_csize));<br>
<br>
</span>a couple of w-s nits here<br>
<div><div><br>
> + psyEnergyU = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, pred, predYuv->m_csize);<br>
> + }<br>
> + else<br>
> + distU = m_rdCost.scaleChromaDistCb(primitives.ssd_s[partSizeC](resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize));<br>
> +<br>
> + if (outZeroDist)<br>
> *outZeroDist += distU;<br>
><br>
> if (numSigU[tuIterator.section])<br>
> {<br>
> m_quant.invtransformNxN(cu->m_cuTransquantBypass[absPartIdxC], curResiU, strideResiC, coeffCurU + subTUOffset,<br>
> log2TrSizeC, TEXT_CHROMA_U, false, false, numSigU[tuIterator.section]);<br>
> - uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize, curResiU, strideResiC);<br>
> - const uint32_t nonZeroDistU = m_rdCost.scaleChromaDistCb(dist);<br>
> + uint32_t nonZeroDistU;<br>
> uint32_t nonZeroPsyEnergyU = 0;<br>
> if (m_rdCost.m_psyRd)<br>
> - nonZeroPsyEnergyU = m_rdCost.psyCost(partSizeC, resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize, curResiU, strideResiC);<br>
> + {<br>
> + ALIGN_VAR_32(pixel, tmpReconU[MAX_CU_SIZE * MAX_CU_SIZE]);<br>
> + uint32_t strideReconC = MAX_CU_SIZE;<br>
> + //===== reconstruction =====<br>
> + fenc = const_cast<pixel*>(fencYuv->getCbAddr(absPartIdxC));<br>
> + pred = const_cast<pixel*>(predYuv->getCbAddr(absPartIdxC));<br>
> + primitives.luma_add_ps[partSizeC](tmpReconU, strideReconC, pred, curResiU, predYuv->m_csize, strideResiC);<br>
> + nonZeroDistU = m_rdCost.scaleChromaDistCb(primitives.sse_pp[partSizeC](fenc, fencYuv->m_csize, tmpReconU, strideReconC));<br>
> + nonZeroPsyEnergyU = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, tmpReconU, strideReconC);<br>
> + }<br>
> + else<br>
> + {<br>
> + uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize, curResiU, strideResiC);<br>
> + nonZeroDistU = m_rdCost.scaleChromaDistCb(dist);<br>
> + }<br>
><br>
> if (cu->m_cuTransquantBypass[0])<br>
> {<br>
> @@ -3025,7 +3068,16 @@<br>
> if (!numSigU[tuIterator.section])<br>
> primitives.blockfill_s[partSizeC](curResiU, strideResiC, 0);<br>
><br>
> - distV = m_rdCost.scaleChromaDistCr(primitives.ssd_s[partSizeC](resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize));<br>
> + if(m_rdCost.m_psyRd)<br>
> + {<br>
> + fenc = const_cast<pixel*>(fencYuv->getCrAddr(absPartIdxC));<br>
> + pred = const_cast<pixel*>(predYuv->getCrAddr(absPartIdxC));<br>
> + distV = m_rdCost.scaleChromaDistCr(primitives.sse_pp[partSizeC](fenc, fencYuv->m_csize, pred, predYuv->m_csize));<br>
> + psyEnergyV = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, pred, predYuv->m_csize);<br>
> + }<br>
> + else<br>
> + distV = m_rdCost.scaleChromaDistCr(primitives.ssd_s[partSizeC](resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize));<br>
> +<br>
> if (outZeroDist)<br>
> *outZeroDist += distV;<br>
><br>
> @@ -3033,11 +3085,24 @@<br>
> {<br>
> m_quant.invtransformNxN(cu->m_cuTransquantBypass[absPartIdxC], curResiV, strideResiC, coeffCurV + subTUOffset,<br>
> log2TrSizeC, TEXT_CHROMA_V, false, false, numSigV[tuIterator.section]);<br>
> - uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize, curResiV, strideResiC);<br>
> - const uint32_t nonZeroDistV = m_rdCost.scaleChromaDistCr(dist);<br>
> + uint32_t nonZeroDistV;<br>
> uint32_t nonZeroPsyEnergyV = 0;<br>
> if (m_rdCost.m_psyRd)<br>
> - nonZeroPsyEnergyV = m_rdCost.psyCost(partSizeC, resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize, curResiV, strideResiC);<br>
> + {<br>
> + ALIGN_VAR_32(pixel, tmpReconV[MAX_CU_SIZE * MAX_CU_SIZE]);<br>
> + uint32_t strideReconC = MAX_CU_SIZE;<br>
> + fenc = const_cast<pixel*>(fencYuv->getCrAddr(absPartIdxC));<br>
> + pred = const_cast<pixel*>(predYuv->getCrAddr(absPartIdxC));<br>
> + //===== reconstruction =====<br>
> + primitives.luma_add_ps[partSizeC](tmpReconV, strideReconC, pred, curResiV, predYuv->m_csize, strideResiC);<br>
> + nonZeroDistV = m_rdCost.scaleChromaDistCr(primitives.sse_pp[partSizeC](fenc, fencYuv->m_csize, tmpReconV, strideReconC));<br>
> + nonZeroPsyEnergyV = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, tmpReconV, strideReconC);<br>
> + }<br>
> + else<br>
> + {<br>
> + uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize, curResiV, strideResiC);<br>
> + nonZeroDistV = m_rdCost.scaleChromaDistCr(dist);<br>
> + }<br>
><br>
> if (cu->m_cuTransquantBypass[0])<br>
> {<br>
> @@ -3130,15 +3195,23 @@<br>
><br>
> m_quant.invtransformNxN(cu->m_cuTransquantBypass[absPartIdx], tsResiY, trSize, tsCoeffY, log2TrSize, TEXT_LUMA, false, true, numSigTSkipY);<br>
><br>
> - nonZeroDistY = primitives.sse_ss[partSize](resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, tsResiY, trSize);<br>
> -<br>
> if (m_rdCost.m_psyRd)<br>
> {<br>
> - nonZeroPsyEnergyY = m_rdCost.psyCost(partSize, resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, tsResiY, trSize);<br>
> + ALIGN_VAR_32(pixel, tmpRecon[MAX_CU_SIZE * MAX_CU_SIZE]);<br>
> + uint32_t strideRecon = MAX_CU_SIZE;<br>
> + //===== reconstruction =====<br>
> + fenc = const_cast<pixel*>(fencYuv->getLumaAddr(absPartIdx));<br>
> + pred = const_cast<pixel*>(predYuv->getLumaAddr(absPartIdx));<br>
> + primitives.luma_add_ps[partSize](tmpRecon, strideRecon, pred, tsResiY, predYuv->m_size, trSize);<br>
> + nonZeroDistY = primitives.sse_pp[partSize](fenc, fencYuv->m_size, tmpRecon, strideRecon);<br>
> + nonZeroPsyEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, tmpRecon, strideRecon);<br>
> singleCostY = m_rdCost.calcPsyRdCost(nonZeroDistY, skipSingleBitsY, nonZeroPsyEnergyY);<br>
> }<br>
> else<br>
> + {<br>
> + nonZeroDistY = primitives.sse_ss[partSize](resiYuv->getLumaAddr(absPartIdx), resiYuv->m_size, tsResiY, trSize);<br>
> singleCostY = m_rdCost.calcRdCost(nonZeroDistY, skipSingleBitsY);<br>
> + }<br>
> }<br>
><br>
> if (!numSigTSkipY || minCost[TEXT_LUMA][0] < singleCostY)<br>
> @@ -3208,15 +3281,24 @@<br>
><br>
> m_quant.invtransformNxN(cu->m_cuTransquantBypass[absPartIdxC], tsResiU, trSizeC, tsCoeffU,<br>
> log2TrSizeC, TEXT_CHROMA_U, false, true, numSigTSkipU);<br>
> - uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize, tsResiU, trSizeC);<br>
> - nonZeroDistU = m_rdCost.scaleChromaDistCb(dist);<br>
> if (m_rdCost.m_psyRd)<br>
> {<br>
> - nonZeroPsyEnergyU = m_rdCost.psyCost(partSizeC, resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize, tsResiU, trSizeC);<br>
> + ALIGN_VAR_32(pixel, tmpReconU[MAX_CU_SIZE * MAX_CU_SIZE]);<br>
> + uint32_t strideReconC = MAX_CU_SIZE;<br>
> + //===== reconstruction =====<br>
> + fenc = const_cast<pixel*>(fencYuv->getCbAddr(absPartIdxC));<br>
> + pred = const_cast<pixel*>(predYuv->getCbAddr(absPartIdxC));<br>
> + primitives.luma_add_ps[partSizeC](tmpReconU, strideReconC, pred, tsResiU, predYuv->m_csize, trSizeC);<br>
> + nonZeroDistU = m_rdCost.scaleChromaDistCb(primitives.sse_pp[partSizeC](fenc, fencYuv->m_csize, tmpReconU, strideReconC));<br>
> + nonZeroPsyEnergyU = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, tmpReconU, strideReconC);<br>
> singleCostU = m_rdCost.calcPsyRdCost(nonZeroDistU, singleBitsComp[TEXT_CHROMA_U][tuIterator.section], nonZeroPsyEnergyU);<br>
> }<br>
> else<br>
> + {<br>
> + uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCbAddr(absPartIdxC), resiYuv->m_csize, tsResiU, trSizeC);<br>
> + nonZeroDistU = m_rdCost.scaleChromaDistCb(dist);<br>
> singleCostU = m_rdCost.calcRdCost(nonZeroDistU, singleBitsComp[TEXT_CHROMA_U][tuIterator.section]);<br>
> + }<br>
> }<br>
><br>
> if (!numSigTSkipU || minCost[TEXT_CHROMA_U][tuIterator.section] < singleCostU)<br>
> @@ -3239,15 +3321,24 @@<br>
><br>
> m_quant.invtransformNxN(cu->m_cuTransquantBypass[absPartIdxC], tsResiV, trSizeC, tsCoeffV,<br>
> log2TrSizeC, TEXT_CHROMA_V, false, true, numSigTSkipV);<br>
> - uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize, tsResiV, trSizeC);<br>
> - nonZeroDistV = m_rdCost.scaleChromaDistCr(dist);<br>
> if (m_rdCost.m_psyRd)<br>
> {<br>
> - nonZeroPsyEnergyV = m_rdCost.psyCost(partSizeC, resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize, tsResiV, trSizeC);<br>
> + ALIGN_VAR_32(pixel, tmpReconV[MAX_CU_SIZE * MAX_CU_SIZE]);<br>
> + uint32_t strideReconC = MAX_CU_SIZE;<br>
> + //===== reconstruction =====<br>
> + fenc = const_cast<pixel*>(fencYuv->getCrAddr(absPartIdxC));<br>
> + pred = const_cast<pixel*>(predYuv->getCrAddr(absPartIdxC));<br>
> + primitives.luma_add_ps[partSizeC](tmpReconV, strideReconC, pred, tsResiV, predYuv->m_csize, trSizeC);<br>
> + nonZeroDistV = m_rdCost.scaleChromaDistCr(primitives.sse_pp[partSizeC](fenc, fencYuv->m_csize, tmpReconV, strideReconC));<br>
> + nonZeroPsyEnergyV = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, tmpReconV, strideReconC);<br>
> singleCostV = m_rdCost.calcPsyRdCost(nonZeroDistV, singleBitsComp[TEXT_CHROMA_V][tuIterator.section], nonZeroPsyEnergyV);<br>
> }<br>
> else<br>
> + {<br>
> + uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCrAddr(absPartIdxC), resiYuv->m_csize, tsResiV, trSizeC);<br>
> + nonZeroDistV = m_rdCost.scaleChromaDistCr(dist);<br>
> singleCostV = m_rdCost.calcRdCost(nonZeroDistV, singleBitsComp[TEXT_CHROMA_V][tuIterator.section]);<br>
> + }<br>
> }<br>
><br>
> if (!numSigTSkipV || minCost[TEXT_CHROMA_V][tuIterator.section] < singleCostV)<br>
</div></div>> _______________________________________________<br>
> x265-devel mailing list<br>
> <a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
> <a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
<span><font color="#888888"><br>
--<br>
Steve Borho<br>
_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</font></span></blockquote></div><br></div>
</div></div><br>_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
<br></blockquote></div><br></div>