<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 28, 2017 at 3:37 PM,  <span dir="ltr"><<a href="mailto:praveen@multicorewareinc.com" target="_blank">praveen@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"># HG changeset patch<br>
# User Praveen Tiwari <<a href="mailto:praveen@multicorewareinc.com">praveen@multicorewareinc.com</a>><br>
# Date 1511855234 -19800<br>
#      Tue Nov 28 13:17:14 2017 +0530<br>
# Node ID 85970193df47aa5da685efc27aaef04d9f7f21a0<br>
# Parent  d732ca2095defdbf42748327006083befb30a89e<br>
quant.cpp: use 'nonPsyRdoQuant_c' primitive to optimize rdoQuant path<br>
<br>
diff -r d732ca2095de -r 85970193df47 source/common/quant.cpp<br>
--- a/source/common/quant.cpp   Tue Nov 28 12:10:22 2017 +0530<br>
+++ b/source/common/quant.cpp   Tue Nov 28 13:17:14 2017 +0530<br>
@@ -824,16 +824,14 @@<br>
             }<br>
             else<br>
             {<br>
-                // non-psy path<br>
+                // non-psy path - expected to work faster by FMA SIMD<br>
+                primitives.nonPsyRdoQuant(m_resiDctCoeff, costUncoded, &totalUncodedCost, &totalRdCost, blkPos, log2TrSize);<br>
+                blkPos = codeParams.scan[scanPosBase];<br>
+<br>
                 for (int y = 0; y < MLS_CG_SIZE; y++)<br>
                 {<br>
                     for (int x = 0; x < MLS_CG_SIZE; x++)<br>
                     {<br>
-                        int signCoef = m_resiDctCoeff[blkPos + x];            /* pre-quantization DCT coeff */<br>
-                        costUncoded[blkPos + x] = static_cast<double>(((int64_t)signCoef * signCoef) << scaleBits);<br>
-                        totalUncodedCost += costUncoded[blkPos + x];<br>
-                        totalRdCost += costUncoded[blkPos + x];<br>
-<br></blockquote><div>Could the following code also be moved into the primitive, so that it too can be written in assembly? Has any performance change been measured with VTune after converting int64 to double?</div><div>I suggest this patch series go into the repo together with the assembly-optimized code, after performance has been measured.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                         const uint32_t scanPosOffset =  y * MLS_CG_SIZE + x;<br>
                         const uint32_t ctxSig = table_cnt[patternSigCtx][g_scan4x4[codeParams.scanType][scanPosOffset]] + ctxSigOffset;<br>
                         X265_CHECK(trSize > 4, "trSize check failure\n");<br>
</blockquote></div><br></div></div>