<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 20, 2020 at 3:32 PM Kavitha Sampath <<a href="mailto:kavitha@multicorewareinc.com">kavitha@multicorewareinc.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 17, 2020 at 8:22 AM Mahesh Pittala <<a href="mailto:mahesh@multicorewareinc.com" target="_blank">mahesh@multicorewareinc.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">From 787ae5da7431b5d113ea033cf6502ac1cc1e7572 Mon Sep 17 00:00:00 2001<br>From: maheshpittala <<a href="mailto:mahesh@multicorewareinc.com" target="_blank">mahesh@multicorewareinc.com</a>><br>Date: Sun, 1 Nov 2020 10:09:28 +0530<br>Subject: [PATCH] correct reusing cutree qp offsets in load encode for<br> reuse-level > 1 and < 10 for same resolution<br><br>Earlier in save encode, dumped only best modes analysis data of that CTU into file after encoding, not for each split CU's analysis. So in analysis load, it reads the same best mode's qp value even for split CU's(whereas split CU's qp would be different in save encode) and redo-analysis.<br><br>So now, cuGeom.geomRecurId stores unique ID for each CU and even for parents CU so based on this storing cutree qp offset and loaded same<br></div></blockquote><div>[KS] The commit message sounds informal; I suggest rephrasing it. </div></div></div></blockquote><div>    [SK] Addressed: the commit message has been rephrased. 
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">---<br> source/abrEncApp.cpp         |  6 +++<br> source/common/cudata.cpp     |  6 ++-<br> source/common/cudata.h       |  3 +-<br> source/encoder/analysis.cpp  | 32 ++++++++++--<br> source/encoder/api.cpp       | 12 +++++<br> source/encoder/encoder.cpp   | 97 ++++++++++++++++++++++++++++++++----<br> source/encoder/slicetype.cpp |  2 +-<br> source/x265.h                |  2 +<br> 8 files changed, 140 insertions(+), 20 deletions(-)<br><br>diff --git a/source/abrEncApp.cpp b/source/abrEncApp.cpp<br>index cd85154f1..3550d8b11 100644<br>--- a/source/abrEncApp.cpp<br>+++ b/source/abrEncApp.cpp<br>@@ -342,7 +342,10 @@ namespace X265_NS {<br>             memcpy(intraDst->partSizes, intraSrc->partSizes, sizeof(char) * src->depthBytes);<br>             memcpy(intraDst->chromaModes, intraSrc->chromaModes, sizeof(uint8_t) * src->depthBytes);<br>             if (m_param->rc.cuTree)<br>+            {<br>                 memcpy(intraDst->cuQPOff, intraSrc->cuQPOff, sizeof(int8_t) * src->depthBytes);<br>+                memcpy(intraDst->cuQPOffReuse, intraSrc->cuQPOffReuse, sizeof(int8_t) * (src->numCUsInFrame * src->numPartitions));<br></div></blockquote><div>[KS] The maximum number of QPs saved per CTU is 85, so allocating and copying numPartitions entries per CTU is unnecessary. </div></div></div></blockquote><div>    [SK] Agreed. 
Fixed.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">+            }<br>         }<br>         else<br>         {<br>@@ -357,7 +360,10 @@ namespace X265_NS {<br>             memcpy(interDst->depth, interSrc->depth, sizeof(uint8_t) * src->depthBytes);<br>             memcpy(interDst->modes, interSrc->modes, sizeof(uint8_t) * src->depthBytes);<br>             if (m_param->rc.cuTree)<br>+            {<br>                 memcpy(interDst->cuQPOff, interSrc->cuQPOff, sizeof(int8_t) * src->depthBytes);<br>+                memcpy(interDst->cuQPOffReuse, interSrc->cuQPOffReuse, sizeof(int8_t) * (src->numCUsInFrame * src->numPartitions));<br>+            }<br>             if (m_param->analysisSaveReuseLevel > 4)<br>             {<br>                 memcpy(interDst->partSize, interSrc->partSize, sizeof(uint8_t) * src->depthBytes);<br>diff --git a/source/common/cudata.cpp b/source/common/cudata.cpp<br>index 19281dee2..08cdff11a 100644<br>--- a/source/common/cudata.cpp<br>+++ b/source/common/cudata.cpp<br>@@ -194,6 +194,7 @@ void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x26<br> <br>         m_qp        = (int8_t*)charBuf; charBuf += m_numPartitions;<br>         m_qpAnalysis = (int8_t*)charBuf; charBuf += m_numPartitions;<br>+        m_qpreuse    = (int8_t*)charBuf; charBuf += m_numPartitions;<br></div></blockquote><div>[KS] Can you move this out of parentCTU? It would be more appropriate as an Analysis class member, like the other reuse parameters such as m_reuseRef and m_reuseDepth.</div></div></div></blockquote><div>   [SK] Addressed, so that the CUData memory pool can be used for other purposes. The offsets will be stored only in the frame's analysis data structures. 
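<br><br>For reference, a minimal sketch of where the figure of 85 comes from and how a per-frame offset buffer could be indexed by geomRecurId. It assumes a 64x64 CTU with an 8x8 minimum CU, i.e. four quad-tree depths. The helper names cusPerCtu, qpOffsetBufferSize and qpOffsetIndex are hypothetical and are not part of x265 or of this patch.

```cpp
// Hypothetical helpers for illustration only; not x265 code.

// Number of CUs (parents and leaves) in a full quad-tree with numDepths
// levels: 1 + 4 + 16 + ... with one term per depth.
constexpr unsigned cusPerCtu(unsigned numDepths)
{
    return numDepths == 0 ? 0 : 1 + 4 * cusPerCtu(numDepths - 1);
}

// Entries needed for one frame when one QP offset is kept per CU geometry.
constexpr unsigned qpOffsetBufferSize(unsigned numCUsInFrame, unsigned perCtu)
{
    return numCUsInFrame * perCtu;
}

// Flat index of a CU's offset: CTU address selects the block of entries,
// geomRecurId selects the CU inside that CTU (assumed to be below perCtu).
constexpr unsigned qpOffsetIndex(unsigned cuAddr, unsigned geomRecurId, unsigned perCtu)
{
    return cuAddr * perCtu + geomRecurId;
}

// 64x64 CTU split down to 8x8: depths 0..3 give 1 + 4 + 16 + 64 CUs.
static_assert(cusPerCtu(4) == 85, "four depths give 85 CUs per CTU");
```

Under these assumptions, 85 entries per CTU replace the 256 entries that numPartitions would give for a 64x64 CTU with 4x4 partition units, so the buffer shrinks to roughly a third of its size.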
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">         m_log2CUSize         = charBuf; charBuf += m_numPartitions;<br>         m_lumaIntraDir       = charBuf; charBuf += m_numPartitions;<br>         m_tqBypass           = charBuf; charBuf += m_numPartitions;<br>@@ -235,6 +236,7 @@ void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x26<br> <br>         m_qp        = (int8_t*)charBuf; charBuf += m_numPartitions;<br>         m_qpAnalysis = (int8_t*)charBuf; charBuf += m_numPartitions;<br>+        m_qpreuse =    (int8_t*)charBuf; charBuf += m_numPartitions;<br>         m_log2CUSize         = charBuf; charBuf += m_numPartitions;<br>         m_lumaIntraDir       = charBuf; charBuf += m_numPartitions;<br>         m_tqBypass           = charBuf; charBuf += m_numPartitions;<br>@@ -307,7 +309,7 @@ void CUData::initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t first<br>     X265_CHECK(!(frame.m_encData->m_param->bLossless && !m_slice->m_pps->bTransquantBypassEnabled), "lossless enabled without TQbypass in PPS\n");<br> <br>     /* initialize the remaining CU data in one memset */<br>-    memset(m_cuDepth, 0, (frame.m_param->internalCsp == X265_CSP_I400 ? BytesPerPartition - 12 : BytesPerPartition - 8) * m_numPartitions);<br>+    memset(m_cuDepth, 0, (frame.m_param->internalCsp == X265_CSP_I400 ? 
BytesPerPartition - 13 : BytesPerPartition - 9) * m_numPartitions);<br> <br>     for (int8_t i = 0; i < NUM_TU_DEPTH; i++)<br>         m_refTuDepth[i] = -1;<br>@@ -358,7 +360,7 @@ void CUData::initSubCU(const CUData& ctu, const CUGeom& cuGeom, int qp)<br>     m_partSet(m_cuDepth,      (uint8_t)cuGeom.depth);<br> <br>     /* initialize the remaining CU data in one memset */<br>-    memset(m_predMode, 0, (ctu.m_chromaFormat == X265_CSP_I400 ? BytesPerPartition - 13 : BytesPerPartition - 9) * m_numPartitions);<br>+    memset(m_predMode, 0, (ctu.m_chromaFormat == X265_CSP_I400 ? BytesPerPartition - 14 : BytesPerPartition - 10) * m_numPartitions);<br>     memset(m_distortion, 0, m_numPartitions * sizeof(sse_t));<br> }<br> <br>diff --git a/source/common/cudata.h b/source/common/cudata.h<br>index 8397f0568..d58f53e39 100644<br>--- a/source/common/cudata.h<br>+++ b/source/common/cudata.h<br>@@ -192,6 +192,7 @@ public:<br>     /* Per-part data, stored contiguously */<br>     int8_t*       m_qp;               // array of QP values<br>     int8_t*       m_qpAnalysis;       // array of QP values for analysis reuse<br>+    int8_t*       m_qpreuse;          // array of QP values for analysis reuse for reuse levels > 1 and < 10<br>     uint8_t*      m_log2CUSize;       // array of cu log2Size TODO: seems redundant to depth<br>     uint8_t*      m_lumaIntraDir;     // array of intra directions (luma)<br>     uint8_t*      m_tqBypass;         // array of CU lossless flags<br>@@ -207,7 +208,7 @@ public:<br>     uint8_t*      m_transformSkip[3]; // array of transform skipping flags per plane<br>     uint8_t*      m_cbf[3];           // array of coded block flags (CBF) per plane<br>     uint8_t*      m_chromaIntraDir;   // array of intra directions (chroma)<br>-    enum { BytesPerPartition = 24 };  // combined sizeof() of all per-part data<br>+    enum { BytesPerPartition = 25 };  // combined sizeof() of all per-part data<br> <br>     sse_t*        m_distortion;<br>     coeff_t*      
m_trCoeff[3];       // transformed coefficient buffer per plane<br>diff --git a/source/encoder/analysis.cpp b/source/encoder/analysis.cpp<br>index aabf386ca..b1d7e3ad1 100644<br>--- a/source/encoder/analysis.cpp<br>+++ b/source/encoder/analysis.cpp<br>@@ -520,6 +520,9 @@ uint64_t Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom<br>     bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);<br>     bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);<br> <br>+    if (m_param->rc.cuTree)<br>+        parentCTU.m_qpreuse[cuGeom.geomRecurId] = (int8_t)qp;<br>+<br>     bool bAlreadyDecided = m_param->intraRefine != 4 && parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] != (uint8_t)ALL_IDX && !(m_param->bAnalysisType == HEVC_INFO);<br>     bool bDecidedDepth = m_param->intraRefine != 4 && parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;<br>     int split = 0;<br>@@ -870,6 +873,9 @@ uint32_t Analysis::compressInterCU_dist(const CUData& parentCTU, const CUGeom& c<br>     uint32_t minDepth = m_param->rdLevel <= 4 ? 
topSkipMinDepth(parentCTU, cuGeom) : 0;<br>     uint32_t splitRefs[4] = { 0, 0, 0, 0 };<br> <br>+    if (m_param->rc.cuTree)<br>+        parentCTU.m_qpreuse[cuGeom.geomRecurId] = (int8_t)qp;<br>+<br>     X265_CHECK(m_param->rdLevel >= 2, "compressInterCU_dist does not support RD 0 or 1\n");<br> <br>     PMODE pmode(*this, cuGeom);<br>@@ -1152,6 +1158,8 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&<br>     uint32_t cuAddr = parentCTU.m_cuAddr;<br>     ModeDepth& md = m_modeDepth[depth];<br> <br>+    if (m_param->rc.cuTree)<br>+        parentCTU.m_qpreuse[cuGeom.geomRecurId] = (int8_t)qp;<br> <br>     if (m_param->searchMethod == X265_SEA)<br>     {<br>@@ -1856,6 +1864,9 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&<br>     ModeDepth& md = m_modeDepth[depth];<br>     md.bestMode = NULL;<br> <br>+    if (m_param->rc.cuTree)<br>+        parentCTU.m_qpreuse[cuGeom.geomRecurId] = (int8_t)qp;<br>+<br>     if (m_param->searchMethod == X265_SEA)<br>     {<br>         int numPredDir = m_slice->isInterP() ? 
1 : 2;<br>@@ -3643,15 +3654,26 @@ int Analysis::calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom, int3<br>         if ((distortionData->threshold[ctu.m_cuAddr] < 0.9 || distortionData->threshold[ctu.m_cuAddr] > 1.1)<br>             && distortionData->highDistortionCtuCount && distortionData->lowDistortionCtuCount)<br>             qp += distortionData->offset[ctu.m_cuAddr];<br>-    }<br>+       }<br> <br>     if (m_param->analysisLoadReuseLevel >= 2 && m_param->rc.cuTree)<br>     {<br>-        int cuIdx = (ctu.m_cuAddr * ctu.m_numPartitions) + cuGeom.absPartIdx;<br>-        if (ctu.m_slice->m_sliceType == I_SLICE)<br>-            return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int32_t)(qp + 0.5 + ((x265_analysis_intra_data*)m_frame->m_analysisData.intraData)->cuQPOff[cuIdx]));<br>+        if (m_param->scaleFactor == 2 || m_param->analysisLoadReuseLevel == 10)<br>+        {<br>+            int cuIdx = (ctu.m_cuAddr * ctu.m_numPartitions) + cuGeom.absPartIdx;<br>+            if (ctu.m_slice->m_sliceType == I_SLICE)<br>+                return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int32_t)(qp + 0.5 + ((x265_analysis_intra_data*)m_frame->m_analysisData.intraData)->cuQPOff[cuIdx]));<br>+            else<br>+                return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int32_t)(qp + 0.5 + ((x265_analysis_inter_data*)m_frame->m_analysisData.interData)->cuQPOff[cuIdx]));<br>+        }<br>         else<br>-            return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int32_t)(qp + 0.5 + ((x265_analysis_inter_data*)m_frame->m_analysisData.interData)->cuQPOff[cuIdx]));<br>+        {<br>+            int cuIdx = (ctu.m_cuAddr * ctu.m_numPartitions) + cuGeom.geomRecurId;<br>+            if (ctu.m_slice->m_sliceType == I_SLICE)<br>+                return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int32_t)(qp + 0.5 + ((x265_analysis_intra_data*)m_frame->m_analysisData.intraData)->cuQPOffReuse[cuIdx]));<br>+            
else<br>+                return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int32_t)(qp + 0.5 + ((x265_analysis_inter_data*)m_frame->m_analysisData.interData)->cuQPOffReuse[cuIdx]));<br>+        }<br></div></blockquote><div>[KS] Why is this reuse not applicable to reuse level 1? </div></div></div></blockquote><div>   [SK] We are not sure what gain this would bring at reuse level 1. Since this is a general question, we will track it, along with other possible improvements for multipass encoding, as a separate action item under</div><div>  x265-Story - 1059. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">     }<br>     if (m_param->rc.hevcAq)<br>     {<br>diff --git a/source/encoder/api.cpp b/source/encoder/api.cpp<br>index a986355e0..0f266d328 100644<br>--- a/source/encoder/api.cpp<br>+++ b/source/encoder/api.cpp<br>@@ -825,7 +825,10 @@ void x265_alloc_analysis_data(x265_param *param, x265_analysis_data* analysis)<br>         CHECKED_MALLOC_ZERO(intraData->partSizes, char, analysis->numPartitions * analysis->numCUsInFrame);<br>         CHECKED_MALLOC_ZERO(intraData->chromaModes, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>         if (param->rc.cuTree)<br>+        {<br>             CHECKED_MALLOC_ZERO(intraData->cuQPOff, int8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>+            CHECKED_MALLOC_ZERO(intraData->cuQPOffReuse, int8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>+        }<br>     }<br>     analysis->intraData = intraData;<br> <br>@@ -837,7 +840,10 @@ void x265_alloc_analysis_data(x265_param *param, x265_analysis_data* analysis)<br>         CHECKED_MALLOC_ZERO(interData->modes, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);<br> <br>         if 
(param->rc.cuTree && !isMultiPassOpt)<br>+        {<br>             CHECKED_MALLOC_ZERO(interData->cuQPOff, int8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>+            CHECKED_MALLOC_ZERO(interData->cuQPOffReuse, int8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>+        }<br>         CHECKED_MALLOC_ZERO(interData->mvpIdx[0], uint8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>         CHECKED_MALLOC_ZERO(interData->mvpIdx[1], uint8_t, analysis->numPartitions * analysis->numCUsInFrame);<br>         CHECKED_MALLOC_ZERO(interData->mv[0], x265_analysis_MV, analysis->numPartitions * analysis->numCUsInFrame);<br>@@ -919,7 +925,10 @@ void x265_free_analysis_data(x265_param *param, x265_analysis_data* analysis)<br>             X265_FREE((analysis->intraData)->partSizes);<br>             X265_FREE((analysis->intraData)->chromaModes);<br>             if (param->rc.cuTree)<br>+            {<br>                 X265_FREE((analysis->intraData)->cuQPOff);<br>+                X265_FREE((analysis->intraData)->cuQPOffReuse);<br>+            }<br>         }<br>         X265_FREE(analysis->intraData);<br>         analysis->intraData = NULL;<br>@@ -931,7 +940,10 @@ void x265_free_analysis_data(x265_param *param, x265_analysis_data* analysis)<br>         X265_FREE((analysis->interData)->depth);<br>         X265_FREE((analysis->interData)->modes);<br>         if (!isMultiPassOpt && param->rc.cuTree)<br>+        {<br>             X265_FREE((analysis->interData)->cuQPOff);<br>+            X265_FREE((analysis->interData)->cuQPOffReuse);<br>+        }<br>         X265_FREE((analysis->interData)->mvpIdx[0]);<br>         X265_FREE((analysis->interData)->mvpIdx[1]);<br>         X265_FREE((analysis->interData)->mv[0]);<br>diff --git a/source/encoder/encoder.cpp b/source/encoder/encoder.cpp<br>index 1f710e1ce..9666744f3 100644<br>--- a/source/encoder/encoder.cpp<br>+++ b/source/encoder/encoder.cpp<br>@@ -4452,19 +4452,25 @@ void 
Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>             return;<br> <br>         uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSizes = NULL;<br>-        int8_t *cuQPBuf = NULL;<br>+        int8_t *cuQPBuf = NULL, *cuQPReuseBuf = NULL;<br> <br>         tempBuf = X265_MALLOC(uint8_t, depthBytes * 3);<br>         depthBuf = tempBuf;<br>         modeBuf = tempBuf + depthBytes;<br>         partSizes = tempBuf + 2 * depthBytes;<br>         if (m_param->rc.cuTree)<br>+        {<br>             cuQPBuf = X265_MALLOC(int8_t, depthBytes);<br>+                        cuQPReuseBuf = X265_MALLOC(int8_t, scaledNumPartition * analysis->numCUsInFrame);<br></div></blockquote><div>[KS] Check whitespaces  </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">+        }<br> <br>         X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, intraPic->depth);<br>         X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, intraPic->chromaModes);<br>         X265_FREAD(partSizes, sizeof(uint8_t), depthBytes, m_analysisFileIn, intraPic->partSizes);<br>-        if (m_param->rc.cuTree) { X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, intraPic->cuQPOff); }<br>+        if (m_param->rc.cuTree) {<br>+            X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, intraPic->cuQPOff);<br>+            X265_FREAD(cuQPReuseBuf, sizeof(int8_t), (scaledNumPartition * analysis->numCUsInFrame), m_analysisFileIn, intraPic->cuQPOffReuse);<br>+        }<br> <br>         size_t count = 0;<br>         for (uint32_t d = 0; d < depthBytes; d++)<br>@@ -4484,7 +4490,11 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>                 memset(&(analysis->intraData)->cuQPOff[count], cuQPBuf[d], bytes);<br>             count += bytes;<br>         }<br>-<br>+        if 
(m_param->rc.cuTree)<br>+        {<br>+                 for (uint32_t i = 0; i < (scaledNumPartition * analysis->numCUsInFrame); i++)<br>+                memset(&(analysis->intraData)->cuQPOffReuse[i], cuQPReuseBuf[i], sizeof(int8_t));<br>+        }<br>         if (!m_param->scaleFactor)<br>         {<br>             X265_FREAD((analysis->intraData)->modes, sizeof(uint8_t), numCUsLoad * analysis->numPartitions, m_analysisFileIn, intraPic->modes);<br>@@ -4498,7 +4508,10 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>             X265_FREE(tempLumaBuf);<br>         }<br>         if (m_param->rc.cuTree)<br>+        {<br>             X265_FREE(cuQPBuf);<br>+            X265_FREE(cuQPReuseBuf);<br>+        }<br>         X265_FREE(tempBuf);<br>         consumedBytes += frameRecordSize;<br>     }<br>@@ -4515,7 +4528,7 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>         uint8_t *interDir = NULL, *chromaDir = NULL, *mvpIdx[2];<br>         MV* mv[2];<br>         int8_t* refIdx[2];<br>-        int8_t* cuQPBuf = NULL;<br>+        int8_t* cuQPBuf = NULL, *cuQPReuseBuf = NULL;<br></div></blockquote><div>[KS] Why can't we reuse cuQPBuf? I agree that the size of the offsets differs between reuse level 10 and the other levels, but that can be taken care of in allocation. </div></div></div></blockquote><div>   [SK] We can reuse cuQPBuf, and also use the same buffer in the analysis data for all reuse levels. This reduces the memory footprint per frame. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"> <br>         int numBuf = m_param->analysisLoadReuseLevel > 4 ? 
4 : 2;<br>         bool bIntraInInter = false;<br>@@ -4536,11 +4549,17 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>             depthBuf = tempBuf;<br>             modeBuf = tempBuf + depthBytes;<br>             if (m_param->rc.cuTree)<br>+            {<br>                 cuQPBuf = X265_MALLOC(int8_t, depthBytes);<br>+                                cuQPReuseBuf = X265_MALLOC(int8_t, scaledNumPartition * analysis->numCUsInFrame);<br>+            }<br> <br>             X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, interPic->depth);<br>             X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, interPic->modes);<br>-            if (m_param->rc.cuTree) { X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, interPic->cuQPOff); }<br>+            if (m_param->rc.cuTree) {<br>+                X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, interPic->cuQPOff);<br>+                X265_FREAD(cuQPReuseBuf, sizeof(int8_t), (scaledNumPartition * analysis->numCUsInFrame), m_analysisFileIn, interPic->cuQPOffReuse);<br>+            }<br> <br>             if (m_param->analysisLoadReuseLevel > 4)<br>             {<br>@@ -4611,9 +4630,17 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>                 }<br>                 count += bytes;<br>             }<br>+            if (m_param->rc.cuTree)<br>+            {<br>+                              for (uint32_t i = 0; i < (scaledNumPartition * analysis->numCUsInFrame); i++)<br>+                    memset(&(analysis->interData)->cuQPOffReuse[i], cuQPReuseBuf[i], sizeof(int8_t));<br>+            }<br> <br>             if (m_param->rc.cuTree)<br>+            {<br>                 X265_FREE(cuQPBuf);<br>+                X265_FREE(cuQPReuseBuf);<br>+            }<br>             X265_FREE(tempBuf);<br>         }<br>         if (m_param->analysisLoadReuseLevel == 10)<br>@@ -4814,19 
+4841,26 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>             return;<br> <br>         uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSizes = NULL;<br>-        int8_t *cuQPBuf = NULL;<br>+        int8_t *cuQPBuf = NULL, *cuQPReuseBuf = NULL;;<br> <br>         tempBuf = X265_MALLOC(uint8_t, depthBytes * 3);<br>         depthBuf = tempBuf;<br>         modeBuf = tempBuf + depthBytes;<br>         partSizes = tempBuf + 2 * depthBytes;<br>         if (m_param->rc.cuTree)<br>+        {<br>             cuQPBuf = X265_MALLOC(int8_t, depthBytes);<br>+            cuQPReuseBuf = X265_MALLOC(int8_t, (analysis->numPartitions / factor) * analysis->numCUsInFrame);<br>+        }<br> <br>         X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, intraPic->depth);<br>         X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, intraPic->chromaModes);<br>         X265_FREAD(partSizes, sizeof(uint8_t), depthBytes, m_analysisFileIn, intraPic->partSizes);<br>-        if (m_param->rc.cuTree) { X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, intraPic->cuQPOff); }<br>+        if (m_param->rc.cuTree)<br>+        {<br>+            X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, intraPic->cuQPOff);<br>+            X265_FREAD(cuQPReuseBuf, sizeof(int8_t), ((analysis->numPartitions / factor) * analysis->numCUsInFrame), m_analysisFileIn, intraPic->cuQPOffReuse);<br>+        }<br> <br>         uint32_t count = 0;<br>         for (uint32_t d = 0; d < depthBytes; d++)<br>@@ -4869,7 +4903,10 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>         }<br>         X265_FREE(tempLumaBuf);<br>         if (m_param->rc.cuTree)<br>+        {<br>             X265_FREE(cuQPBuf);<br>+            X265_FREE(cuQPReuseBuf);<br>+        }<br>         X265_FREE(tempBuf);<br>         consumedBytes += frameRecordSize;<br>     }<br>@@ -4886,7 +4923,7 
@@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>         uint8_t *interDir = NULL, *chromaDir = NULL, *mvpIdx[2];<br>         MV* mv[2];<br>         int8_t* refIdx[2];<br>-        int8_t* cuQPBuf = NULL;<br>+        int8_t* cuQPBuf = NULL, *cuQPReuseBuf = NULL;<br> <br>         int numBuf = m_param->analysisLoadReuseLevel > 4 ? 4 : 2;<br>         bool bIntraInInter = false;<br>@@ -4901,11 +4938,18 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>         depthBuf = tempBuf;<br>         modeBuf = tempBuf + depthBytes;<br>         if (m_param->rc.cuTree)<br>+        {<br>             cuQPBuf = X265_MALLOC(int8_t, depthBytes);<br>+            cuQPReuseBuf = X265_MALLOC(int8_t, (analysis->numPartitions / factor) * analysis->numCUsInFrame);<br>+        }<br> <br>         X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, interPic->depth);<br>         X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn, interPic->modes);<br>-        if (m_param->rc.cuTree) { X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, interPic->cuQPOff); }<br>+        if (m_param->rc.cuTree)<br>+        {<br>+            X265_FREAD(cuQPBuf, sizeof(int8_t), depthBytes, m_analysisFileIn, interPic->cuQPOff);<br>+            X265_FREAD(cuQPReuseBuf, sizeof(int8_t), (analysis->numPartitions / factor) * analysis->numCUsInFrame, m_analysisFileIn, interPic->cuQPOffReuse);<br>+        }<br>         if (m_param->analysisLoadReuseLevel > 4)<br>         {<br>             partSize = modeBuf + depthBytes;<br>@@ -5017,7 +5061,16 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x<br>         }<br> <br>         if (m_param->rc.cuTree)<br>+        {<br>+            for (uint32_t i = 0; i < ((analysis->numPartitions / factor) * analysis->numCUsInFrame); i++)<br>+                memset(&(analysis->interData)->cuQPOffReuse[i], cuQPReuseBuf[i], sizeof(int8_t));<br>+ 
       }<br>+<br>+        if (m_param->rc.cuTree)<br>+        {<br>             X265_FREE(cuQPBuf);<br>+            X265_FREE(cuQPReuseBuf);<br>+        }<br>         X265_FREE(tempBuf);<br> <br>         if (m_param->analysisLoadReuseLevel == 10)<br>@@ -5540,6 +5593,12 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD<br>                         intraDataCTU->cuQPOff[depthBytes] = (int8_t)(ctu->m_qpAnalysis[absPartIdx] - baseQP);<br>                     absPartIdx += ctu->m_numPartitions >> (depth * 2);<br>                 }<br>+<br>+                if (m_param->rc.cuTree)<br>+                {<br>+                    for (uint32_t i = (cuAddr * ctu->m_numPartitions), j = 0; j < ctu->m_numPartitions; i++, j++)<br>+                        intraDataCTU->cuQPOffReuse[i] = (int8_t)(ctu->m_qpreuse[j] - baseQP);<br>+                }<br>                 memcpy(&intraDataCTU->modes[ctu->m_cuAddr * ctu->m_numPartitions], ctu->m_lumaIntraDir, sizeof(uint8_t)* ctu->m_numPartitions);<br>             }<br>         }<br>@@ -5599,13 +5658,20 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD<br>                     }<br>                     absPartIdx += ctu->m_numPartitions >> (depth * 2);<br>                 }<br>+<br>+                if (m_param->rc.cuTree)<br>+                {<br>+                    for (uint32_t i = (cuAddr * ctu->m_numPartitions), j = 0; j < ctu->m_numPartitions; i++, j++)<br>+                        interDataCTU->cuQPOffReuse[i] = (int8_t)(ctu->m_qpreuse[j] - baseQP);<br>+                }<br>+<br>                 if (m_param->analysisSaveReuseLevel == 10 && bIntraInInter)<br>                     memcpy(&intraDataCTU->modes[ctu->m_cuAddr * ctu->m_numPartitions], ctu->m_lumaIntraDir, sizeof(uint8_t)* ctu->m_numPartitions);<br>             }<br>         }<br> <br>         if ((analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I) && m_param->rc.cuTree)<br>-            
analysis->frameRecordSize += sizeof(uint8_t)* analysis->numCUsInFrame * analysis->numPartitions + depthBytes * 3 + (sizeof(int8_t) * depthBytes);<br>+            analysis->frameRecordSize += sizeof(uint8_t)* analysis->numCUsInFrame * analysis->numPartitions + depthBytes * 3 + (sizeof(int8_t) * depthBytes) + (sizeof(int8_t) * analysis->numPartitions  * analysis->numCUsInFrame);<br>         else if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)<br>             analysis->frameRecordSize += sizeof(uint8_t)* analysis->numCUsInFrame * analysis->numPartitions + depthBytes * 3;<br>         else<br>@@ -5613,7 +5679,10 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD<br>             /* Add sizeof depth, modes, partSize, cuQPOffset, mergeFlag */<br>             analysis->frameRecordSize += depthBytes * 2;<br>             if (m_param->rc.cuTree)<br>-            analysis->frameRecordSize += (sizeof(int8_t) * depthBytes);<br>+            {<br>+                analysis->frameRecordSize += (sizeof(int8_t) * depthBytes);<br>+                analysis->frameRecordSize += (sizeof(int8_t) * analysis->numPartitions * analysis->numCUsInFrame);<br>+            }<br>             if (m_param->analysisSaveReuseLevel > 4)<br>                 analysis->frameRecordSize += (depthBytes * 2);<br> <br>@@ -5669,7 +5738,10 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD<br>         X265_FWRITE((analysis->intraData)->chromaModes, sizeof(uint8_t), depthBytes, m_analysisFileOut);<br>         X265_FWRITE((analysis->intraData)->partSizes, sizeof(char), depthBytes, m_analysisFileOut);<br>         if (m_param->rc.cuTree)<br>+        {<br>             X265_FWRITE((analysis->intraData)->cuQPOff, sizeof(int8_t), depthBytes, m_analysisFileOut);<br>+            X265_FWRITE((analysis->intraData)->cuQPOffReuse, sizeof(int8_t), (analysis->numCUsInFrame * analysis->numPartitions), m_analysisFileOut);<br>+        
}<br>         X265_FWRITE((analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFileOut);<br>     }<br>     else<br>@@ -5677,7 +5749,10 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD<br>         X265_FWRITE((analysis->interData)->depth, sizeof(uint8_t), depthBytes, m_analysisFileOut);<br>         X265_FWRITE((analysis->interData)->modes, sizeof(uint8_t), depthBytes, m_analysisFileOut);<br>         if (m_param->rc.cuTree)<br>+        {<br>             X265_FWRITE((analysis->interData)->cuQPOff, sizeof(int8_t), depthBytes, m_analysisFileOut);<br>+            X265_FWRITE((analysis->interData)->cuQPOffReuse, sizeof(int8_t), (analysis->numCUsInFrame * analysis->numPartitions), m_analysisFileOut);<br>+        }<br>         if (m_param->analysisSaveReuseLevel > 4)<br>         {<br>             X265_FWRITE((analysis->interData)->partSize, sizeof(uint8_t), depthBytes, m_analysisFileOut);<br>diff --git a/source/encoder/slicetype.cpp b/source/encoder/slicetype.cpp<br>index 0adb0d0db..3bc01268b 100644<br>--- a/source/encoder/slicetype.cpp<br>+++ b/source/encoder/slicetype.cpp<br>@@ -1894,7 +1894,7 @@ void Lookahead::slicetypeAnalyse(Lowres **frames, bool bKeyframe)<br> <br>     if (!framecnt)<br>     {<br>-        if (m_param->rc.cuTree)<br>+        if (m_param->rc.cuTree && !m_param->analysisLoad)<br>             cuTree(frames, 0, bKeyframe);<br>         return;<br>     }<br>diff --git a/source/x265.h b/source/x265.h<br>index f44040ba7..d6a828539 100644<br>--- a/source/x265.h<br>+++ b/source/x265.h<br>@@ -145,6 +145,7 @@ typedef struct x265_analysis_intra_data<br>     char*     partSizes;<br>     uint8_t*  chromaModes;<br>     int8_t*    cuQPOff;<br>+    int8_t*   cuQPOffReuse;<br> }x265_analysis_intra_data;<br> <br> typedef struct x265_analysis_MV<br>@@ -170,6 +171,7 @@ typedef struct x265_analysis_inter_data<br>     x265_analysis_MV*         mv[2];<br>     int64_t*     sadCost;<br>    
 int8_t*    cuQPOff;<br>+    int8_t*    cuQPOffReuse;<br> }x265_analysis_inter_data;<br> <br> typedef struct x265_weight_param<br>-- <br>2.23.0.windows.1<br><br></div>
_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><span style="color:rgb(0,0,0)">Regards,<br>Kavitha</span></div></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><b style="background-color:rgb(255,255,255)"><font color="#0b5394">With Regards,</font></b><div><b style="background-color:rgb(255,255,255)"><font color="#0b5394">Srikanth Kurapati.</font></b></div></div></div></div>