<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 15, 2014 at 4:10 PM, Steve Borho <span dir="ltr"><<a href="mailto:steve@borho.org" target="_blank">steve@borho.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 09/15, <a href="mailto:gopu@multicorewareinc.com">gopu@multicorewareinc.com</a> wrote:<br>

> # HG changeset patch<br>

> # User Gopu Govindaswamy <<a href="mailto:gopu@multicorewareinc.com">gopu@multicorewareinc.com</a>><br>

> # Date 1410770251 -19800<br>

> #      Mon Sep 15 14:07:31 2014 +0530<br>

> # Node ID 9db768fa41ad927c66c1dc4ae446953862052ce4<br>

> # Parent  184e56afa951815f4e295b4fcce094ee03361a2e<br>

> analysis: Intra picture estimation information sharing<br>

><br>

> when --analysis-mode=save - the encoder runs a full encode and dumps the<br>

> best split and mode decisions into x265_analysis.dat (the default file name if<br>

> no file name is provided)<br>

> when --analysis-mode=load - the encoder reads the best split and mode decisions<br>

> from x265_analysis.dat and bypasses the actual split and mode decisions, and<br>

> therefore performs a much faster encode<br>

><br>

> diff -r 184e56afa951 -r 9db768fa41ad source/Lib/TLibCommon/TComRom.cpp<br>

> --- a/source/Lib/TLibCommon/TComRom.cpp       Fri Sep 12 12:02:46 2014 +0530<br>

> +++ b/source/Lib/TLibCommon/TComRom.cpp       Mon Sep 15 14:07:31 2014 +0530<br>

> @@ -505,5 +505,19 @@<br>

>      0x38,<br>

>  };<br>

><br>

> +    /* Contains how much to increment shared depth buffer for different ctu sizes to get next best depth.<br>

> +     * here,<br>

> +     * depth 0 = 64x64, depth 1 = 32x32, depth 2 = 16x16 and depth 3 = 8x8<br>

> +     * if ctu = 64, depth buffer size is 256 combination of depth values 0, 1, 2, 3.<br>

> +     * if ctu = 32, depth buffer size is 64 combination of depth values 1, 2, 3.<br>

> +     * if ctu = 16, depth buffer size is 16 combination of depth values 2, 3 */<br>

<br>

</div></div>The comment should be whitespace-aligned with the array, and lines 2 and 3 should be<br>

combined.<br></blockquote><div><br></div><div>OK, I will change this.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div><div class="h5"><br>

> +const uint32_t g_depthInc[3][4] =<br>

> +{<br>

> +    { 16,  4,  0, 0},<br>

> +    { 64, 16,  4, 1},<br>

> +    {256, 64, 16, 4}<br>

> +};<br>

> +<br>

>  }<br>

>  //! \}<br>

> diff -r 184e56afa951 -r 9db768fa41ad source/Lib/TLibCommon/TComRom.h<br>

> --- a/source/Lib/TLibCommon/TComRom.h Fri Sep 12 12:02:46 2014 +0530<br>

> +++ b/source/Lib/TLibCommon/TComRom.h Mon Sep 15 14:07:31 2014 +0530<br>

> @@ -155,6 +155,8 @@<br>

>  // Intra tables<br>

>  extern const uint8_t g_intraFilterFlags[35];<br>

><br>

> +extern const uint32_t g_depthInc[3][4];<br>

> +<br>

>  }<br>

><br>

>  #endif  //ifndef X265_TCOMROM_H<br>

> diff -r 184e56afa951 -r 9db768fa41ad source/encoder/analysis.cpp<br>

> --- a/source/encoder/analysis.cpp     Fri Sep 12 12:02:46 2014 +0530<br>

> +++ b/source/encoder/analysis.cpp     Mon Sep 15 14:07:31 2014 +0530<br>

> @@ -311,14 +311,24 @@<br>

>      uint32_t numPartition = cu->getTotalNumPart();<br>

>      if (m_bestCU[0]->m_slice->m_sliceType == I_SLICE)<br>

>      {<br>

> -        compressIntraCU(m_bestCU[0], m_tempCU[0], false, cu, cu->m_CULocalData);<br>

> -        if (m_param->analysisMode == 1)<br>

> +        if (m_param->analysisMode == 2)<br>

<br>

</div></div>our code should always use the X265_ANALYSIS_LOAD|SAVE macros,<br>

except when checking != 0.<br>
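<br>
A minimal sketch of the macro-based checks suggested here. The constants are defined locally so the snippet compiles on its own; their values (0 off, 1 save, 2 load) mirror the literals used in the patch and are an assumption about the public header. AnalysisAction and chooseAction are hypothetical names used only for illustration.<br>

```cpp
// Assumed values, mirroring the literals 1 (save) and 2 (load) in the patch.
#define X265_ANALYSIS_OFF  0
#define X265_ANALYSIS_SAVE 1
#define X265_ANALYSIS_LOAD 2

// Hypothetical helper: what the I-slice path should do for a given mode.
enum AnalysisAction { ENCODE_ONLY, ENCODE_AND_SAVE, REUSE_LOADED };

inline AnalysisAction chooseAction(int analysisMode)
{
    if (analysisMode == X265_ANALYSIS_LOAD)
        return REUSE_LOADED;     // bypass the split and mode decisions
    if (analysisMode == X265_ANALYSIS_SAVE)
        return ENCODE_AND_SAVE;  // full analysis, then copy the results out
    return ENCODE_ONLY;          // analysis sharing disabled
}
```

Comparing against the named constants keeps the load and save branches readable; a plain truth test on the mode remains enough when the only question is whether sharing is enabled at all.<br>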

<span class=""><br>

>          {<br>

> -            memcpy(&m_bestCU[0]->m_pic->m_intraData->depth[cu->getAddr() * cu->m_numPartitions], m_bestCU[0]->getDepth(), sizeof(uint8_t) * cu->getTotalNumPart());<br>

> -            memcpy(&m_bestCU[0]->m_pic->m_intraData->modes[cu->getAddr() * cu->m_numPartitions], m_bestCU[0]->getLumaIntraDir(), sizeof(uint8_t) * cu->getTotalNumPart());<br>

> -            memcpy(&m_bestCU[0]->m_pic->m_intraData->partSizes[cu->getAddr() * cu->m_numPartitions], m_bestCU[0]->getPartitionSize(), sizeof(char) * cu->getTotalNumPart());<br>

> -            m_bestCU[0]->m_pic->m_intraData->cuAddr[cu->getAddr()] = cu->getAddr();<br>

> -            m_bestCU[0]->m_pic->m_intraData->poc[cu->getAddr()]    = cu->m_pic->m_POC;<br>

> +            sharedCompressIntraCU(m_bestCU[0], m_tempCU[0], false, cu, cu->m_CULocalData,<br>

> +                &m_bestCU[0]->m_pic->m_intraData->depth[cu->getAddr() * cu->m_numPartitions],<br>

> +                &m_bestCU[0]->m_pic->m_intraData->partSizes[cu->getAddr() * cu->m_numPartitions],<br>

> +                &m_bestCU[0]->m_pic->m_intraData->modes[cu->getAddr() * cu->m_numPartitions]);<br>

<br>

</span>Pointer checking needs to be done at some point, probably at the frame<br>

level. If the user doesn't allocate a buffer, we shouldn't crash.<br></blockquote><div><br></div><div>I have included the pointer checking.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
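<br>
A minimal sketch of a frame-level pointer check, assuming the shared intra buffers are exposed as plain pointers. IntraBuffers and intraBuffersUsable are hypothetical names; the field names follow the m_intraData members used in the patch.<br>

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-picture intra analysis buffers.
struct IntraBuffers
{
    uint8_t* depth;
    uint8_t* modes;
    char*    partSizes;
};

// Returns true only when every buffer needed for save/load is present.
// Intended to be called once per frame, before any CTU is analysed.
inline bool intraBuffersUsable(const IntraBuffers* data)
{
    return data && data->depth && data->modes && data->partSizes;
}
```

If the check fails, the encoder could fall back to a normal encode (ideally with a warning) rather than dereferencing a null pointer inside the per-CTU code.<br>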

<br>

We should probably also be setting the analysis pointers to NULL in the<br>

input picture structure prior to returning from x265_encoder_encode() so<br>

they do not accidentally re-use the same buffers for more than one<br>

picture.  In short, we need to be a lot more defensive about API abuses.<br></blockquote><div><br></div><div>I will make a separate patch for this, but I still need to verify it: the analysis buffer is used to dump the analysis data into the file</div><div>after x265_encoder_encode() returns.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

> +        }<br>

> +        else<br>

> +        {<br>

> +            compressIntraCU(m_bestCU[0], m_tempCU[0], false, cu, cu->m_CULocalData);<br>

> +            if (m_param->analysisMode == 1)<br>

> +            {<br>

> +                memcpy(&m_bestCU[0]->m_pic->m_intraData->depth[cu->getAddr() * cu->m_numPartitions], m_bestCU[0]->getDepth(), sizeof(uint8_t) * cu->getTotalNumPart());<br>

> +                memcpy(&m_bestCU[0]->m_pic->m_intraData->modes[cu->getAddr() * cu->m_numPartitions], m_bestCU[0]->getLumaIntraDir(), sizeof(uint8_t) * cu->getTotalNumPart());<br>

> +                memcpy(&m_bestCU[0]->m_pic->m_intraData->partSizes[cu->getAddr() * cu->m_numPartitions], m_bestCU[0]->getPartitionSize(), sizeof(char) * cu->getTotalNumPart());<br>

> +                m_bestCU[0]->m_pic->m_intraData->cuAddr[cu->getAddr()] = cu->getAddr();<br>

> +                m_bestCU[0]->m_pic->m_intraData->poc[cu->getAddr()]    = cu->m_pic->m_POC;<br>

> +            }<br>

>          }<br>

>          if (m_param->bLogCuStats || m_param->rc.bStatWrite)<br>

>          {<br>

> @@ -533,7 +543,142 @@<br>

>  #endif<br>

>  }<br>

><br>

> -void Analysis::checkIntra(TComDataCU*& outBestCU, TComDataCU*& outTempCU, PartSize partSize, CU *cu)<br>

> +void Analysis::sharedCompressIntraCU(TComDataCU*& outBestCU, TComDataCU*& outTempCU, uint32_t depth, TComDataCU* cuPicsym, CU *cu, uint8_t* sharedDepth, char* sharedPartSizes, uint8_t* sharedModes)<br>

> +{<br>

> +    Frame* pic = outBestCU->m_pic;<br>

> +<br>

> +    // if current depth == shared depth then skip further splitting.<br>

> +    bool bSubBranch = true;<br>

> +<br>

> +    if (depth == 0)<br>

<br>

</span>!depth<br>

<span class=""><br>

> +    {<br>

> +        // offset to next best depth in sharedDepth buffer<br>

> +        m_zorder = 0;<br>

> +<br>

> +        // index to g_depthInc array to increment m_zorder offset to next depth<br>

> +        m_ctuToDepthIndex = m_param->maxCUSize / 22;<br>

<br>

</span>this math is pretty magical. my guess is there's already a table<br>

somewhere that does this more cleanly? Does this code work with<br>

--ctu 16?<br></blockquote><div><br></div><div>I checked and did not find any such table, but this logic works for CTU sizes 64, 32 and 16; I have verified this.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
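<br>
One way to make the index computation less magical, sketched under two assumptions: only CTU sizes 16, 32 and 64 are allowed, and depth is counted relative to the CTU (as in the recursion) with 8x8 as the smallest CU. The row can then be derived from log2 of the CTU size, and each g_depthInc entry for a valid depth is the number of 4x4 partitions in the CTU shifted right by twice the depth. log2Size, ctuToDepthIndex and depthIncrement are hypothetical helpers, not existing x265 functions.<br>

```cpp
#include <cassert>
#include <cstdint>

// Table from the patch, reproduced so the sketch is self-contained.
static const uint32_t g_depthInc[3][4] =
{
    { 16,  4,  0, 0},
    { 64, 16,  4, 1},
    {256, 64, 16, 4}
};

// Integer log2 for power-of-two sizes.
inline uint32_t log2Size(uint32_t size)
{
    uint32_t log2 = 0;
    while (size > 1)
    {
        size >>= 1;
        log2++;
    }
    return log2;
}

// Row index into g_depthInc: 16 -> 0, 32 -> 1, 64 -> 2.
// Gives the same result as maxCUSize / 22 for these three sizes.
inline uint32_t ctuToDepthIndex(uint32_t maxCUSize)
{
    assert(maxCUSize == 16 || maxCUSize == 32 || maxCUSize == 64);
    return log2Size(maxCUSize) - 4;
}

// Number of 4x4 partitions covered by one CU at the given depth.
inline uint32_t depthIncrement(uint32_t maxCUSize, uint32_t depth)
{
    uint32_t numPartitions = (maxCUSize >> 2) * (maxCUSize >> 2);
    return numPartitions >> (2 * depth);
}
```

With helpers like these the lookup table could be dropped, or kept and cross-checked against the computed values in a debug build.<br>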

<span class=""><br>

> +        // get original YUV data from picture<br>

> +        m_origYuv[depth]->copyFromPicYuv(pic->getPicYuvOrg(), outBestCU->getAddr(), outBestCU->getZorderIdxInCU());<br>

> +    }<br>

> +    else<br>

> +        m_origYuv[0]->copyPartToYuv(m_origYuv[depth], outBestCU->getZorderIdxInCU());<br>

> +<br>

> +    Slice* slice = outTempCU->m_slice;<br>

> +    int32_t cu_split_flag = !(cu->flags & CU::LEAF);<br>

> +    int32_t cu_unsplit_flag = !(cu->flags & CU::SPLIT_MANDATORY);<br>

<br>

</span>It looks like this function is recursively encoding the entire I slice<br>

CTU. If that is the case the name should reflect that, perhaps<br>

compressSharedIntraCTU.<br></blockquote><div><br></div><div>Yes, I will change this function name.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

> +    if (cu_unsplit_flag && ((m_zorder == outBestCU->getZorderIdxInCU()) && (depth == sharedDepth[m_zorder])))<br>

> +    {<br>

> +        m_quant.setQPforQuant(outTempCU);<br>

> +        checkIntra(outBestCU, outTempCU, (PartSize)sharedPartSizes[m_zorder], cu, &sharedModes[m_zorder]);<br>

> +<br>

> +        if (!(depth == g_maxCUDepth))<br>

> +        {<br>

> +            m_entropyCoder->resetBits();<br>

> +            m_entropyCoder->codeSplitFlag(outBestCU, 0, depth);<br>

> +            outBestCU->m_totalBits += m_entropyCoder->getNumberOfWrittenBits();<br>

> +        }<br>

> +        if (m_rdCost.m_psyRd)<br>

> +            outBestCU->m_totalPsyCost = m_rdCost.calcPsyRdCost(outBestCU->m_totalDistortion, outBestCU->m_totalBits, outBestCU->m_psyEnergy);<br>

> +        else<br>

> +            outBestCU->m_totalRDCost  = m_rdCost.calcRdCost(outBestCU->m_totalDistortion, outBestCU->m_totalBits);<br>

<br>

</span>How applicable is psy-rd for I slices in the shared re-use case?  Does<br>

it influence splits or something? If it's not being used, we should<br>

save the cycles<br>

<br>

Should we be measuring cost at all in the reuse case?<br>

<div><div class="h5"><br>

> +        bSubBranch = false;<br>

> +<br>

> +        // increment m_zorder offset to point to next best depth in sharedDepth buffer<br>

> +        m_zorder += g_depthInc[m_ctuToDepthIndex][sharedDepth[m_zorder]];<br>

> +    }<br>

> +<br>

> +    // copy original YUV samples in lossless mode<br>

> +    if (outBestCU->isLosslessCoded(0))<br>

> +        fillOrigYUVBuffer(outBestCU, m_origYuv[depth]);<br>

> +<br>

> +    // further split<br>

> +    if (cu_split_flag && bSubBranch)<br>

> +    {<br>

> +        uint32_t    nextDepth     = depth + 1;<br>

> +        TComDataCU* subBestPartCU = m_bestCU[nextDepth];<br>

> +        TComDataCU* subTempPartCU = m_tempCU[nextDepth];<br>

> +        for (uint32_t partUnitIdx = 0; partUnitIdx < 4; partUnitIdx++)<br>

> +        {<br>

> +            CU *child_cu = cuPicsym->m_CULocalData + cu->childIdx + partUnitIdx;<br>

> +<br>

> +            if (child_cu->flags & CU::PRESENT)<br>

> +            {<br>

> +                int32_t qp = outTempCU->getQP(0);<br>

> +                subBestPartCU->initSubCU(outTempCU, partUnitIdx, nextDepth, qp); // clear sub partition datas or init.<br>

> +                subTempPartCU->initSubCU(outTempCU, partUnitIdx, nextDepth, qp); // clear sub partition datas or init.<br>

> +                if (0 == partUnitIdx) //initialize RD with previous depth buffer<br>

> +                    m_rdEntropyCoders[nextDepth][CI_CURR_BEST].load(m_rdEntropyCoders[depth][CI_CURR_BEST]);<br>

> +                else<br>

> +                    m_rdEntropyCoders[nextDepth][CI_CURR_BEST].load(m_rdEntropyCoders[nextDepth][CI_NEXT_BEST]);<br>

<br></div></div></blockquote><div>OK, I will fix all the remaining comments and resend.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

</div></div>we normally code this as:<br>

<br>

if (partUnitIdx) // initialize RD with previous depth buffer<br>

<span class="">    m_rdEntropyCoders[nextDepth][CI_CURR_BEST].load(m_rdEntropyCoders[nextDepth][CI_NEXT_BEST]);<br>

</span>else<br>

<span class="">    m_rdEntropyCoders[nextDepth][CI_CURR_BEST].load(m_rdEntropyCoders[depth][CI_CURR_BEST]);<br>

<br>

> +<br>

</span><div><div class="h5">> +                sharedCompressIntraCU(subBestPartCU, subTempPartCU, nextDepth, cuPicsym, child_cu, sharedDepth, sharedPartSizes, sharedModes);<br>

> +                outTempCU->copyPartFrom(subBestPartCU, partUnitIdx, nextDepth); // Keep best part data to current temporary data.<br>

> +<br>

> +                // check if cost ==  MAX_INT64 then current depth != sharedDepth so, current CU is not best CU<br>

> +                // set the cost to MAX_INT64 - 1 to mark it as not best CU<br>

> +                if (m_rdCost.m_psyRd && subBestPartCU->m_totalPsyCost == MAX_INT64)<br>

> +                    outTempCU->m_totalPsyCost = MAX_INT64 - 1;<br>

> +                else if(subBestPartCU->m_totalRDCost == MAX_INT64)<br>

> +                    outTempCU->m_totalRDCost = MAX_INT64 - 1;<br>

> +<br>

> +                copyYuv2Tmp(subBestPartCU->getTotalNumPart() * partUnitIdx, nextDepth);<br>

> +            }<br>

> +            else<br>

> +            {<br>

> +                subBestPartCU->copyToPic(nextDepth);<br>

> +                outTempCU->copyPartFrom(subBestPartCU, partUnitIdx, nextDepth);<br>

> +<br>

> +                // increment m_zorder offset to point to next best depth in sharedDepth buffer<br>

> +                m_zorder += g_depthInc[m_ctuToDepthIndex][sharedDepth[m_zorder]];<br>

> +            }<br>

> +        }<br>

> +        if (cu->flags & CU::PRESENT)<br>

> +        {<br>

> +            m_entropyCoder->resetBits();<br>

> +            m_entropyCoder->codeSplitFlag(outTempCU, 0, depth);<br>

> +            outTempCU->m_totalBits += m_entropyCoder->getNumberOfWrittenBits(); // split bits<br>

> +        }<br>

> +<br>

> +        // check if cost is greater than (MAX_INT64 - 1)<br>

> +        if (m_rdCost.m_psyRd && outTempCU->m_totalPsyCost >= MAX_INT64)<br>

> +            outTempCU->m_totalPsyCost = m_rdCost.calcPsyRdCost(outTempCU->m_totalDistortion, outTempCU->m_totalBits, outTempCU->m_psyEnergy);<br>

> +        else if (outTempCU->m_totalRDCost >= MAX_INT64)<br>

> +            outTempCU->m_totalRDCost = m_rdCost.calcRdCost(outTempCU->m_totalDistortion, outTempCU->m_totalBits);<br>

<br>

</div></div>Unrelated to this patch, but now that psy-rd is stable, we can do away<br>

with the separate cost variables. We should always be measuring rd cost<br>

or psy-rd cost, there's no reason to keep both variables.<br>

<div><div class="h5"><br>

> +        if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP)<br>

> +        {<br>

> +            bool hasResidual = false;<br>

> +            for (uint32_t blkIdx = 0; blkIdx < outTempCU->getTotalNumPart(); blkIdx++)<br>

> +            {<br>

> +                if (outTempCU->getCbf(blkIdx, TEXT_LUMA) || outTempCU->getCbf(blkIdx, TEXT_CHROMA_U) ||<br>

> +                    outTempCU->getCbf(blkIdx, TEXT_CHROMA_V))<br>

> +                {<br>

> +                    hasResidual = true;<br>

> +                    break;<br>

> +                }<br>

> +            }<br>

> +<br>

> +            uint32_t targetPartIdx = 0;<br>

> +            if (hasResidual)<br>

> +            {<br>

> +                bool foundNonZeroCbf = false;<br>

> +                outTempCU->setQPSubCUs(outTempCU->getRefQP(targetPartIdx), outTempCU, 0, depth, foundNonZeroCbf);<br>

> +                X265_CHECK(foundNonZeroCbf, "expected to find non-zero CBF\n");<br>

> +            }<br>

> +            else<br>

> +                outTempCU->setQPSubParts(outTempCU->getRefQP(targetPartIdx), 0, depth); // set QP to default QP<br>

> +        }<br>

> +        m_rdEntropyCoders[nextDepth][CI_NEXT_BEST].store(m_rdEntropyCoders[depth][CI_TEMP_BEST]);<br>

> +        checkBestMode(outBestCU, outTempCU, depth);<br>

> +    }<br>

> +    outBestCU->copyToPic(depth);<br>

> +    copyYuv2Pic(pic, outBestCU->getAddr(), outBestCU->getZorderIdxInCU(), depth);<br>

> +}<br>

> +<br>

> +void Analysis::checkIntra(TComDataCU*& outBestCU, TComDataCU*& outTempCU, PartSize partSize, CU *cu, uint8_t* sharedModes)<br>

>  {<br>

>      //PPAScopeEvent(CheckRDCostIntra + depth);<br>

>      uint32_t depth = g_log2Size[m_param->maxCUSize] - cu->log2CUSize;<br>

> @@ -544,7 +689,10 @@<br>

>      uint32_t tuDepthRange[2];<br>

>      outTempCU->getQuadtreeTULog2MinSizeInCU(tuDepthRange, 0);<br>

><br>

> -    estIntraPredQT(outTempCU, m_origYuv[depth], m_tmpPredYuv[depth], m_tmpResiYuv[depth], m_tmpRecoYuv[depth], tuDepthRange);<br>

> +    if (sharedModes)<br>

> +        sharedIntraPredQT(outTempCU, m_origYuv[depth], m_tmpPredYuv[depth], m_tmpResiYuv[depth], m_tmpRecoYuv[depth], tuDepthRange, sharedModes);<br>

> +    else<br>

> +        estIntraPredQT(outTempCU, m_origYuv[depth], m_tmpPredYuv[depth], m_tmpResiYuv[depth], m_tmpRecoYuv[depth], tuDepthRange);<br>

><br>

>      estIntraPredChromaQT(outTempCU, m_origYuv[depth], m_tmpPredYuv[depth], m_tmpResiYuv[depth], m_tmpRecoYuv[depth]);<br>

><br>

> diff -r 184e56afa951 -r 9db768fa41ad source/encoder/analysis.h<br>

> --- a/source/encoder/analysis.h       Fri Sep 12 12:02:46 2014 +0530<br>

> +++ b/source/encoder/analysis.h       Mon Sep 15 14:07:31 2014 +0530<br>

> @@ -100,6 +100,9 @@<br>

>      StatisticLog  m_sliceTypeLog[3];<br>

>      StatisticLog* m_log;<br>

><br>

> +    uint32_t      m_zorder;<br>

> +    uint32_t      m_ctuToDepthIndex;<br>

<br>

</div></div>it seems like these should be derivable from existing CU fields, or<br>

passed on the stack to sharedCompressIntraCU()<br>

<span class=""><br>

> +<br>

>      Analysis();<br>

>      bool create(uint32_t totalDepth, uint32_t maxWidth);<br>

>      void destroy();<br>

> @@ -110,7 +113,8 @@<br>

><br>

>      /* Warning: The interface for these functions will undergo significant changes as a major refactor is under progress */<br>

>      void compressIntraCU(TComDataCU*& outBestCU, TComDataCU*& outTempCU, uint32_t depth, TComDataCU* cuPicsym, CU *cu);<br>

> -    void checkIntra(TComDataCU*& outBestCU, TComDataCU*& outTempCU, PartSize partSize, CU *cu);<br>

> +    void checkIntra(TComDataCU*& outBestCU, TComDataCU*& outTempCU, PartSize partSize, CU *cu, uint8_t* sharedModes=NULL);<br>

<br>

</span>I don't generally like default args. Please update all callers instead.<br>

<span class=""><br>

> +    void sharedCompressIntraCU(TComDataCU*& outBestCU, TComDataCU*& outTempCU, uint32_t depth, TComDataCU* cuPicsym, CU *cu, uint8_t* sharedDepth, char* sharedPartSizes, uint8_t* sharedModes);<br>

><br>

>      void compressInterCU_rd0_4(TComDataCU*& outBestCU, TComDataCU*& outTempCU, TComDataCU* cu, uint32_t depth, TComDataCU* cuPicsym, CU *cu_t,<br>

>                                 int bInsidePicture, uint32_t partitionIndex, uint32_t minDepth);<br>

> diff -r 184e56afa951 -r 9db768fa41ad source/encoder/search.cpp<br>

> --- a/source/encoder/search.cpp       Fri Sep 12 12:02:46 2014 +0530<br>

> +++ b/source/encoder/search.cpp       Mon Sep 15 14:07:31 2014 +0530<br>

> @@ -1484,6 +1484,75 @@<br>

>      x265_emms();<br>

>  }<br>

><br>

> +void Search::sharedIntraPredQT(TComDataCU* cu, TComYuv* fencYuv, TComYuv* predYuv, ShortYuv* resiYuv, TComYuv* reconYuv, uint32_t depthRange[2], uint8_t* sharedModes)<br>

> +{<br>

> +    uint32_t depth        = cu->getDepth(0);<br>

> +    uint32_t initTrDepth  = cu->getPartitionSize(0) == SIZE_2Nx2N ? 0 : 1;<br>

> +    uint32_t numPU        = 1 << (2 * initTrDepth);<br>

> +    uint32_t log2TrSize   = cu->getLog2CUSize(0) - initTrDepth;<br>

> +    uint32_t qNumParts    = cu->getTotalNumPart() >> 2;<br>

> +    uint32_t overallDistY = 0;<br>

> +    static const uint8_t intraModeNumFast[] = { 8, 8, 3, 3, 3 }; // 4x4, 8x8, 16x16, 32x32, 64x64<br>

<br>

</span>this array is unused<br>

<span class=""><br>

> +<br>

> +    // loop over partitions<br>

> +    uint32_t partOffset = 0;<br>

> +    uint32_t puDistY;<br>

> +    uint64_t puCost;<br>

> +    for (uint32_t pu = 0; pu < numPU; pu++, partOffset += qNumParts)<br>

> +    {<br>

> +        uint32_t bestPUMode = sharedModes[pu];<br>

> +        uint32_t bestPUDistY = 0;<br>

<br>

</span>these two variables both seem a bit redundant<br>

<div><div class="h5"><br>

> +        cu->setLumaIntraDirSubParts(bestPUMode, partOffset, depth + initTrDepth);<br>

> +<br>

> +        // set context models<br>

> +        m_entropyCoder->load(m_rdEntropyCoders[depth][CI_CURR_BEST]);<br>

> +<br>

> +        // determine residual for partition<br>

> +        puCost = 0;<br>

> +        puDistY = xRecurIntraCodingQT(cu, initTrDepth, partOffset, fencYuv, predYuv, resiYuv, true, puCost, depthRange);<br>

> +<br>

> +        bestPUDistY = puDistY;<br>

> +        xSetIntraResultQT(cu, initTrDepth, partOffset, reconYuv);<br>

> +<br>

> +        // update overall distortion<br>

> +        overallDistY += bestPUDistY;<br>

> +<br>

> +        if (pu != numPU - 1)<br>

> +        {<br>

> +            uint32_t zorder      = cu->getZorderIdxInCU() + partOffset;<br>

> +            pixel*   dst         = cu->m_pic->getPicYuvRec()->getLumaAddr(cu->getAddr(), zorder);<br>

> +            uint32_t dststride   = cu->m_pic->getPicYuvRec()->getStride();<br>

> +            pixel*   src         = reconYuv->getLumaAddr(partOffset);<br>

> +            uint32_t srcstride   = reconYuv->getStride();<br>

> +            primitives.luma_copy_pp[log2TrSize - 2](dst, dststride, src, srcstride);<br>

> +        }<br>

> +<br>

> +        // update PU data<br>

> +        cu->setLumaIntraDirSubParts(bestPUMode, partOffset, depth + initTrDepth);<br>

<br>

</div></div>is this call redundant?<br>

<span class=""><br>

> +        cu->copyToPic((uint8_t)depth, pu, initTrDepth);<br>

> +    }<br>

> +<br>

> +    if (numPU > 1)<br>

> +    {<br>

> +        // set Cbf for all blocks<br>

> +        uint32_t combCbfY = 0;<br>

> +        uint32_t partIdx  = 0;<br>

> +        for (uint32_t part = 0; part < 4; part++, partIdx += qNumParts)<br>

> +            combCbfY |= cu->getCbf(partIdx, TEXT_LUMA,     1);<br>

> +<br>

> +        for (uint32_t offs = 0; offs < 4 * qNumParts; offs++)<br>

> +            cu->getCbf(TEXT_LUMA)[offs] |= combCbfY;<br>

> +<br>

<br>

</span>white-space<br>

<span class=""><br>

> +    }<br>

> +<br>

> +    // reset context models<br>

> +    m_entropyCoder->load(m_rdEntropyCoders[depth][CI_CURR_BEST]);<br>

> +<br>

> +    // set distortion (rate and r-d costs are determined later)<br>

> +    cu->m_totalDistortion = overallDistY;<br>

<br>

</span>cu->m_totalDistortion could be updated within the loop directly<br>

<span class=""><br>

> +}<br>

> +<br>

>  void Search::getBestIntraModeChroma(TComDataCU* cu, TComYuv* fencYuv, TComYuv* predYuv)<br>

>  {<br>

>      uint32_t depth   = cu->getDepth(0);<br>

> diff -r 184e56afa951 -r 9db768fa41ad source/encoder/search.h<br>

> --- a/source/encoder/search.h Fri Sep 12 12:02:46 2014 +0530<br>

> +++ b/source/encoder/search.h Mon Sep 15 14:07:31 2014 +0530<br>

> @@ -109,6 +109,7 @@<br>

>      bool initSearch(x265_param *param, ScalingList& scalingList);<br>

><br>

>      void estIntraPredQT(TComDataCU* cu, TComYuv* fencYuv, TComYuv* predYuv, ShortYuv* resiYuv, TComYuv* reconYuv, uint32_t depthRange[2]);<br>

> +    void sharedIntraPredQT(TComDataCU* cu, TComYuv* fencYuv, TComYuv* predYuv, ShortYuv* resiYuv, TComYuv* reconYuv, uint32_t depthRange[2], uint8_t* sharedModes);<br>

>      void estIntraPredChromaQT(TComDataCU* cu, TComYuv* fencYuv, TComYuv* predYuv, ShortYuv* resiYuv, TComYuv* reconYuv);<br>

><br>

>      // estimation inter prediction (non-skip)<br>

</span>> _______________________________________________<br>

> x265-devel mailing list<br>

> <a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>

> <a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Steve Borho<br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>Thanks & Regards<br>Gopu G<br>Multicoreware Inc <br><br>

</div></div>