<div dir="ltr"><b>Performance results are below:</b><div><b>Before applying the patch:</b></div><div><div>x265.exe C:\testsequences\BasketballDrive_1920x1080_50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1 --preset veryslow</div><div>encoded 500 frames in 552.55s (0.90 fps), 2511.21 kb/s<br></div><div><br></div><div>x265.exe C:\testsequences\BasketballDrive_1920x1080_50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1</div><div>encoded 500 frames in 42.20s (11.85 fps), 2819.98 kb/s</div><div><br></div><div>x265.exe C:\testsequences\park_joy_1080p50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1 --preset veryslow</div><div>encoded 500 frames in 796.85s (0.63 fps), 9471.87 kb/s<br></div><div><br></div><div>x265.exe C:\testsequences\park_joy_1080p50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1</div><div>encoded 500 frames in 49.38s (10.13 fps), 10368.95 kb/s</div></div><div><br></div><div><b>After applying the patch:</b></div><div><div>x265.exe C:\testsequences\BasketballDrive_1920x1080_50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1 --preset veryslow</div><div>encoded 500 frames in 550.07s (0.91 fps), 2511.21 kb/s<br></div><div><br></div><div>x265.exe C:\testsequences\BasketballDrive_1920x1080_50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1</div><div>encoded 500 frames in 41.74s (11.98 fps), 2819.98 kb/s</div><div><br></div><div>x265.exe C:\testsequences\park_joy_1080p50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1 --preset veryslow</div><div>encoded 500 frames in 786.16s (0.64 fps), 9471.87 kb/s<br></div><div><br></div><div>x265.exe C:\testsequences\park_joy_1080p50.y4m -f 500 -o test.hevc -r recon.y4m --hash 1</div><div>encoded 500 frames in 49.22s (10.16 fps), 10368.95 kb/s</div></div><div><br></div><div>There is not much performance gain: encode time drops by only about 0.3% to 1.3%, and the bitrates are identical before and after. However, I avoided repeated function calls to check whether the above and left CUs are available to the current CU.</div><div>I also cached the predicted MVs before entering the loop that searches for the best MVs. 
I have sent the revised patch for review.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 9, 2015 at 11:06 PM, Steve Borho <span dir="ltr"><<a href="mailto:steve@borho.org" target="_blank">steve@borho.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 03/09, <a href="mailto:ashok@multicorewareinc.com">ashok@multicorewareinc.com</a> wrote:<br>
> # HG changeset patch<br>
> # User Ashok Kumar Mishra<<a href="mailto:ashok@multicorewareinc.com">ashok@multicorewareinc.com</a>><br>
> # Date 1425891920 -19800<br>
> #      Mon Mar 09 14:35:20 2015 +0530<br>
> # Node ID a000ce54141021ef154d1a2942e64667768303cb<br>
> # Parent  043c2418864b0a3ada6f597e6def6ead73d90b5f<br>
> Cache spatial and temporal PMVs before analyzing to find the best MV for each reference index<br>
<br>
</span>Looks like an interesting optimization; did you try measuring<br>
performance changes?<br>
<div><div class="h5"><br>
> diff -r 043c2418864b -r a000ce541410 source/common/cudata.cpp<br>
> --- a/source/common/cudata.cpp        Fri Mar 06 13:15:55 2015 -0600<br>
> +++ b/source/common/cudata.cpp        Mon Mar 09 14:35:20 2015 +0530<br>
> @@ -1632,87 +1632,122 @@<br>
>      return count;<br>
>  }<br>
><br>
> -/* Constructs a list of candidates for AMVP, and a larger list of motion candidates */<br>
> -int CUData::fillMvpCand(uint32_t puIdx, uint32_t absPartIdx, int picList, int refIdx, MV* amvpCand, MV* mvc) const<br>
> +// Create the PMV list. Called for each reference index.<br>
> +int CUData::getPMV(InterNeighbourMV *neighbours, uint32_t picList, uint32_t refIdx, MV* amvpCand, MV* pmv) const<br>
>  {<br>
> +    MV directMV[MD_ABOVE_LEFT + 1];<br>
> +    MV indirectMV[MD_ABOVE_LEFT + 1];<br>
> +    bool validDirect[MD_ABOVE_LEFT + 1];<br>
> +    bool validIndirect[MD_ABOVE_LEFT + 1];<br>
> +<br>
> +    // Left candidate.<br>
> +    validDirect[MD_BELOW_LEFT]  = getDirectPMV(directMV[MD_BELOW_LEFT], neighbours + MD_BELOW_LEFT, picList, refIdx);<br>
> +    validDirect[MD_LEFT]        = getDirectPMV(directMV[MD_LEFT], neighbours + MD_LEFT, picList, refIdx);<br>
> +    // Top candidate.<br>
> +    validDirect[MD_ABOVE_RIGHT] = getDirectPMV(directMV[MD_ABOVE_RIGHT], neighbours + MD_ABOVE_RIGHT, picList, refIdx);<br>
> +    validDirect[MD_ABOVE]       = getDirectPMV(directMV[MD_ABOVE], neighbours + MD_ABOVE, picList, refIdx);<br>
> +    validDirect[MD_ABOVE_LEFT]  = getDirectPMV(directMV[MD_ABOVE_LEFT], neighbours + MD_ABOVE_LEFT, picList, refIdx);<br>
> +<br>
> +    // Left candidate.<br>
> +    validIndirect[MD_BELOW_LEFT]  = getIndirectPMV(indirectMV[MD_BELOW_LEFT], neighbours + MD_BELOW_LEFT, picList, refIdx);<br>
> +    validIndirect[MD_LEFT]        = getIndirectPMV(indirectMV[MD_LEFT], neighbours + MD_LEFT, picList, refIdx);<br>
> +    // Top candidate.<br>
> +    validIndirect[MD_ABOVE_RIGHT] = getIndirectPMV(indirectMV[MD_ABOVE_RIGHT], neighbours + MD_ABOVE_RIGHT, picList, refIdx);<br>
> +    validIndirect[MD_ABOVE]       = getIndirectPMV(indirectMV[MD_ABOVE], neighbours + MD_ABOVE, picList, refIdx);<br>
> +    validIndirect[MD_ABOVE_LEFT]  = getIndirectPMV(indirectMV[MD_ABOVE_LEFT], neighbours + MD_ABOVE_LEFT, picList, refIdx);<br>
> +<br>
>      int num = 0;<br>
> -<br>
> -    // spatial MV<br>
> -    uint32_t partIdxLT, partIdxRT, partIdxLB = deriveLeftBottomIdx(puIdx);<br>
> -<br>
> -    deriveLeftRightTopIdx(puIdx, partIdxLT, partIdxRT);<br>
> -<br>
> -    MV mv[MD_ABOVE_LEFT + 1];<br>
> -    MV mvOrder[MD_ABOVE_LEFT + 1];<br>
> -    bool valid[MD_ABOVE_LEFT + 1];<br>
> -    bool validOrder[MD_ABOVE_LEFT + 1];<br>
> -<br>
> -    valid[MD_BELOW_LEFT]  = addMVPCand(mv[MD_BELOW_LEFT], picList, refIdx, partIdxLB, MD_BELOW_LEFT);<br>
> -    valid[MD_LEFT]        = addMVPCand(mv[MD_LEFT], picList, refIdx, partIdxLB, MD_LEFT);<br>
> -    valid[MD_ABOVE_RIGHT] = addMVPCand(mv[MD_ABOVE_RIGHT], picList, refIdx, partIdxRT, MD_ABOVE_RIGHT);<br>
> -    valid[MD_ABOVE]       = addMVPCand(mv[MD_ABOVE], picList, refIdx, partIdxRT, MD_ABOVE);<br>
> -    valid[MD_ABOVE_LEFT]  = addMVPCand(mv[MD_ABOVE_LEFT], picList, refIdx, partIdxLT, MD_ABOVE_LEFT);<br>
> -<br>
> -    validOrder[MD_BELOW_LEFT]  = addMVPCandOrder(mvOrder[MD_BELOW_LEFT], picList, refIdx, partIdxLB, MD_BELOW_LEFT);<br>
> -    validOrder[MD_LEFT]        = addMVPCandOrder(mvOrder[MD_LEFT], picList, refIdx, partIdxLB, MD_LEFT);<br>
> -    validOrder[MD_ABOVE_RIGHT] = addMVPCandOrder(mvOrder[MD_ABOVE_RIGHT], picList, refIdx, partIdxRT, MD_ABOVE_RIGHT);<br>
> -    validOrder[MD_ABOVE]       = addMVPCandOrder(mvOrder[MD_ABOVE], picList, refIdx, partIdxRT, MD_ABOVE);<br>
> -    validOrder[MD_ABOVE_LEFT]  = addMVPCandOrder(mvOrder[MD_ABOVE_LEFT], picList, refIdx, partIdxLT, MD_ABOVE_LEFT);<br>
> -<br>
>      // Left predictor search<br>
> -    if (valid[MD_BELOW_LEFT])<br>
> -        amvpCand[num++] = mv[MD_BELOW_LEFT];<br>
> -    else if (valid[MD_LEFT])<br>
> -        amvpCand[num++] = mv[MD_LEFT];<br>
> -    else if (validOrder[MD_BELOW_LEFT])<br>
> -        amvpCand[num++] = mvOrder[MD_BELOW_LEFT];<br>
> -    else if (validOrder[MD_LEFT])<br>
> -        amvpCand[num++] = mvOrder[MD_LEFT];<br>
> +    if (validDirect[MD_BELOW_LEFT])<br>
> +        amvpCand[num++] = directMV[MD_BELOW_LEFT];<br>
> +    else if (validDirect[MD_LEFT])<br>
> +        amvpCand[num++] = directMV[MD_LEFT];<br>
> +    else if (validIndirect[MD_BELOW_LEFT])<br>
> +        amvpCand[num++] = indirectMV[MD_BELOW_LEFT];<br>
> +    else if (validIndirect[MD_LEFT])<br>
> +        amvpCand[num++] = indirectMV[MD_LEFT];<br>
><br>
>      bool bAddedSmvp = num > 0;<br>
><br>
>      // Above predictor search<br>
> -    if (valid[MD_ABOVE_RIGHT])<br>
> -        amvpCand[num++] = mv[MD_ABOVE_RIGHT];<br>
> -    else if (valid[MD_ABOVE])<br>
> -        amvpCand[num++] = mv[MD_ABOVE];<br>
> -    else if (valid[MD_ABOVE_LEFT])<br>
> -        amvpCand[num++] = mv[MD_ABOVE_LEFT];<br>
> +    if (validDirect[MD_ABOVE_RIGHT])<br>
> +        amvpCand[num++] = directMV[MD_ABOVE_RIGHT];<br>
> +    else if (validDirect[MD_ABOVE])<br>
> +        amvpCand[num++] = directMV[MD_ABOVE];<br>
> +    else if (validDirect[MD_ABOVE_LEFT])<br>
> +        amvpCand[num++] = directMV[MD_ABOVE_LEFT];<br>
><br>
>      if (!bAddedSmvp)<br>
>      {<br>
> -        if (validOrder[MD_ABOVE_RIGHT])<br>
> -            amvpCand[num++] = mvOrder[MD_ABOVE_RIGHT];<br>
> -        else if (validOrder[MD_ABOVE])<br>
> -            amvpCand[num++] = mvOrder[MD_ABOVE];<br>
> -        else if (validOrder[MD_ABOVE_LEFT])<br>
> -            amvpCand[num++] = mvOrder[MD_ABOVE_LEFT];<br>
> +        if (validIndirect[MD_ABOVE_RIGHT])<br>
> +            amvpCand[num++] = indirectMV[MD_ABOVE_RIGHT];<br>
> +        else if (validIndirect[MD_ABOVE])<br>
> +            amvpCand[num++] = indirectMV[MD_ABOVE];<br>
> +        else if (validIndirect[MD_ABOVE_LEFT])<br>
> +            amvpCand[num++] = indirectMV[MD_ABOVE_LEFT];<br>
>      }<br>
><br>
>      int numMvc = 0;<br>
>      for (int dir = MD_LEFT; dir <= MD_ABOVE_LEFT; dir++)<br>
>      {<br>
> -        if (valid[dir] && mv[dir].notZero())<br>
> -            mvc[numMvc++] = mv[dir];<br>
> +        if (validDirect[dir] && directMV[dir].notZero())<br>
> +            pmv[numMvc++] = directMV[dir];<br>
><br>
> -        if (validOrder[dir] && mvOrder[dir].notZero())<br>
> -            mvc[numMvc++] = mvOrder[dir];<br>
> +        if (validIndirect[dir] && indirectMV[dir].notZero())<br>
> +            pmv[numMvc++] = indirectMV[dir];<br>
>      }<br>
><br>
>      if (num == 2)<br>
> +        num -= amvpCand[0] == amvpCand[1];<br>
> +<br>
> +    // Get the collocated candidate. At this step, either the first candidate<br>
> +    // was found or its value is 0.<br>
> +    if (m_slice->m_sps->bTemporalMVPEnabled && num < 2)<br>
>      {<br>
> -        if (amvpCand[0] == amvpCand[1])<br>
> -            num = 1;<br>
> -        else<br>
> -            /* AMVP_NUM_CANDS = 2 */<br>
> -            return numMvc;<br>
> +        int tempRefIdx = neighbours[MD_COLLOCATED].refIdx[picList];<br>
> +        if (tempRefIdx != -1)<br>
> +        {<br>
> +            uint32_t cuAddr = neighbours[MD_COLLOCATED].cuAddr[picList];<br>
> +            const Frame* colPic = m_slice->m_refPicList[m_slice->isInterB() && !m_slice->m_colFromL0Flag][m_slice->m_colRefIdx];<br>
> +            const CUData* colCU = colPic->m_encData->getPicCTU(cuAddr);<br>
> +<br>
> +            // Scale the vector<br>
> +            int colRefPOC = colCU->m_slice->m_refPOCList[tempRefIdx >> 4][tempRefIdx & 0xf];<br>
> +            int colPOC = colCU->m_slice->m_poc;<br>
> +<br>
> +            int curRefPOC = m_slice->m_refPOCList[picList][refIdx];<br>
> +            int curPOC = m_slice->m_poc;<br>
> +<br>
> +            pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);<br>
> +        }<br>
>      }<br>
><br>
> +    while (num < AMVP_NUM_CANDS)<br>
> +        amvpCand[num++] = 0;<br>
> +<br>
> +    return numMvc;<br>
> +}<br>
> +<br>
> +/* Constructs a list of candidates for AMVP, and a larger list of motion candidates */<br>
> +void CUData::getNeighbourMV(uint32_t puIdx, uint32_t absPartIdx, InterNeighbourMV* neighbours) const<br>
> +{<br>
> +    // Set the temporal neighbour to unavailable by default.<br>
> +    neighbours[MD_COLLOCATED].unifiedRef = -1;<br>
> +<br>
> +    uint32_t partIdxLT, partIdxRT, partIdxLB = deriveLeftBottomIdx(puIdx);<br>
> +    deriveLeftRightTopIdx(puIdx, partIdxLT, partIdxRT);<br>
> +<br>
> +    // Load the spatial MVs.<br>
> +    getInterNeighbourMV(neighbours + MD_BELOW_LEFT, partIdxLB, MD_BELOW_LEFT);<br>
> +    getInterNeighbourMV(neighbours + MD_LEFT,       partIdxLB, MD_LEFT);<br>
> +    getInterNeighbourMV(neighbours + MD_ABOVE_RIGHT,partIdxRT, MD_ABOVE_RIGHT);<br>
> +    getInterNeighbourMV(neighbours + MD_ABOVE,      partIdxRT, MD_ABOVE);<br>
> +    getInterNeighbourMV(neighbours + MD_ABOVE_LEFT, partIdxLT, MD_ABOVE_LEFT);<br>
> +<br>
>      if (m_slice->m_sps->bTemporalMVPEnabled)<br>
>      {<br>
>          uint32_t absPartAddr = m_absIdxInCTU + absPartIdx;<br>
>          uint32_t partIdxRB = deriveRightBottomIdx(puIdx);<br>
> -        MV colmv;<br>
><br>
>          // co-located RightBottom temporal predictor (H)<br>
>          int ctuIdx = -1;<br>
> @@ -1741,45 +1776,17 @@<br>
>              else // is the right bottom corner of CTU<br>
>                  absPartAddr = 0;<br>
>          }<br>
> -        if (ctuIdx >= 0 && getColMVP(colmv, refIdx, picList, ctuIdx, absPartAddr))<br>
> -        {<br>
> -            amvpCand[num++] = colmv;<br>
> -            mvc[numMvc++] = colmv;<br>
> -        }<br>
> -        else<br>
> +<br>
> +        if (!(ctuIdx >= 0 && getCollocatedMV(ctuIdx, absPartAddr, neighbours + MD_COLLOCATED)))<br>
>          {<br>
>              uint32_t partIdxCenter =  deriveCenterIdx(puIdx);<br>
>              uint32_t curCTUIdx = m_cuAddr;<br>
> -            if (getColMVP(colmv, refIdx, picList, curCTUIdx, partIdxCenter))<br>
> -            {<br>
> -                amvpCand[num++] = colmv;<br>
> -                mvc[numMvc++] = colmv;<br>
> -            }<br>
> +            getCollocatedMV(curCTUIdx, partIdxCenter, neighbours + MD_COLLOCATED);<br>
>          }<br>
>      }<br>
> -<br>
> -    while (num < AMVP_NUM_CANDS)<br>
> -        amvpCand[num++] = 0;<br>
> -<br>
> -    return numMvc;<br>
>  }<br>
><br>
> -void CUData::clipMv(MV& outMV) const<br>
> -{<br>
> -    const uint32_t mvshift = 2;<br>
> -    uint32_t offset = 8;<br>
> -<br>
> -    int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);<br>
> -    int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);<br>
> -<br>
> -    int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);<br>
> -    int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);<br>
> -<br>
> -    outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));<br>
> -    outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));<br>
> -}<br>
> -<br>
> -bool CUData::addMVPCand(MV& mvp, int picList, int refIdx, uint32_t partUnitIdx, MVP_DIR dir) const<br>
> +void CUData::getInterNeighbourMV(InterNeighbourMV *neighbour, uint32_t partUnitIdx, MVP_DIR dir) const<br>
>  {<br>
>      const CUData* tmpCU = NULL;<br>
>      uint32_t idx = 0;<br>
> @@ -1802,103 +1809,77 @@<br>
>          tmpCU = getPUAboveLeft(idx, partUnitIdx);<br>
>          break;<br>
>      default:<br>
> -        return false;<br>
> +        break;<br>
>      }<br>
><br>
>      if (!tmpCU)<br>
> -        return false;<br>
> -<br>
> -    int refPOC = m_slice->m_refPOCList[picList][refIdx];<br>
> -    int partRefIdx = tmpCU->m_refIdx[picList][idx];<br>
> -    if (partRefIdx >= 0 && refPOC == tmpCU->m_slice->m_refPOCList[picList][partRefIdx])<br>
>      {<br>
> -        mvp = tmpCU->m_mv[picList][idx];<br>
> -        return true;<br>
> +        // Mark the PMV as unavailable.<br>
> +        for (int i = 0; i < 2; i++)<br>
> +            neighbour->refIdx[i] = -1;<br>
> +        return;<br>
>      }<br>
><br>
> -    int refPicList2nd = 0;<br>
> -    if (picList == 0)<br>
> -        refPicList2nd = 1;<br>
> -    else if (picList == 1)<br>
> -        refPicList2nd = 0;<br>
> +    for (int i = 0; i < 2; i++)<br>
> +    {<br>
> +        // Get the MV.<br>
> +        neighbour->mv[i] = tmpCU->m_mv[i][idx];<br>
><br>
> +        // Get the reference idx.<br>
> +        neighbour->refIdx[i] = tmpCU->m_refIdx[i][idx];<br>
> +    }<br>
> +}<br>
> +<br>
> +void CUData::clipMv(MV& outMV) const<br>
> +{<br>
> +    const uint32_t mvshift = 2;<br>
> +    uint32_t offset = 8;<br>
> +<br>
> +    int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);<br>
> +    int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);<br>
> +<br>
> +    int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);<br>
> +    int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);<br>
> +<br>
> +    outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));<br>
> +    outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));<br>
> +}<br>
> +<br>
> +// Load direct spatial MV if available.<br>
> +bool CUData::getDirectPMV(MV& pmv, InterNeighbourMV *neighbours, uint32_t picList, uint32_t refIdx) const<br>
> +{<br>
>      int curRefPOC = m_slice->m_refPOCList[picList][refIdx];<br>
> -    int neibRefPOC;<br>
> -<br>
> -    partRefIdx = tmpCU->m_refIdx[refPicList2nd][idx];<br>
> -    if (partRefIdx >= 0)<br>
> +    for (int i = 0; i < 2; i++, picList = !picList)<br>
>      {<br>
> -        neibRefPOC = tmpCU->m_slice->m_refPOCList[refPicList2nd][partRefIdx];<br>
> -        if (neibRefPOC == curRefPOC)<br>
> +        int partRefIdx = neighbours->refIdx[picList];<br>
> +        if (partRefIdx >= 0 && curRefPOC == m_slice->m_refPOCList[picList][partRefIdx])<br>
>          {<br>
> -            // Same reference frame but different list<br>
> -            mvp = tmpCU->m_mv[refPicList2nd][idx];<br>
> +            pmv = neighbours->mv[picList];<br>
>              return true;<br>
>          }<br>
>      }<br>
>      return false;<br>
>  }<br>
><br>
> -bool CUData::addMVPCandOrder(MV& outMV, int picList, int refIdx, uint32_t partUnitIdx, MVP_DIR dir) const<br>
> +// Load indirect spatial MV if available. An indirect MV has to be scaled.<br>
> +bool CUData::getIndirectPMV(MV& outMV, InterNeighbourMV *neighbours, uint32_t picList, uint32_t refIdx) const<br>
>  {<br>
> -    const CUData* tmpCU = NULL;<br>
> -    uint32_t idx = 0;<br>
> +    int curPOC = m_slice->m_poc;<br>
> +    int neibPOC = curPOC;<br>
> +    int curRefPOC = m_slice->m_refPOCList[picList][refIdx];<br>
><br>
> -    switch (dir)<br>
> +    for (int i = 0; i < 2; i++, picList = !picList)<br>
>      {<br>
> -    case MD_LEFT:<br>
> -        tmpCU = getPULeft(idx, partUnitIdx);<br>
> -        break;<br>
> -    case MD_ABOVE:<br>
> -        tmpCU = getPUAbove(idx, partUnitIdx);<br>
> -        break;<br>
> -    case MD_ABOVE_RIGHT:<br>
> -        tmpCU = getPUAboveRight(idx, partUnitIdx);<br>
> -        break;<br>
> -    case MD_BELOW_LEFT:<br>
> -        tmpCU = getPUBelowLeft(idx, partUnitIdx);<br>
> -        break;<br>
> -    case MD_ABOVE_LEFT:<br>
> -        tmpCU = getPUAboveLeft(idx, partUnitIdx);<br>
> -        break;<br>
> -    default:<br>
> -        return false;<br>
> +        int partRefIdx = neighbours->refIdx[picList];<br>
> +        if (partRefIdx >= 0)<br>
> +        {<br>
> +            int neibRefPOC = m_slice->m_refPOCList[picList][partRefIdx];<br>
> +            MV mvp = neighbours->mv[picList];<br>
> +<br>
> +            outMV = scaleMvByPOCDist(mvp, curPOC, curRefPOC, neibPOC, neibRefPOC);<br>
> +            return true;<br>
> +        }<br>
>      }<br>
> -<br>
> -    if (!tmpCU)<br>
> -        return false;<br>
> -<br>
> -    int refPicList2nd = 0;<br>
> -    if (picList == 0)<br>
> -        refPicList2nd = 1;<br>
> -    else if (picList == 1)<br>
> -        refPicList2nd = 0;<br>
> -<br>
> -    int curPOC = m_slice->m_poc;<br>
> -    int curRefPOC = m_slice->m_refPOCList[picList][refIdx];<br>
> -    int neibPOC = curPOC;<br>
> -    int neibRefPOC;<br>
> -<br>
> -    int partRefIdx = tmpCU->m_refIdx[picList][idx];<br>
> -    if (partRefIdx >= 0)<br>
> -    {<br>
> -        neibRefPOC = tmpCU->m_slice->m_refPOCList[picList][partRefIdx];<br>
> -        MV mvp = tmpCU->m_mv[picList][idx];<br>
> -<br>
> -        scaleMvByPOCDist(outMV, mvp, curPOC, curRefPOC, neibPOC, neibRefPOC);<br>
> -        return true;<br>
> -    }<br>
> -<br>
> -    partRefIdx = tmpCU->m_refIdx[refPicList2nd][idx];<br>
> -    if (partRefIdx >= 0)<br>
> -    {<br>
> -        neibRefPOC = tmpCU->m_slice->m_refPOCList[refPicList2nd][partRefIdx];<br>
> -        MV mvp = tmpCU->m_mv[refPicList2nd][idx];<br>
> -<br>
> -        scaleMvByPOCDist(outMV, mvp, curPOC, curRefPOC, neibPOC, neibRefPOC);<br>
> -        return true;<br>
> -    }<br>
> -<br>
>      return false;<br>
>  }<br>
><br>
> @@ -1936,24 +1917,52 @@<br>
>      int curRefPOC = m_slice->m_refPOCList[picList][outRefIdx];<br>
>      int curPOC = m_slice->m_poc;<br>
><br>
> -    scaleMvByPOCDist(outMV, colmv, curPOC, curRefPOC, colPOC, colRefPOC);<br>
> +    outMV = scaleMvByPOCDist(colmv, curPOC, curRefPOC, colPOC, colRefPOC);<br>
>      return true;<br>
>  }<br>
><br>
> -void CUData::scaleMvByPOCDist(MV& outMV, const MV& inMV, int curPOC, int curRefPOC, int colPOC, int colRefPOC) const<br>
> +// Cache the collocated MV.<br>
> +bool CUData::getCollocatedMV(int cuAddr, int partUnitIdx, InterNeighbourMV *neighbour) const<br>
> +{<br>
> +    const Frame* colPic = m_slice->m_refPicList[m_slice->isInterB() && !m_slice->m_colFromL0Flag][m_slice->m_colRefIdx];<br>
> +    const CUData* colCU = colPic->m_encData->getPicCTU(cuAddr);<br>
> +<br>
> +    uint32_t absPartAddr = partUnitIdx & TMVP_UNIT_MASK;<br>
> +    if (colCU->m_predMode[partUnitIdx] == MODE_NONE || colCU->isIntra(absPartAddr))<br>
> +        return false;<br>
> +<br>
> +    for (int list = 0; list < 2; list++)<br>
> +    {<br>
> +        neighbour->cuAddr[list] = cuAddr;<br>
> +        int colRefPicList = m_slice->m_bCheckLDC ? list : m_slice->m_colFromL0Flag;<br>
> +        int colRefIdx = colCU->m_refIdx[colRefPicList][absPartAddr];<br>
> +<br>
> +        if (colRefIdx < 0)<br>
> +            colRefPicList = !colRefPicList;<br>
> +<br>
> +        neighbour->refIdx[list] = colCU->m_refIdx[colRefPicList][absPartAddr];<br>
> +        neighbour->refIdx[list] |= colRefPicList << 4;<br>
> +<br>
> +        neighbour->mv[list] = colCU->m_mv[colRefPicList][absPartAddr];<br>
> +    }<br>
> +<br>
> +    return neighbour->unifiedRef != -1;<br>
> +}<br>
> +<br>
> +MV CUData::scaleMvByPOCDist(const MV& inMV, int curPOC, int curRefPOC, int colPOC, int colRefPOC) const<br>
>  {<br>
>      int diffPocD = colPOC - colRefPOC;<br>
>      int diffPocB = curPOC - curRefPOC;<br>
><br>
>      if (diffPocD == diffPocB)<br>
> -        outMV = inMV;<br>
> +        return inMV;<br>
>      else<br>
>      {<br>
>          int tdb   = x265_clip3(-128, 127, diffPocB);<br>
>          int tdd   = x265_clip3(-128, 127, diffPocD);<br>
>          int x     = (0x4000 + abs(tdd / 2)) / tdd;<br>
>          int scale = x265_clip3(-4096, 4095, (tdb * x + 32) >> 6);<br>
> -        outMV = scaleMv(inMV, scale);<br>
> +        return scaleMv(inMV, scale);<br>
>      }<br>
>  }<br>
><br>
> diff -r 043c2418864b -r a000ce541410 source/common/cudata.h<br>
> --- a/source/common/cudata.h  Fri Mar 06 13:15:55 2015 -0600<br>
> +++ b/source/common/cudata.h  Mon Mar 09 14:35:20 2015 +0530<br>
> @@ -64,7 +64,8 @@<br>
>      MD_ABOVE,       // MVP of above block<br>
>      MD_ABOVE_RIGHT, // MVP of above right block<br>
>      MD_BELOW_LEFT,  // MVP of below left block<br>
> -    MD_ABOVE_LEFT   // MVP of above left block<br>
> +    MD_ABOVE_LEFT,  // MVP of above left block<br>
> +    MD_COLLOCATED   // MVP of temporal neighbour<br>
>  };<br>
><br>
>  struct CUGeom<br>
> @@ -94,6 +95,26 @@<br>
>      int refIdx;<br>
>  };<br>
><br>
> +// Structure that keeps the neighbour's MV information.<br>
> +struct InterNeighbourMV<br>
> +{<br>
> +    // Neighbour MV. The index represents the list.<br>
> +    MV mv[2];<br>
> +<br>
> +    // Collocated right bottom CU addr.<br>
> +    uint32_t cuAddr[2];<br>
> +<br>
> +    // For spatial prediction, this field contains the reference index<br>
> +    // in each list (-1 if not available).<br>
> +    //<br>
> +    // For temporal prediction, the first value is used for the<br>
> +    // prediction with list 0. The second value is used for the prediction<br>
> +    // with list 1. For each value, the first four bits are the reference index<br>
> +    // associated to the PMV, and the fifth bit is the list associated to the PMV.<br>
> +    // if both reference indices are -1, then unifiedRef is also -1<br>
> +    union { int16_t refIdx[2]; int32_t unifiedRef; };<br>
> +};<br>
> +<br>
>  typedef void(*cucopy_t)(uint8_t* dst, uint8_t* src); // dst and src are aligned to MIN(size, 32)<br>
>  typedef void(*cubcast_t)(uint8_t* dst, uint8_t val); // dst is aligned to MIN(size, 32)<br>
><br>
> @@ -197,7 +218,8 @@<br>
>      int8_t   getRefQP(uint32_t currAbsIdxInCTU) const;<br>
>      uint32_t getInterMergeCandidates(uint32_t absPartIdx, uint32_t puIdx, MVField (*candMvField)[2], uint8_t* candDir) const;<br>
>      void     clipMv(MV& outMV) const;<br>
> -    int      fillMvpCand(uint32_t puIdx, uint32_t absPartIdx, int picList, int refIdx, MV* amvpCand, MV* mvc) const;<br>
> +    int      getPMV(InterNeighbourMV *neighbours, uint32_t reference_list, uint32_t refIdx, MV* amvpCand, MV* pmv) const;<br>
> +    void     getNeighbourMV(uint32_t puIdx, uint32_t absPartIdx, InterNeighbourMV* neighbours) const;<br>
>      void     getIntraTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;<br>
>      void     getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;<br>
><br>
> @@ -244,12 +266,16 @@<br>
>      bool isDiffMER(int xN, int yN, int xP, int yP) const { return ((xN >> 2) != (xP >> 2)) || ((yN >> 2) != (yP >> 2)); }<br>
><br>
>      // add possible motion vector predictor candidates<br>
> -    bool addMVPCand(MV& mvp, int picList, int refIdx, uint32_t absPartIdx, MVP_DIR dir) const;<br>
> -    bool addMVPCandOrder(MV& mvp, int picList, int refIdx, uint32_t absPartIdx, MVP_DIR dir) const;<br>
> +//    bool addMVPCand(MV& mvp, int picList, int refIdx, uint32_t absPartIdx, MVP_DIR dir) const;<br>
<br>
</div></div>Please delete these lines when sending the final patch; I don't generally push<br>
patches with commented-out lines of code in them.<br>
<div><div class="h5"><br>
> +    bool getDirectPMV(MV& pmv, InterNeighbourMV *neighbours, uint32_t picList, uint32_t refIdx) const;<br>
> +    bool getIndirectPMV(MV& outMV, InterNeighbourMV *neighbours, uint32_t reference_list, uint32_t refIdx) const;<br>
> +//    bool addMVPCandOrder(MV& mvp, int picList, int refIdx, uint32_t absPartIdx, MVP_DIR dir) const;<br>
> +    void getInterNeighbourMV(InterNeighbourMV *neighbour, uint32_t partUnitIdx, MVP_DIR dir) const;<br>
><br>
>      bool getColMVP(MV& outMV, int& outRefIdx, int picList, int cuAddr, int absPartIdx) const;<br>
> +    bool getCollocatedMV(int cuAddr, int partUnitIdx, InterNeighbourMV *neighbour) const;<br>
><br>
> -    void scaleMvByPOCDist(MV& outMV, const MV& inMV, int curPOC, int curRefPOC, int colPOC, int colRefPOC) const;<br>
> +    MV scaleMvByPOCDist(const MV& inMV, int curPOC, int curRefPOC, int colPOC, int colRefPOC) const;<br>
><br>
>      void     deriveLeftRightTopIdx(uint32_t puIdx, uint32_t& partIdxLT, uint32_t& partIdxRT) const;<br>
><br>
> diff -r 043c2418864b -r a000ce541410 source/encoder/search.cpp<br>
> --- a/source/encoder/search.cpp       Fri Mar 06 13:15:55 2015 -0600<br>
> +++ b/source/encoder/search.cpp       Mon Mar 09 14:35:20 2015 +0530<br>
> @@ -1928,7 +1928,7 @@<br>
>      bits += getTUBits(ref, m_slice->m_numRefIdx[list]);<br>
><br>
>      MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 1];<br>
> -    int numMvc = interMode.cu.fillMvpCand(part, pu.puAbsPartIdx, list, ref, interMode.amvpCand[list][ref], mvc);<br>
> +    int numMvc = interMode.cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);<br>
><br>
>      int mvpIdx = 0;<br>
>      int merange = m_param->searchRange;<br>
> @@ -2046,34 +2046,36 @@<br>
>          getBlkBits((PartSize)cu.m_partSize[0], slice->isInterP(), puIdx, lastMode, m_listSelBits);<br>
>          bool bDoUnidir = true;<br>
><br>
> +        cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);<br>
> +<br>
>          /* Uni-directional prediction */<br>
>          if (m_param->analysisMode == X265_ANALYSIS_LOAD && bestME[0].ref >= 0)<br>
>          {<br>
> -            for (int l = 0; l < numPredDir; l++)<br>
> +            for (int list = 0; list < numPredDir; list++)<br>
>              {<br>
> -                int ref = bestME[l].ref;<br>
> -                uint32_t bits = m_listSelBits[l] + MVP_IDX_BITS;<br>
> -                bits += getTUBits(ref, numRefIdx[l]);<br>
> -<br>
> -                int numMvc = cu.fillMvpCand(puIdx, pu.puAbsPartIdx, l, ref, interMode.amvpCand[l][ref], mvc);<br>
> +                int ref = bestME[list].ref;<br>
> +                uint32_t bits = m_listSelBits[list] + MVP_IDX_BITS;<br>
> +                bits += getTUBits(ref, numRefIdx[list]);<br>
> +<br>
> +                int numMvc = cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);<br>
><br>
>                  // Pick the best possible MVP from AMVP candidates based on least residual<br>
>                  int mvpIdx = 0;<br>
>                  int merange = m_param->searchRange;<br>
><br>
> -                if (interMode.amvpCand[l][ref][0] != interMode.amvpCand[l][ref][1])<br>
> +                if (interMode.amvpCand[list][ref][0] != interMode.amvpCand[list][ref][1])<br>
>                  {<br>
>                      uint32_t bestCost = MAX_INT;<br>
>                      for (int i = 0; i < AMVP_NUM_CANDS; i++)<br>
>                      {<br>
> -                        MV mvCand = interMode.amvpCand[l][ref][i];<br>
> +                        MV mvCand = interMode.amvpCand[list][ref][i];<br>
><br>
>                          // NOTE: skip mvCand if Y is > merange and -FN>1<br>
>                          if (m_bFrameParallel && (mvCand.y >= (merange + 1) * 4))<br>
>                              continue;<br>
><br>
>                          cu.clipMv(mvCand);<br>
> -                        predInterLumaPixel(pu, tmpPredYuv, *slice->m_refPicList[l][ref]->m_reconPic, mvCand);<br>
> +                        predInterLumaPixel(pu, tmpPredYuv, *slice->m_refPicList[list][ref]->m_reconPic, mvCand);<br>
>                          uint32_t cost = m_me.bufSAD(tmpPredYuv.getLumaAddr(pu.puAbsPartIdx), tmpPredYuv.m_size);<br>
><br>
>                          if (bestCost > cost)<br>
> @@ -2084,26 +2086,26 @@<br>
>                      }<br>
>                  }<br>
><br>
> -                MV mvmin, mvmax, outmv, mvp = interMode.amvpCand[l][ref][mvpIdx];<br>
> +                MV mvmin, mvmax, outmv, mvp = interMode.amvpCand[list][ref][mvpIdx];<br>
><br>
>                  int satdCost;<br>
>                  setSearchRange(cu, mvp, merange, mvmin, mvmax);<br>
> -                satdCost = m_me.motionEstimate(&slice->m_mref[l][ref], mvmin, mvmax, mvp, numMvc, mvc, merange, outmv);<br>
> +                satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, merange, outmv);<br>
><br>
>                  /* Get total cost of partition, but only include MV bit cost once */<br>
>                  bits += m_me.bitcost(outmv);<br>
>                  uint32_t cost = (satdCost - m_me.mvcost(outmv)) + m_rdCost.getCost(bits);<br>
><br>
>                  /* Refine MVP selection, updates: mvp, mvpIdx, bits, cost */<br>
> -                checkBestMVP(interMode.amvpCand[l][ref], outmv, mvp, mvpIdx, bits, cost);<br>
> -<br>
> -                if (cost < bestME[l].cost)<br>
> +                checkBestMVP(interMode.amvpCand[list][ref], outmv, mvp, mvpIdx, bits, cost);<br>
> +<br>
> +                if (cost < bestME[list].cost)<br>
>                  {<br>
> -                    bestME[l].mv = outmv;<br>
> -                    bestME[l].mvp = mvp;<br>
> -                    bestME[l].mvpIdx = mvpIdx;<br>
> -                    bestME[l].cost = cost;<br>
> -                    bestME[l].bits = bits;<br>
> +                    bestME[list].mv = outmv;<br>
> +                    bestME[list].mvp = mvp;<br>
> +                    bestME[list].mvpIdx = mvpIdx;<br>
> +                    bestME[list].cost = cost;<br>
> +                    bestME[list].bits = bits;<br>
>                  }<br>
>              }<br>
>              bDoUnidir = false;<br>
> @@ -2131,32 +2133,32 @@<br>
>          }<br>
>          if (bDoUnidir)<br>
>          {<br>
> -            for (int l = 0; l < numPredDir; l++)<br>
> +            for (int list = 0; list < numPredDir; list++)<br>
>              {<br>
> -                for (int ref = 0; ref < numRefIdx[l]; ref++)<br>
> +                for (int ref = 0; ref < numRefIdx[list]; ref++)<br>
>                  {<br>
> -                    uint32_t bits = m_listSelBits[l] + MVP_IDX_BITS;<br>
> -                    bits += getTUBits(ref, numRefIdx[l]);<br>
> -<br>
> -                    int numMvc = cu.fillMvpCand(puIdx, pu.puAbsPartIdx, l, ref, interMode.amvpCand[l][ref], mvc);<br>
> +                    uint32_t bits = m_listSelBits[list] + MVP_IDX_BITS;<br>
> +                    bits += getTUBits(ref, numRefIdx[list]);<br>
> +<br>
> +                    int numMvc = cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);<br>
><br>
>                      // Pick the best possible MVP from AMVP candidates based on least residual<br>
>                      int mvpIdx = 0;<br>
>                      int merange = m_param->searchRange;<br>
><br>
> -                    if (interMode.amvpCand[l][ref][0] != interMode.amvpCand[l][ref][1])<br>
> +                    if (interMode.amvpCand[list][ref][0] != interMode.amvpCand[list][ref][1])<br>
>                      {<br>
>                          uint32_t bestCost = MAX_INT;<br>
>                          for (int i = 0; i < AMVP_NUM_CANDS; i++)<br>
>                          {<br>
> -                            MV mvCand = interMode.amvpCand[l][ref][i];<br>
> +                            MV mvCand = interMode.amvpCand[list][ref][i];<br>
><br>
>                              // NOTE: skip mvCand if Y is > merange and -FN>1<br>
>                              if (m_bFrameParallel && (mvCand.y >= (merange + 1) * 4))<br>
>                                  continue;<br>
><br>
>                              cu.clipMv(mvCand);<br>
> -                            predInterLumaPixel(pu, tmpPredYuv, *slice->m_refPicList[l][ref]->m_reconPic, mvCand);<br>
> +                            predInterLumaPixel(pu, tmpPredYuv, *slice->m_refPicList[list][ref]->m_reconPic, mvCand);<br>
>                              uint32_t cost = m_me.bufSAD(tmpPredYuv.getLumaAddr(pu.puAbsPartIdx), tmpPredYuv.m_size);<br>
><br>
>                              if (bestCost > cost)<br>
> @@ -2167,26 +2169,26 @@<br>
>                          }<br>
>                      }<br>
><br>
> -                    MV mvmin, mvmax, outmv, mvp = interMode.amvpCand[l][ref][mvpIdx];<br>
> +                    MV mvmin, mvmax, outmv, mvp = interMode.amvpCand[list][ref][mvpIdx];<br>
><br>
>                      setSearchRange(cu, mvp, merange, mvmin, mvmax);<br>
> -                    int satdCost = m_me.motionEstimate(&slice->m_mref[l][ref], mvmin, mvmax, mvp, numMvc, mvc, merange, outmv);<br>
> +                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, merange, outmv);<br>
><br>
>                      /* Get total cost of partition, but only include MV bit cost once */<br>
>                      bits += m_me.bitcost(outmv);<br>
>                      uint32_t cost = (satdCost - m_me.mvcost(outmv)) + m_rdCost.getCost(bits);<br>
><br>
>                      /* Refine MVP selection, updates: mvp, mvpIdx, bits, cost */<br>
> -                    checkBestMVP(interMode.amvpCand[l][ref], outmv, mvp, mvpIdx, bits, cost);<br>
> -<br>
> -                    if (cost < bestME[l].cost)<br>
> +                    checkBestMVP(interMode.amvpCand[list][ref], outmv, mvp, mvpIdx, bits, cost);<br>
> +<br>
> +                    if (cost < bestME[list].cost)<br>
>                      {<br>
> -                        bestME[l].mv = outmv;<br>
> -                        bestME[l].mvp = mvp;<br>
> -                        bestME[l].mvpIdx = mvpIdx;<br>
> -                        bestME[l].ref = ref;<br>
> -                        bestME[l].cost = cost;<br>
> -                        bestME[l].bits = bits;<br>
> +                        bestME[list].mv = outmv;<br>
> +                        bestME[list].mvp = mvp;<br>
> +                        bestME[list].mvpIdx = mvpIdx;<br>
> +                        bestME[list].ref = ref;<br>
> +                        bestME[list].cost = cost;<br>
> +                        bestME[list].bits = bits;<br>
>                      }<br>
>                  }<br>
>              }<br>
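The per-reference loops in the hunks above do two things. First, they pick a starting predictor: when the two AMVP candidates differ, each one is clipped, used to build a luma prediction, and scored by SAD, and the lowest-SAD index becomes mvpIdx. With frame parallelism enabled, candidates whose vertical component reaches (merange + 1) * 4 are skipped. Second, after the motion search, they compute the partition cost. This cost subtracts m_me.mvcost(outmv), which (judging from the "only include MV bit cost once" comment) is already part of satdCost, and then adds the lambda-weighted total bits. What follows is a minimal C++ sketch of both steps and is not x265 code. MV, SadFn, pickBestMvp and partitionCost are hypothetical. The SAD callback stands in for predInterLumaPixel plus bufSAD, MV clipping is omitted, and a plain integer lambda replaces m_rdCost.getCost.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an MV in quarter-pel units (not x265's MV class).
struct MV { int x, y; };

// HEVC AMVP always provides two candidates.
constexpr int AMVP_CANDS = 2;

// Hypothetical callback: SAD of the luma prediction built from a candidate MV.
using SadFn = uint32_t (*)(const MV&);

// Return the index of the AMVP candidate with the lowest prediction SAD.
int pickBestMvp(const MV (&cands)[AMVP_CANDS], int merange, bool frameParallel, SadFn sad)
{
    assert(sad != nullptr);

    // Identical candidates: nothing to choose, keep index 0 (the patch's != check).
    if (cands[0].x == cands[1].x && cands[0].y == cands[1].y)
        return 0;

    uint32_t bestCost = UINT32_MAX;
    int bestIdx = 0; // stays 0 if every candidate is skipped
    for (int i = 0; i < AMVP_CANDS; i++)
    {
        // Frame-parallel encoding: skip candidates reaching too far down
        // (merange is in full pels, MVs in quarter pels, hence * 4).
        if (frameParallel && cands[i].y >= (merange + 1) * 4)
            continue;

        uint32_t cost = sad(cands[i]);
        if (cost < bestCost)
        {
            bestCost = cost;
            bestIdx = i;
        }
    }
    return bestIdx;
}

// Count MV bits exactly once: drop the MV cost already inside the search
// cost, then add lambda times the total bits (which include the MV bits).
uint32_t partitionCost(uint32_t satdCost, uint32_t mvCost, uint32_t lambda, uint32_t totalBits)
{
    return (satdCost - mvCost) + lambda * totalBits;
}
```

For example, partitionCost(1000, 40, 4, 30) gives 960 + 120 = 1080.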
> diff -r 043c2418864b -r a000ce541410 source/encoder/search.h<br>
> --- a/source/encoder/search.h Fri Mar 06 13:15:55 2015 -0600<br>
> +++ b/source/encoder/search.h Mon Mar 09 14:35:20 2015 +0530<br>
> @@ -100,6 +100,11 @@<br>
><br>
>      MotionData bestME[MAX_INTER_PARTS][2];<br>
>      MV         amvpCand[2][MAX_NUM_REF][AMVP_NUM_CANDS];<br>
> +//    MV         _amvpCand[2][MAX_NUM_REF][AMVP_NUM_CANDS];<br>
<br>
</div></div>ditto<br>
<span class=""><br>
> +<br>
> +    // Neighbour MVs of the current partition. 5 spatial candidates and the<br>
> +    // temporal candidate.<br>
> +    InterNeighbourMV interNeighbours[6];<br>
><br>
>      uint64_t   rdCost;     // sum of partition (psy) RD costs          (sse(fenc, recon) + lambda2 * bits)<br>
>      uint64_t   sa8dCost;   // sum of partition sa8d distortion costs   (sa8d(fenc, pred) + lambda * bits)<br>
</span>> _______________________________________________<br>
> x265-devel mailing list<br>
> <a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>
> <a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
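The search.h change adds interNeighbours[6] (five spatial candidates plus the temporal one). In the unidirectional loop, cu.getPMV(interMode.interNeighbours, ...) replaces the per-(list, ref) cu.fillMvpCand call. As the diff and the description above suggest, neighbour availability and MVs are therefore resolved once per PU instead of once per reference. Below is a minimal C++ sketch of that caching pattern. It assumes hypothetical types and helpers (NeighbourMV, lookupNeighbour, cacheNeighbours, gatherCandidates) that are not x265's InterNeighbourMV or getPMV. The candidate rule is deliberately simplified and omits the MV scaling and cross-list checks of the real HEVC AMVP derivation.

```cpp
#include <array>

// Hypothetical MV in quarter-pel units (not x265's MV class).
struct MV { int x, y; };

// Hypothetical cache entry for one neighbour of the current PU.
struct NeighbourMV {
    bool available = false;
    MV   mv[2] = {};            // MV per reference list
    int  refIdx[2] = {-1, -1};  // reference index per list, -1 = list unused
};

// Five spatial neighbours plus the temporal one, as in interNeighbours[6].
constexpr int NUM_NEIGHBOURS = 6;
using NeighbourCache = std::array<NeighbourMV, NUM_NEIGHBOURS>;

// Counts calls to the expensive lookup so the saving can be observed.
int g_lookupCalls = 0;

// Stand-in for the costly availability check and MV fetch for neighbour `dir`.
// Illustrative data: even-numbered neighbours are available and use ref 0 in list 0.
NeighbourMV lookupNeighbour(int dir)
{
    ++g_lookupCalls;
    NeighbourMV n;
    if (dir % 2 == 0)
    {
        n.available = true;
        n.mv[0] = MV{4 * dir, -dir};
        n.refIdx[0] = 0;
    }
    return n;
}

// Run once per PU, before the list/ref loops.
NeighbourCache cacheNeighbours()
{
    NeighbourCache cache;
    for (int dir = 0; dir < NUM_NEIGHBOURS; dir++)
        cache[dir] = lookupNeighbour(dir);
    return cache;
}

// Simplified candidate gathering that reads only from the cache: MVs of
// available neighbours that use the same (list, ref). Returns the count.
int gatherCandidates(const NeighbourCache& cache, int list, int ref, MV* out)
{
    int n = 0;
    for (const NeighbourMV& nb : cache)
        if (nb.available && nb.refIdx[list] == ref)
            out[n++] = nb.mv[list];
    return n;
}
```

Suppose the encoder calls cacheNeighbours() once and then gatherCandidates() for two lists with three reference indices each. The expensive lookup then runs six times, once per neighbour, instead of 36 times (six neighbours for each of the six list/ref pairs). This is the kind of repeated work the patch removes. The measured speedup in the numbers above is small, however.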
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Steve Borho<br>
</font></span></blockquote></div><br></div>