<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>At 2013-09-21 02:35:59,"Jason Garrett-Glaser" <jason@x264.com> wrote:<BR>><BR>>> To implement this change , we need to modify HM code.<BR>>> [MC] we can define the table in asm file, but we have to modify HM. of<BR>>> course, it is easy things<BR>><BR>>You don't have to, of course (you know the code better than I and<BR>>whether or not it's a good idea to change it).<BR></DIV>
<DIV>If we don't modify the code, we can't know which coefficient group the caller wants.</DIV>
<DIV>HEVC has 4 coefficient groups for qpel, which is different from H.264.</DIV>
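<DIV>To make the 4 groups concrete, here is a rough C sketch. It is not the HM or x265 code: the names luma_filter and filter_luma_sample are made up for illustration. The coefficients are the 8-tap luma filters from the HEVC spec, one row per fractional position (0, 1/4, 1/2, 3/4).</DIV>

```c
#include <stdint.h>

/* HEVC 8-tap luma interpolation filters, one row per quarter-pel
 * fractional position (0, 1/4, 1/2, 3/4). Each row sums to 64.
 * The table name is hypothetical, not taken from HM or x265. */
static const int16_t luma_filter[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Hypothetical helper: filter one sample horizontally.
 * src points at the centre sample; the taps cover src[-3] .. src[4].
 * frac selects the coefficient group and must be in 0..3.
 * The result is not normalised (it is still scaled by 64). */
static int filter_luma_sample(const uint8_t *src, int frac)
{
    const int16_t *coef = luma_filter[frac];
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += coef[i] * src[i - 3];
    return sum;
}
```

<DIV>The asm routine needs the frac index (or a pointer to the selected row) from the caller, which is why the calling code has to change.</DIV>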
<DIV> </DIV>
<DIV>>>> +<BR>>>> + mov tmp, offset2<BR>>>> + movd sumOffset, tmp<BR>>>> + pshufd sumOffset, sumOffset, 0<BR>>><BR>>> You can movd directly from memory; going through a register is much<BR>>> slower, especially on AMD machines.<BR>>> [MC] are you means, we put constant into memory and load it once?<BR>><BR>>movd sumOffset, offset2<BR></DIV>
<DIV>I checked the documentation earlier, and I don't think any instruction supports</DIV>
<DIV>' movd reg, constant ' on Intel CPUs.</DIV>
<DIV> </DIV>
<DIV>>> [MC] no way, x264 macro have a bug here, you can remove reduce x2 and check<BR>>> the output, the xmm0 seems Intel limit<BR>><BR>>That makes sense, I don't think the x264 macro was ever designed to<BR>>support non-AVX pblendvb. I don't recommend non-AVX pblendvb anyways<BR>>as it's a lot slower because of the extra register dependency (it's<BR>>like 3 uops or something).<BR></DIV>
<DIV>Replacing it with 'pand + pandn + por' takes 3 uops but has fewer dependencies.</DIV>
<DIV>Agner's documents say pblendvb is 2 uops, with a latency of 2 and a throughput of 1,</DIV>
<DIV>on my Sandy Bridge, so I chose it.</DIV>
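<DIV>For reference, a scalar C model of the two options, one byte at a time. This is only a sketch of the semantics, not the asm from the patch, and the function names are made up. One thing to watch: pblendvb looks only at the top bit of each mask byte, while 'pand + pandn + por' selects bit by bit, so the two agree only when every mask byte is 0x00 or 0xFF.</DIV>

```c
#include <stdint.h>

/* Scalar model of SSE4.1 pblendvb for one byte:
 * the top bit of the mask byte picks b, otherwise a. */
static uint8_t blendv_byte(uint8_t a, uint8_t b, uint8_t mask)
{
    return (mask & 0x80) ? b : a;
}

/* Scalar model of pand + pandn + por for one byte:
 * each mask bit selects independently, so the mask byte
 * should be 0x00 or 0xFF to match blendv_byte(). */
static uint8_t and_andn_or_byte(uint8_t a, uint8_t b, uint8_t mask)
{
    return (uint8_t)((b & mask) | (a & (uint8_t)~mask));
}

/* Apply either model across a 16-byte vector (one xmm register). */
static void blend16(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                    const uint8_t *mask, int use_blendv)
{
    for (int i = 0; i < 16; i++)
        dst[i] = use_blendv ? blendv_byte(a[i], b[i], mask[i])
                            : and_andn_or_byte(a[i], b[i], mask[i]);
}
```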
<DIV> </DIV>
<DIV>Of course, this is a bad branch; that code is for the testbench only.</DIV>
<DIV>In the real world, the minimum block is 4x8, so the width is 4 and movd is enough.</DIV>
<DIV><BR>>Jason<BR>>_______________________________________________<BR>>x265-devel mailing list<BR>>x265-devel@videolan.org<BR>>https://mailman.videolan.org/listinfo/x265-devel<BR></DIV></div>