<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>>+ movhlps m1, m0<BR>>+ paddd m0, m1<BR>>+ movd eax, m0<BR>>+ RET<BR>Seems good,</DIV>
<DIV>only one problem: I know movhlps comes from x264's code, but according to Agner's documents, movhlps is slower than pshufd. I am not sure which is better here.</DIV></div>