<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>>@@ -5105,8 +5108,9 @@<BR>> pmaddwd m5, [r6 + 3 * 16]<BR>> paddd m1, m5 ;m1=[1+2+3+4+5+6+7+8] Row2 end<BR>> psrad m1, 6<BR>>-<BR>>- packssdw m0, m1<BR>>+ pand m1, m7<BR>>+<BR>>+ packusdw m0, m1<BR>> <BR>> movlps [r2], m0<BR>> movhps [r2 + r3], m0<BR></DIV>
<DIV>PAND + PACKUSDW may avoid the overflow problem, but it is the wrong fix here.</DIV>
<DIV>As you said, you got a result value of 0x8D84 (36228). That overflows a signed 16-bit value, whose maximum is 0x7FFF (32767), so we need to find the real reason for it.</DIV>
<DIV>I checked the HM code and it uses Short, so I suggest you capture the input data, feed it into HM and check HM's output.</DIV>
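<DIV>To make the 0x8D84 case concrete, here is a small Python sketch that emulates, for one 32-bit lane, what PACKSSDW, PACKUSDW, PAND and a cast to a 16-bit Short each do to the value. The helper names are hypothetical (this is not x265 or HM code). It also assumes m7 holds 0x0000FFFF in every dword, because the diff does not show how m7 is set.</DIV>
<PRE>
```python
# Per-lane emulation of the pack/mask steps on one 32-bit result.
# Helper names are illustrative only; this is not x265 or HM code.

INT16_MIN, INT16_MAX = -32768, 32767
UINT16_MAX = 0xFFFF


def packssdw_lane(v):
    """PACKSSDW: int32 to int16 with signed saturation."""
    return max(INT16_MIN, min(INT16_MAX, v))


def packusdw_lane(v):
    """PACKUSDW (SSE4.1): int32 to uint16 with unsigned saturation."""
    return max(0, min(UINT16_MAX, v))


def pand_lane(v, mask=0x0000FFFF):
    """PAND with an ASSUMED 0x0000FFFF mask in m7: keep the low 16 bits of the dword."""
    return (v & 0xFFFFFFFF) & mask


def to_short(v):
    """Wrap into a signed 16-bit Short (two's complement), like a typical C/C++ cast."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


if __name__ == "__main__":
    v = 0x8D84  # the 32-bit result reported in the review (36228)
    print(hex(packssdw_lane(v)))             # 0x7fff: signed saturation
    print(hex(packusdw_lane(pand_lane(v))))  # 0x8d84: mask + unsigned saturation
    print(to_short(v))                       # -29308: what a Short would hold
```
</PRE>
<DIV>Under that mask assumption, PAND leaves a value in 0..0xFFFF, so the following PACKUSDW never saturates and the pair simply keeps the low 16 bits. That is the same bit pattern a wrapping cast to Short produces (0x8D84, read back as -29308), while PACKSSDW would clamp to 0x7FFF. In other words, the masked pack hides the overflow rather than explaining why the 32-bit sum exceeds 16 bits.</DIV>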
<DIV> </DIV></div>