<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>>-        T00 = _mm_unpacklo_epi16(T00, _mm_setzero_si128());<BR>>+        __m128i sign = _mm_srai_epi16(T00, 15);<BR>>+        T00 = _mm_unpacklo_epi16(T00, sign);<BR>>         T01 = _mm_unpacklo_epi8(T01, _mm_setzero_si128());<BR></DIV>
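<DIV>I guess pmovsxwd (_mm_cvtepi16_epi32) would be faster here than the psraw + punpcklwd pair, if we can require SSE4.1 for this path.</DIV>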
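<DIV>For reference, a minimal sketch of the two ways to sign-extend the low four 16-bit lanes to 32 bits. The helper names widen_lo4_sse2 and widen_lo4_sse41 are made up for illustration and are not from the patch. The SSE4.1 variant assumes the target CPU supports SSE4.1 and that the compiler accepts the GCC/Clang target attribute. I have not benchmarked either version, so "faster" is still a guess.</DIV>

```c
#include <emmintrin.h>  /* SSE2: _mm_srai_epi16, _mm_unpacklo_epi16 */
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepi16_epi32 (pmovsxwd) */
#include <stdint.h>

/* SSE2 approach, as in the patch: build a per-lane sign mask
 * (0x0000 or 0xFFFF) and interleave it as the high half of each dword. */
static void widen_lo4_sse2(const int16_t *src, int32_t *dst)
{
    __m128i v    = _mm_loadl_epi64((const __m128i *)src); /* 4 x int16 */
    __m128i sign = _mm_srai_epi16(v, 15);
    _mm_storeu_si128((__m128i *)dst, _mm_unpacklo_epi16(v, sign));
}

/* SSE4.1 approach: a single pmovsxwd does the same sign extension.
 * The target attribute is an assumption about the toolchain (GCC/Clang). */
__attribute__((target("sse4.1")))
static void widen_lo4_sse41(const int16_t *src, int32_t *dst)
{
    __m128i v = _mm_loadl_epi64((const __m128i *)src);    /* 4 x int16 */
    _mm_storeu_si128((__m128i *)dst, _mm_cvtepi16_epi32(v));
}
```

<DIV>Both helpers should give identical results for any input, including negative values, which is the case the original _mm_setzero_si128() unpack got wrong.</DIV>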
</div>