[x264-devel] question in pixel_avg2_w20_sse2

Mon Jul 28 21:44:50 CEST 2014

On Tue, 29 Jul 2014 03:37:35 +0800 (CST), chen wrote:
> At 2014-07-29 03:20:42, BugMaster <BugMaster at narod.ru> wrote:
>>On Tue, 29 Jul 2014 02:40:00 +0800 (CST), chen wrote:
>>> pixel_avg2_w20_sse2 mixes XMM0-XMM4 with MM4-MM5, and MM4-MM5 are not saved and restored.
>>> I checked the ABI document; it only says that the Microsoft compiler doesn't use MM0-MM7.
>>> Is this a bug?
>>>  
>>> btw: I know MM_ is twice as fast as XMM_ on old CPUs, but on the
>>> latest CPUs it is the same speed or slower.
>>>  
>>> Min
>>>   
>>Hi. I don't fully understand what your real question is or what you
>>see as a bug here. Yes, this function uses a mix of SSE2/MMX
>>instructions/registers because we don't need a full-length XMM register
>>for this width (16+4) and we need unaligned memory access here. As for
>>the calling ABI, all MMX regs (mm0-mm7) are volatile and must be considered
>>destroyed on function calls.
>>>

> Here we don't call EMMS before returning, so after this function
> ST(*) or MM* are in a non-deterministic state.

In x264 our practice is to call emms *before* using floating-point
arithmetic in C code, rather than adding emms to every function that
uses MMX.
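
To illustrate the convention, here is a minimal sketch in C. It is not
x264 code: it uses the compiler intrinsic _mm_empty() (which emits emms)
from <mmintrin.h>, and the function names add4_i16_mmx and
mean4_after_mmx are made up for this example. It assumes an x86 compiler
that provides the MMX intrinsics.

```c
#include <mmintrin.h>  /* MMX intrinsics; _mm_empty() emits emms */
#include <stdint.h>
#include <string.h>

/* Hypothetical MMX helper: dst[i] = a[i] + b[i] for four 16-bit values.
 * Like the MMX routines discussed above, it does NOT execute emms
 * before returning, so the x87/MMX state is left "dirty". */
static void add4_i16_mmx(int16_t *dst, const int16_t *a, const int16_t *b)
{
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof(va));   /* unaligned-safe 8-byte load */
    memcpy(&vb, b, sizeof(vb));
    vr = _mm_add_pi16(va, vb);    /* paddw on MMX registers */
    memcpy(dst, &vr, sizeof(vr));
    /* deliberately no _mm_empty() here */
}

/* Hypothetical C caller: clears the MMX state once, right before the
 * first floating-point operation, instead of inside every MMX helper. */
static float mean4_after_mmx(const int16_t *a, const int16_t *b)
{
    int16_t sum[4];
    add4_i16_mmx(sum, a, b);
    _mm_empty();                  /* emms: x87 registers usable again */
    return (float)(sum[0] + sum[1] + sum[2] + sum[3]) / 4.0f;
}
```

The point of doing it this way is that a chain of MMX calls pays for emms
once, at the place where float code starts, and not once per call.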