[x264-devel] Re: [patch] SSE2 pixel routines

Mon Jul 25 17:46:16 CEST 2005

On Fri, 22 Jul 2005, Alexander Izvorski wrote:

> Here is an early version of SSE2-optimized routines for sad 16x16 and
> 16x8, ssd 16x16 and 16x8, and satd from 16x16 to 8x4 (diff against rev
> 277).  None of these have any special alignment requirements.  I have
> tested that they produce the same results as the mmxext versions, but
> I'd appreciate it if someone else tested them as well.  They are not
> in their final form yet; there are still a few places where more
> instructions can be shaved off.
>
> -Alex Izvorski
>
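One way to test the SSE2 routines independently of the mmxext ones is to compare them against plain C loops. Below is a minimal sketch, assuming 8-bit pixels addressed with a row stride. The names sad_wxh_ref, ssd_wxh_ref and sad_16xh_sse2 are hypothetical and are not x264 functions. The SSE2 part uses the standard <emmintrin.h> intrinsics (psadbw via _mm_sad_epu8) with unaligned loads, in line with the "no special alignment requirements" claim, and it is compiled only when __SSE2__ is defined.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference SAD over a w x h block of 8-bit pixels.
   stride1/stride2 are the byte distances between rows. */
static int sad_wxh_ref(const uint8_t *pix1, int stride1,
                       const uint8_t *pix2, int stride2, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++, pix1 += stride1, pix2 += stride2)
        for (int x = 0; x < w; x++)
            sum += abs(pix1[x] - pix2[x]);
    return sum;
}

/* Scalar reference SSD (sum of squared differences). */
static int ssd_wxh_ref(const uint8_t *pix1, int stride1,
                       const uint8_t *pix2, int stride2, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++, pix1 += stride1, pix2 += stride2)
        for (int x = 0; x < w; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
    return sum;
}

#if defined(__SSE2__)
/* SSE2 SAD for a 16-pixel-wide block of height h. _mm_loadu_si128
   accepts any address, so neither block has to be 16-byte aligned. */
static int sad_16xh_sse2(const uint8_t *pix1, int stride1,
                         const uint8_t *pix2, int stride2, int h)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < h; y++, pix1 += stride1, pix2 += stride2) {
        __m128i a = _mm_loadu_si128((const __m128i *)pix1);
        __m128i b = _mm_loadu_si128((const __m128i *)pix2);
        /* psadbw: each 64-bit half receives the SAD of its 8 bytes */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
    }
    /* fold the upper half's partial sum into the lower half */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return _mm_cvtsi128_si32(acc);
}
#endif
```

Calling both versions on blocks that start at odd offsets in a buffer exercises the unaligned path; the 16x8 case is the same call with h = 8.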
> P.S. I've a few questions as well from looking at the original code...
>
> Why is the result of satd divided by two?!  That throws away one bit
> of precision which would have a small but noticeable impact on PSNR.
> (see the "shr     eax,1" in MMX_SUM_MM).

It allows SAD and SATD to share the same lambda table. Then again, I'm not
sure the scale difference between them is exactly a factor of 2, so maybe
they should have separate tables.
For that matter, the current lack of precision in lambda should make much
more difference than 1 bit in satd.
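To make the halving concrete, here is a plain C sketch of SATD built from 4x4 Hadamard transforms, with the shift applied once to the total. It is an illustration, not x264's pixel.c or the MMX_SUM_MM macro; satd_4x4_raw and satd_wxh are hypothetical names. One property worth noting: every coefficient of the unnormalized 4x4 Hadamard transform is a ±1-weighted sum of all 16 differences, so all 16 coefficients share the parity of the plain sum of differences, and the sum of their absolute values is always even. In this sketch the final halving is therefore exact; whether the same holds for the mmxext code depends on where its shift is applied.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute values of the unnormalized 2-D 4x4 Hadamard
   transform of the difference between two 4x4 pixel blocks. */
static int satd_4x4_raw(const uint8_t *pix1, int stride1,
                        const uint8_t *pix2, int stride2)
{
    int d[4][4], t[4][4];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = pix1[y * stride1 + x] - pix2[y * stride2 + x];

    /* horizontal 4-point Hadamard on each row */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23; t[y][1] = s01 - s23;
        t[y][2] = d01 - d23; t[y][3] = d01 + d23;
    }

    /* vertical 4-point Hadamard on each column, summing |coefficient| */
    int sum = 0;
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 - d23) + abs(d01 + d23);
    }
    return sum;
}

/* SATD over a w x h block (w, h multiples of 4), halved once at the
   end in the same spirit as the "shr eax,1" discussed above. */
static int satd_wxh(const uint8_t *pix1, int stride1,
                    const uint8_t *pix2, int stride2, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y += 4)
        for (int x = 0; x < w; x += 4)
            sum += satd_4x4_raw(pix1 + y * stride1 + x, stride1,
                                pix2 + y * stride2 + x, stride2);
    return sum >> 1;
}
```

For a constant difference of 7 over a 4x4 block only the DC coefficient is nonzero (16 × 7 = 112), giving a halved SATD of 56.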

> The original version of HADAMARD4_SUB_BADC uses add-add-subtract, is
> that faster than the equivalent move-add-subtract?  (not on Athlons,
> but maybe on P4?)  The equivalent in my version uses the
> move-add-subtract but can be changed very easily.

add-add-subtract doesn't need a temporary register, whereas move-add-subtract
has to copy one operand into a spare register first.
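The register argument can be seen by modelling each instruction as one C statement on two values a and b. This is a hypothetical model, not the actual HADAMARD4_SUB_BADC macro: sumsub_add_add_sub and sumsub_mov_add_sub are illustrative names, and the 16-bit SIMD lanes are modelled as plain ints. Note that add-add-subtract leaves b - a rather than a - b in the second register. That is harmless for SATD, because negating one input of a later butterfly only negates or swaps that butterfly's outputs, so the sum of absolute values is unchanged.

```c
/* add-add-subtract: works in place on two registers.
   Afterwards a = a + b and b = b - a (difference with flipped sign). */
static void sumsub_add_add_sub(int *a, int *b)
{
    *a = *a + *b;   /* paddw a, b : a + b                  */
    *b = *b + *b;   /* paddw b, b : 2b                     */
    *b = *b - *a;   /* psubw b, a : 2b - (a + b) = b - a   */
}

/* move-add-subtract: needs a spare register t for the copy.
   Afterwards a = a + b and the difference a - b sits in t. */
static void sumsub_mov_add_sub(int *a, int *b)
{
    int t = *a;     /* mov   t, a : copy of a              */
    *a = *a + *b;   /* paddw a, b : a + b                  */
    t = t - *b;     /* psubw t, b : a - b                  */
    *b = t;         /* in asm the result just stays in t; this models the renaming */
}
```

With a = 5 and b = 3, both variants produce 8 as the sum; the differences come out as -2 and 2 respectively.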

--Loren Merritt

-- 
This is the x264-devel mailing-list