[x264-devel] Re: [patch] SSE2 pixel routines
Loren Merritt
lorenm at u.washington.edu
Mon Jul 25 17:46:16 CEST 2005
On Fri, 22 Jul 2005, Alexander Izvorski wrote:
> Here is an early version of SSE2-optimized routines for sad 16x16 and
> 16x8, ssd 16x16 and 16x8, and satd from 16x16 to 8x4 (diff against rev
> 277). None of these have any special alignment requirements. I have
> tested that they produce the same results as the mmxext versions, but
> I'd appreciate it if someone else tested them as well. They are not
> in their final form yet, there are a few places where a few more
> instructions can be shaved off.
>
> -Alex Izvorski
>
> P.S. I've a few questions as well from looking at the original code...
>
> Why is the result of satd divided by two?! That throws away one bit
> of precision which would have a small but noticeable impact on PSNR.
> (see the "shr eax,1" in MMX_SUM_MM).
Halving the result lets SAD and SATD use the same lambda table. That said,
I'm not sure that the two differ by exactly a factor of 2, so maybe they
should have separate tables.

For that matter, the current lack of precision in lambda should make much
more difference than 1 bit in satd.
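
To make the scaling concrete, here is a minimal C sketch of a 4x4 SATD with and without the final halving. It is not the x264 code: the names `satd_4x4_raw` and `satd_4x4_halved` and the stride parameters are hypothetical, and the real routines are MMX/SSE2 assembly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, not taken from the x264 source.
 * Sum of absolute values of the unnormalized 4x4 Hadamard transform
 * of the pixel differences between blocks a and b. */
static int satd_4x4_raw(const uint8_t *a, int stride_a,
                        const uint8_t *b, int stride_b)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Pixel differences. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal 4-point Hadamard on each row (two butterfly stages). */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical 4-point Hadamard on each column, accumulating |coef|. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum;
}

/* Same metric with the final halving discussed above (assumed to
 * correspond to the "shr eax,1" step; that mapping is an assumption). */
static int satd_4x4_halved(const uint8_t *a, int stride_a,
                           const uint8_t *b, int stride_b)
{
    return satd_4x4_raw(a, stride_a, b, stride_b) >> 1;
}
```

With a single differing pixel of magnitude 1, SAD is 1 while the raw sum is 16 and the halved one is 8. With a flat difference of 10 on every pixel, SAD is 160, the raw sum is 160 and the halved one is 80. So in this sketch the ratio to SAD depends on the residual rather than being a fixed constant. The raw sum here is also always even, because all 16 coefficients share the parity of the sum of the differences, so the shift drops no information in this scalar version. Whether that carries over to the assembly depends on details this sketch does not model.
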
> The original version of HADAMARD4_SUB_BADC uses add-add-subtract, is
> that faster than the equivalent move-add-subtract? (not on Athlons,
> but maybe on P4?) The equivalent in my version uses the
> move-add-subtract but can be changed very easily.
add-add-subtract doesn't need any temporary registers.
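
The difference between the two butterfly forms can be sketched in scalar C as follows. This is only an illustration of the idea, not the macro itself: the function names are hypothetical, each variable stands in for one register, and the instruction names in the comments are an assumed mapping.

```c
#include <stdint.h>

/* add-add-subtract: works in place on the two "registers",
 * so no scratch register is needed. Result: a' = a + b, b' = b - a. */
static void butterfly_add_add_sub(int16_t *a, int16_t *b)
{
    *a = (int16_t)(*a + *b);  /* a = a + b              (e.g. paddw) */
    *b = (int16_t)(*b + *b);  /* b = 2b                 (e.g. paddw) */
    *b = (int16_t)(*b - *a);  /* b = 2b - (a + b) = b - a (e.g. psubw) */
}

/* move-add-subtract: same result, but the copy occupies
 * one extra register for the duration of the butterfly. */
static void butterfly_mov_add_sub(int16_t *a, int16_t *b)
{
    int16_t t = *a;           /* t = a                  (e.g. movq) */
    *a = (int16_t)(*a + *b);  /* a = a + b              (e.g. paddw) */
    *b = (int16_t)(*b - t);   /* b = b - a              (e.g. psubw) */
}
```

Both variants take three operations per butterfly. The trade-off is one extra add versus one extra move plus a live temporary, which matters when all registers are already holding rows of the block. The sign of the difference term does not affect SATD, since absolute values are summed at the end.
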
--Loren Merritt