[x264-devel] Re: Compilation problem with x264 on Dual Opteron setup (SSE3)

Guillaume POIRIER poirierg at gmail.com
Mon May 1 11:48:10 CEST 2006


Hi,

On 5/1/06, Mauricio Alvarez <alvarez at ac.upc.edu> wrote:
> Loren Merritt wrote:
> > No, despite what Intel would have you believe, SSE3 provides nothing at
> > all for video codecs. We tried it, and there was no measurable speed
> > difference.
>
> And what about the LDDQU (load unaligned double quadword) instruction?
>
> Could this instruction be useful in the motion estimation routines,
> which access macroblocks and sub-macroblocks at misaligned
> addresses?

I made a patch that made use of this instruction, but since I don't
have any machine that supports SSE3, I wasn't able to test it.

It's available here, if you want:
http://tuxrip.free.fr/transperl/MPlayer/SSE3_lddqu.2.diff

I don't know if it still applies cleanly though.

The main problem with that patch is that it unconditionally replaces every
movdqu with lddqu, which isn't very smart. The Intel optimization guide
states quite clearly that this is not how it should be done.
What should be done instead: instrument the code so that it tells you
which loads are always badly misaligned, and use lddqu only for those
(loads that are sometimes aligned and sometimes not do not
benefit from lddqu).
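
To make the instrumentation idea concrete, here is a minimal sketch in
plain C of per-call-site counters that classify each 16-byte load
address. All names (load_stats, load_stats_record,
load_stats_always_misaligned) are hypothetical and are not part of x264
or of the patch above. The 64-byte cache line is an assumption, and the
"split" counter is my own addition for spotting loads that cross a
cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; adjust for the CPU being profiled. */
#define CACHE_LINE 64

/* Hypothetical counters for one load site (one movdqu in the code). */
typedef struct {
    unsigned long total;   /* loads recorded at this site */
    unsigned long aligned; /* address was a multiple of 16 */
    unsigned long split;   /* misaligned and crossing a cache line */
} load_stats;

/* Record the address of one 16-byte load. */
static void load_stats_record(load_stats *s, const void *p)
{
    uintptr_t a = (uintptr_t)p;

    assert(s != NULL);
    s->total++;
    if ((a & 15) == 0)
        s->aligned++;
    else if ((a & (CACHE_LINE - 1)) > CACHE_LINE - 16)
        s->split++; /* the last byte falls in the next cache line */
}

/* A site is a candidate for lddqu only if it was never seen aligned. */
static int load_stats_always_misaligned(const load_stats *s)
{
    return s->total > 0 && s->aligned == 0;
}
```

The idea would be to call load_stats_record() next to each unaligned
load in a debug build, run a few representative encodes, and then only
consider the sites for which load_stats_always_misaligned() returns
non-zero. Whether those sites actually get faster still has to be
measured on real hardware.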

In any case, I doubt lddqu can bring any benefit on K8 cores. It's
probably only useful on Prescott cores.

Guillaume
--
I am disillusioned enough to know that no man's opinion on any subject
is worth a damn unless backed up with enough genuine information to
make him really know what he's talking about.

-- H. P. Lovecraft (about the flamewars on FFmpeg and MPlayer-dev mailing lists)
http://www.brainyquote.com/quotes/quotes/h/hplovecr278144.html



