[x264-devel] SSE4 Optimizations

Loren Merritt lorenm at u.washington.edu
Sun Jan 13 00:34:37 CET 2008


On Thu, 20 Dec 2007, Mike Kazmier wrote:

> Has anyone looked at or considered optimizations for SSE4 as described 
> here: http://softwarecommunity.intel.com/articles/eng/1246.htm
>
> It looks quite promising.  I don't personally have the skills to do 
> this, but can supply the hardware if anyone wants to take a stab at it.

There are 3 possible approaches:
 1. use MPSADBW in full search, as proposed in the above webpage,
 2. use MPSADBW in a new iterative search,
 3. use PHMINPOSUW in the existing iterative searches.


Full search can be further divided into:
PSADBW ESA
MPSADBW ESA
PSADBW SEA (current x264)
MPSADBW SEA
("ESA" = exhaustive, "SEA" = successive elimination. They have identical 
results, but SEA is algorithmically faster.)
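
For readers unfamiliar with successive elimination, here is a minimal scalar
sketch of the idea. It is not x264's implementation: the function names are
made up for illustration, it searches only one row of horizontal offsets, it
ignores mv cost, and it recomputes the reference block sum at each offset
(a real SEA would get those sums from precomputed tables). It relies only on
the inequality |sum(cur) - sum(ref)| <= SAD(cur, ref), which lets a candidate
be discarded without computing its SAD.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of all pixels in a w x h block. */
static unsigned block_sum(const uint8_t *p, int stride, int w, int h)
{
    unsigned s = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            s += p[y * stride + x];
    return s;
}

/* Plain sum of absolute differences between two w x h blocks. */
static unsigned block_sad(const uint8_t *a, int a_stride,
                          const uint8_t *b, int b_stride, int w, int h)
{
    unsigned s = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            s += (unsigned)abs(a[y * a_stride + x] - b[y * b_stride + x]);
    return s;
}

/* Hypothetical helper: test horizontal offsets 0..range-1 in ref.
 * Returns the best offset, stores its SAD in *best_sad, and counts in
 * *n_sads how many full SADs were actually evaluated. */
static int sea_search_row(const uint8_t *cur, int cur_stride,
                          const uint8_t *ref, int ref_stride,
                          int w, int h, int range,
                          unsigned *best_sad, int *n_sads)
{
    assert(w > 0 && h > 0 && range > 0);
    unsigned cur_sum = block_sum(cur, cur_stride, w, h);
    unsigned best = UINT_MAX;
    int best_x = 0;
    *n_sads = 0;
    for (int x = 0; x < range; x++) {
        unsigned ref_sum = block_sum(ref + x, ref_stride, w, h);
        /* Lower bound on the SAD at this offset. */
        unsigned bound = cur_sum > ref_sum ? cur_sum - ref_sum
                                           : ref_sum - cur_sum;
        if (bound >= best)
            continue;   /* cannot beat the current best: eliminated */
        unsigned sad = block_sad(cur, cur_stride, ref + x, ref_stride, w, h);
        (*n_sads)++;
        if (sad < best) {
            best = sad;
            best_x = x;
        }
    }
    *best_sad = best;
    return best_x;
}
```

Because a skipped candidate has SAD >= bound >= best, it could never have
replaced the current best, so the result matches an exhaustive search over
the same offsets. The offsets that survive the bound test are the "sparse
mvs" discussed below.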

MPSADBW ESA is definitely faster than PSADBW ESA, but both are slower than 
PSADBW SEA. I previously thought MPSADBW SEA was impossible, but after 
some experimentation I revise that. MPSADBW evaluates 8 consecutive mvs, 
and successive elimination in general leaves sparse mvs, but in practice 
they're sufficiently clumped that MPSADBW should work with only a moderate 
amount of wasted computation. Still, the SAD part of the search accounts 
for less cpu-time than the successive elimination part, so don't expect 
anything major here.
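
To make "evaluates 8 consecutive mvs" concrete, here is a scalar model of
what one MPSADBW computes, plus a hypothetical wrapper that accumulates it
over a block. This is a sketch in plain C, not SSE4 code: the real
instruction selects the 4-byte groups through an immediate operand, which
the model replaces with pointer arithmetic, and the names mpsadbw_model and
sad_x8 are invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Scalar model of one MPSADBW: eight 4-byte SADs of cur[0..3] against
 * ref[i..i+3] for i = 0..7. Reads 11 bytes of ref and 4 bytes of cur. */
static void mpsadbw_model(const uint8_t *ref, const uint8_t *cur,
                          uint16_t out[8])
{
    for (int i = 0; i < 8; i++) {
        unsigned s = 0;
        for (int j = 0; j < 4; j++)
            s += (unsigned)abs(ref[i + j] - cur[j]);
        out[i] = (uint16_t)s;
    }
}

/* Hypothetical wrapper: SADs of a w x h block (w a multiple of 4) at 8
 * consecutive horizontal offsets, built from 4-byte pieces. Sums fit in
 * 16 bits for blocks up to 16x16 (255 * 256 = 65280). */
static void sad_x8(const uint8_t *cur, int cur_stride,
                   const uint8_t *ref, int ref_stride,
                   int w, int h, uint16_t sads[8])
{
    assert(w > 0 && w % 4 == 0 && h > 0 && w * h <= 256);
    memset(sads, 0, 8 * sizeof(sads[0]));
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x += 4) {
            uint16_t part[8];
            mpsadbw_model(ref + y * ref_stride + x,
                          cur + y * cur_stride + x, part);
            for (int i = 0; i < 8; i++)
                sads[i] += part[i];
        }
    }
}
```

All 8 results come out together whether or not each offset survived
successive elimination, which is where the wasted computation mentioned
above comes from when the surviving mvs are not clumped.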


New iterative search: I'm not exactly sure what form it should take. 
Perhaps an 8x5 rectangle (iterating on only 3 of the rows, with the other 
2 searched after the iterative part halts). This should have convergence 
properties similar to Hex search, but whether it ends up faster will be 
determined by exactly how fast MPSADBW is compared to independent SADs.


PHMINPOSUW gets ugly, because it doesn't replace a DSP function; it 
replaces code that's optimally written in inline C when not using SSE4.
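
For reference, here is a scalar model of what PHMINPOSUW returns: the
minimum of 8 unsigned 16-bit values and the index of its first occurrence.
The function name is invented for illustration, and how candidate costs
would be packed into 8 words in an iterative search is left open; this only
shows the operation that would stand in for a chain of C comparisons.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PHMINPOSUW: returns the smallest of 8 unsigned 16-bit
 * values and stores the index of its first (lowest-index) occurrence. */
static uint16_t phminposuw_model(const uint16_t v[8], int *index)
{
    assert(v && index);
    uint16_t min = v[0];
    int pos = 0;
    for (int i = 1; i < 8; i++) {
        if (v[i] < min) {   /* strict compare keeps the first minimum */
            min = v[i];
            pos = i;
        }
    }
    *index = pos;
    return min;
}
```

In a search step this would be fed the costs of up to 8 candidate mvs, and
the returned index would pick the next search center.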

--Loren Merritt


