[x264-devel] Re: Speed-up method

Loren Merritt lorenm at u.washington.edu
Sun Mar 4 13:16:38 CET 2007


(No hands-on experience with the subject; I just read the CUDA manual.)

On Sat, 3 Mar 2007, Alex Izvorski wrote:
>
> Does the Geforce do anything integer-based?  Would the data be processed 
> as vertexes or textures?  Is the Geforce Gflops calculation just for the 
> vertex data?

CUDA can access both vertex and texture memory, but that has no effect on
the computation unless you use the built-in bilinear interpolation for
textures (not too useful to us, since it's not equivalent to H.264's luma
interpolation, which uses a 6-tap filter for half-pel positions).
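
To make the mismatch concrete, here is a minimal C sketch (not x264 code;
the function names are made up for illustration) comparing the H.264 6-tap
luma half-pel filter with a plain two-sample bilinear average at the same
position. The bilinear version is written with integer rounding as an
assumption; real texture hardware filters in its own precision.

```c
#include <stdint.h>

/* H.264 luma half-pel sample between p[2] and p[3]:
 * 6-tap filter (1, -5, 20, 20, -5, 1), rounded, divided by 32, clipped. */
static uint8_t halfpel_6tap(const uint8_t p[6])
{
    int v = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5] + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Bilinear estimate of the same midpoint: average of the two neighbours.
 * Integer rounding here is an assumption made for the comparison. */
static uint8_t halfpel_bilinear(const uint8_t p[6])
{
    return (uint8_t)((p[2] + p[3] + 1) >> 1);
}
```

For the row {10, 10, 10, 200, 10, 10} the 6-tap filter gives 129 while the
bilinear average gives 105, so the two cannot be swapped without changing
the decoded-side reference.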

GeForce can do integer math with the same latency as fp math, but only in
one pipeline, and there's no integer madd. So peak integer throughput is
about 173 GIPS.

You could run motion estimation in floating-point if it's faster that way.
But I think it would be faster only if you can find some way to use the
madd or the bilinear interpolation.
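
As a sanity check that floating-point motion estimation would at least be
correct, here is a small C sketch (hypothetical helper names, plain CPU
code rather than CUDA) of SAD computed in integer and in single-precision
float. For 8-bit pixels the float result is exact as long as the sum stays
below 2^24; a 16x16 block sums to at most 256 * 255 = 65280. It says
nothing about which version is faster on the GPU.

```c
#include <stdint.h>

/* Sum of absolute differences in integer arithmetic. */
static int sad_int(const uint8_t *a, const uint8_t *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int d = a[i] - b[i];
        sum += d < 0 ? -d : d;
    }
    return sum;
}

/* Same SAD accumulated in float; exact while the sum is below 2^24. */
static float sad_float(const uint8_t *a, const uint8_t *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = (float)a[i] - (float)b[i];
        sum += d < 0.0f ? -d : d;
    }
    return sum;
}
```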

> From telomere's post, it seems like the Geforce has considerable 
> multiplication resources (one madd and one mul in the same cycle).  I 
> can't think offhand how to really use that to good effect in motion 
> estimation: perhaps in Hadamard, if it turns out multiply by +1/-1 and 
> add is faster than the combination of add/subtract/permute?

You can't avoid the permute if you're operating on vector registers. You
have to either transpose, or add together elements of the same reg, both
of which are permutes.

But GeForce doesn't operate on vector registers; the vector data types are
just provided for programming convenience. The registers are scalar, so
there is no permute step. That leaves add/sub vs. madd/madd for each
butterfly, which is the same number of instructions, so the multiply
wouldn't help Hadamard.
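
The scalar case can be sketched in C as follows (illustrative only; the
function names and the MADD macro are made up, and float is used because
the madd in question is a floating-point instruction). Both versions of
the 4-point Hadamard use 8 arithmetic operations and give the same
result, so replacing add/sub with multiply-by-+1/-1-and-add saves nothing.

```c
/* Stand-in for a fused multiply-add instruction: a * b + c. */
#define MADD(a, b, c) ((a) * (b) + (c))

/* 4-point Hadamard with plain butterflies: 8 scalar add/sub. */
static void hadamard4_addsub(const float x[4], float out[4])
{
    float a0 = x[0] + x[1];
    float a1 = x[0] - x[1];
    float a2 = x[2] + x[3];
    float a3 = x[2] - x[3];
    out[0] = a0 + a2;
    out[1] = a0 - a2;
    out[2] = a1 + a3;
    out[3] = a1 - a3;
}

/* Same transform with every add/sub written as a madd by +1 or -1:
 * still 8 operations, one per output of each butterfly. */
static void hadamard4_madd(const float x[4], float out[4])
{
    float a0 = MADD(x[1],  1.0f, x[0]);
    float a1 = MADD(x[1], -1.0f, x[0]);
    float a2 = MADD(x[3],  1.0f, x[2]);
    float a3 = MADD(x[3], -1.0f, x[2]);
    out[0] = MADD(a2,  1.0f, a0);
    out[1] = MADD(a2, -1.0f, a0);
    out[2] = MADD(a3,  1.0f, a1);
    out[3] = MADD(a3, -1.0f, a1);
}
```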

--Loren Merritt

-- 
This is the x264-devel mailing-list
To unsubscribe, go to: http://developers.videolan.org/lists.html


