[x264-devel] PATCH: frame_init_lowres_core_altivec

Guillaume POIRIER gpoirier at mplayerhq.hu
Sat Jun 20 21:40:57 CEST 2009


Hello everyone!

I'm very sorry for the late response to this old mail. It didn't go
unnoticed; I just had to find enough time to review your patch carefully.

On Sun, May 31, 2009 at 12:41 AM, David Wolstencroft <
wolstencroft at alum.rpi.edu> wrote:
>
> Please find the attached patch for frame_init_lowres_core_altivec
>

Yay! It's indeed currently the most CPU-intensive routine that has no
AltiVec version.


> I would appreciate any review you guys can give,


At first I thought I could shrink the line count of the code that processes
the "end" part with some macros, but it'd only make the core less readable.

Well, really, I don't see anything that I could do to improve your code...


> as well as anyone who can test the compile on a linux distribution.
>

Works on Gentoo with GCC 4.3 and 4.1.



> Speed gains as follows:
>
> ticks:
> lowres_init_c: 8374
> lowres_init_altivec: 375
>
> Below with a patched checkasm with CHUD for cycle accurate counts:
> lowres_init_c: 286981
> lowres_init_altivec: 12822
>
> Using the cycle accurate counts, the altivec version shows a 22.38x speed
> increase over plain C
>

Here are the figures on G5:
lowres_init_c: 3310
lowres_init_altivec: 132

That's roughly a 25x speed increase (3310 / 132) over the plain C code.
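
For reference, the ratios quoted in this thread are simply the C timing
divided by the AltiVec timing, both in the same unit. Here is a minimal
sketch of that arithmetic; `speedup` is a hypothetical helper written for
illustration, not something from x264 or checkasm:

```c
/* Hypothetical helper, not part of x264 or checkasm.
 * Both arguments must be in the same unit (ticks or cycles). */
static double speedup(double baseline, double optimized)
{
    /* How many times faster the optimized routine is than the baseline. */
    return baseline / optimized;
}
```

With the numbers from this thread, speedup(8374, 375) is about 22.33,
speedup(286981, 12822) is about 22.38, and speedup(3310, 132) is about
25.08.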

I'll apply your code shortly.

Thanks a million, and sorry for the very late reply.

Guillaume
-- 
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.

George Carlin<http://www.brainyquote.com/quotes/authors/g/george_carlin.html>
- "Electricity is really just organized lightning."