[x264-devel] PATCH: frame_init_lowres_core_altivec
Guillaume POIRIER
gpoirier at mplayerhq.hu
Sat Jun 20 21:40:57 CEST 2009
Hello everyone!
I'm very sorry for the late response to this old mail. It didn't go
unnoticed, I just had to find enough time to carefully review your patch.
On Sun, May 31, 2009 at 12:41 AM, David Wolstencroft <
wolstencroft at alum.rpi.edu> wrote:
>
> Please find the attached patch for frame_init_lowres_core_altivec
>
Youpi! It's indeed currently the most CPU-intensive non-altivec-ed routine.
> I would appreciate any review you guys can give,
At first I thought I could shrink the linecount of the code that processes
the "end" part with some macros, but it'd only make the code less readable.
Well, really, I don't see anything that I could do to improve your code...
> as well as anyone who can test the compile on a linux distribution.
>
Works on Gentoo with GCC 4.3 and 4.1.
> Speed gains as follows:
>
> ticks:
> lowres_init_c: 8374
> lowres_init_altivec: 375
>
> Below with a patched checkasm with CHUD for cycle accurate counts:
> lowres_init_c: 286981
> lowres_init_altivec: 12822
>
> Using the cycle accurate counts, the altivec version shows a 22.38x speed
> increase over plain C
>
Here are the figures on G5:
lowres_init_c: 3310
lowres_init_altivec: 132
That's a 25x speed increase over sequential code.
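For anyone who wants to double-check the ratios quoted in this thread, here is a minimal sketch of the arithmetic in C. The helper name speedup_x100 is hypothetical; it is not part of x264 or checkasm, and it only divides the two tick counts using integer math.

```c
#include <assert.h>

/* Hypothetical helper (not from x264 or checkasm): returns the speedup of
 * an optimized routine over a baseline, in hundredths (2238 means 22.38x).
 * Both arguments are tick or cycle counts as printed by the benchmark. */
static unsigned long long speedup_x100(unsigned long long baseline_ticks,
                                       unsigned long long optimized_ticks)
{
    /* A zero count would mean the measurement itself is broken. */
    assert(optimized_ticks != 0);

    /* Scale by 100 and add half the divisor so the integer division
     * rounds to the nearest hundredth instead of truncating. */
    return (baseline_ticks * 100 + optimized_ticks / 2) / optimized_ticks;
}
```

Called with the CHUD counts (286981, 12822) it returns 2238, i.e. 22.38x, and with the G5 figures (3310, 132) it returns 2508, i.e. roughly 25x.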
I'll apply your code shortly.
Thanks a million, and sorry for the very late reply.
Guillaume
--
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
George Carlin<http://www.brainyquote.com/quotes/authors/g/george_carlin.html>
- "Electricity is really just organized lightning."