[x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines
gpoirier at mplayerhq.hu
Sat Jan 31 23:28:56 CET 2009
The attached patch adds $SUBJ.
It passes checkasm (I have not tested it with a full encode yet).
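A minimal plain-C sketch of what hadamard_ac computes, assuming it measures the AC energy of an 8x8 block as the sum of absolute Hadamard coefficients with the DC term excluded. This is done once per 4x4 sub-block (summed into sum4) and once over the whole 8x8 block (sum8). `hadamard_ac_8x8_ref`, `hadamard_ac_nxn` and `hadamard_ac_sums` are hypothetical names. x264's own C reference, the one checkasm compares against, may pack and scale the two sums differently, so this sketch illustrates the idea and will not reproduce its exact numbers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: both AC sums, unpacked and unnormalized. */
typedef struct {
    uint32_t sum4; /* 4x4 Hadamard AC energy, summed over the 4 sub-blocks */
    uint32_t sum8; /* 8x8 Hadamard AC energy */
} hadamard_ac_sums;

/* In-place 1-D Hadamard transform of n values (n a power of two) spaced
 * `step` elements apart. Outputs are in natural (Sylvester) order, so
 * index 0 always holds the DC term. */
static void hadamard_1d(int32_t *v, int n, int step)
{
    for (int len = 1; len < n; len <<= 1)
        for (int i = 0; i < n; i += 2 * len)
            for (int j = i; j < i + len; j++) {
                int32_t a = v[j * step];
                int32_t b = v[(j + len) * step];
                v[j * step] = a + b;
                v[(j + len) * step] = a - b;
            }
}

/* Sum of |coefficient| of the n x n 2-D Hadamard transform of the block
 * at pix, skipping the DC coefficient. n must be 4 or 8. */
static uint32_t hadamard_ac_nxn(const uint8_t *pix, int stride, int n)
{
    int32_t b[64];
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            b[y * n + x] = pix[y * stride + x];
    for (int y = 0; y < n; y++)
        hadamard_1d(b + y * n, n, 1);   /* transform each row */
    for (int x = 0; x < n; x++)
        hadamard_1d(b + x, n, n);       /* transform each column */
    uint32_t sum = 0;
    for (int k = 1; k < n * n; k++)     /* k = 0 is the DC term */
        sum += (uint32_t)abs(b[k]);
    return sum;
}

/* Hypothetical scalar reference for one 8x8 block of 8-bit pixels. */
hadamard_ac_sums hadamard_ac_8x8_ref(const uint8_t *pix, int stride)
{
    hadamard_ac_sums s = {0, 0};
    for (int by = 0; by < 8; by += 4)
        for (int bx = 0; bx < 8; bx += 4)
            s.sum4 += hadamard_ac_nxn(pix + by * stride + bx, stride, 4);
    s.sum8 = hadamard_ac_nxn(pix, stride, 8);
    return s;
}
```

A flat block gives zero for both sums. A block whose left half is 0 and right half is 4 also gives sum4 == 0, because every 4x4 sub-block is flat, but sum8 == 128. That difference between the two measures is the point of computing both.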
Here are the speed figures:
I'm quite confident that I can make this implementation faster, since
it's quite naive. For instance, it does a full transpose after 'sum4'
is computed, whereas the x86 SSEx implementation doesn't do a full
transpose but does something more clever that I have not deciphered.
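Why a transpose shows up at all: in a SIMD kernel each vector register typically holds one row of the block, so butterflies naturally run "vertically", element-wise across registers. The second dimension of the separable 2-D Hadamard transform therefore needs the data transposed. The scalar C model below does vertical pass, transpose, vertical pass, and checks the result against a direct H*X*H computation. All names are hypothetical, and `vec4` only stands in for a 4-lane vector register; it is not an AltiVec type. The result comes out transposed relative to H*X*H. Since sum4 and sum8 are sums of absolute values, coefficient order does not change them, so only the layout that later butterfly stages need matters. This is a general observation; the sketch does not claim to reproduce the x86 trick.

```c
#include <stdint.h>

/* Scalar stand-in for a vector register holding four 32-bit lanes. */
typedef int32_t vec4[4];

/* One 4-point Hadamard pass across registers r[0..3]: every lane is
 * handled identically, so on SIMD hardware this is plain vector add/sub. */
static void vertical_hadamard4(vec4 r[4])
{
    for (int lane = 0; lane < 4; lane++) {
        int32_t a0 = r[0][lane] + r[1][lane];
        int32_t a1 = r[0][lane] - r[1][lane];
        int32_t a2 = r[2][lane] + r[3][lane];
        int32_t a3 = r[2][lane] - r[3][lane];
        r[0][lane] = a0 + a2;
        r[1][lane] = a1 + a3;
        r[2][lane] = a0 - a2;
        r[3][lane] = a1 - a3;
    }
}

/* Full 4x4 transpose (with vector registers this is typically done with
 * a sequence of merge/permute operations). */
static void transpose4(vec4 r[4])
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            int32_t t = r[i][j];
            r[i][j] = r[j][i];
            r[j][i] = t;
        }
}

/* SIMD-style 2-D transform: vertical pass, transpose, vertical pass.
 * Leaves the transpose of (H * X * H) in r. */
void hadamard4x4_simd_style(vec4 r[4])
{
    vertical_hadamard4(r);
    transpose4(r);
    vertical_hadamard4(r);
}

/* Direct reference: out = H * x * H with the same Sylvester-ordered H4
 * that vertical_hadamard4 applies. x is not modified. */
void hadamard4x4_direct(int32_t x[4][4], int32_t out[4][4])
{
    static const int32_t H[4][4] = {
        {1, 1, 1, 1}, {1, -1, 1, -1}, {1, 1, -1, -1}, {1, -1, -1, 1}
    };
    int32_t t[4][4];
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            t[i][j] = 0;
            for (int k = 0; k < 4; k++)
                t[i][j] += H[i][k] * x[k][j];   /* t = H * x */
        }
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            out[i][j] = 0;
            for (int k = 0; k < 4; k++)
                out[i][j] += t[i][k] * H[k][j]; /* out = t * H */
        }
}
```

Because H4 is symmetric, H * X^T * H equals (H * X * H)^T, which is why r[i][j] matches out[j][i] while the DC term stays at [0][0].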
It may also be possible to interleave some of the hadamard_ac
computations for block sizes larger than 8x8, since AltiVec gives us more
vector registers to play with (32) than x86 does (8 or 16).
This patch is therefore not meant to reach Git just yet, but to serve
as a worldwide backup and, hopefully, to attract some suggestions, even if
it may be a bit early for the experts around here to review my code.
Good week-end to you all!
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 10208 bytes
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090131/507a8b8a/attachment-0001.obj