[x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines

Sat Jan 31 23:28:56 CET 2009

Hello folks,

The attached patch adds AltiVec implementations of the hadamard_ac routines.

It passes checkasm, though I have not tested it with a full encode yet.

Here are the speed figures (lower is better):

  function             C    AltiVec   speedup
  hadamard_ac_8x8     180      30      6.0x
  hadamard_ac_8x16    378      59      6.4x
  hadamard_ac_16x8    378      59      6.4x
  hadamard_ac_16x16   753     116      6.5x

I'm fairly confident that I can make this implementation faster, since
it is quite naive. For instance, it does a full transpose after 'sum4'
is computed, whereas the x86 SSEx implementation avoids a full
transpose and does something cleverer that I haven't deciphered yet.
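For readers unfamiliar with the routine, here is a plain-C sketch of
what an 8x8 hadamard_ac computes, written naively with an explicit full
transpose between the horizontal and vertical passes. It is neither
x264's C reference nor the AltiVec code from the patch: the function
names, the sum8<<32 | sum4 packing, and the absence of any
normalization shifts are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place, unnormalized Walsh-Hadamard transform of n values (n = 4 or 8).
 * Coefficient order does not matter here because only |values| are summed. */
void wht(int *v, int n)
{
    for (int len = 1; len < n; len <<= 1)
        for (int i = 0; i < n; i += 2 * len)
            for (int j = i; j < i + len; j++) {
                int a = v[j], b = v[j + len];
                v[j] = a + b;
                v[j + len] = a - b;
            }
}

/* Full 8x8 transpose: the "naive" step between the two 1-D passes. */
void transpose8(int m[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = i + 1; j < 8; j++) {
            int t = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = t;
        }
}

/* Hypothetical reference: returns sum8 in the high 32 bits and sum4 in the
 * low 32 bits, both sums of absolute AC coefficients (DC terms excluded). */
uint64_t hadamard_ac_8x8_ref(const uint8_t *pix, int stride)
{
    int q[8][8], f[8][8]; /* q: four 4x4 transforms, f: one 8x8 transform */
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            q[y][x] = f[y][x] = pix[y * stride + x];

    /* sum4 path: 4-point transforms on each half-row, transpose, repeat. */
    for (int y = 0; y < 8; y++) { wht(&q[y][0], 4); wht(&q[y][4], 4); }
    transpose8(q);
    for (int y = 0; y < 8; y++) { wht(&q[y][0], 4); wht(&q[y][4], 4); }

    /* sum8 path: 8-point transforms on each row, transpose, repeat. */
    for (int y = 0; y < 8; y++) wht(f[y], 8);
    transpose8(f);
    for (int y = 0; y < 8; y++) wht(f[y], 8);

    uint32_t sum4 = 0, sum8 = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            sum4 += abs(q[y][x]);
            sum8 += abs(f[y][x]);
        }
    /* Drop the DC terms: one per 4x4 quadrant, one for the whole 8x8. */
    sum4 -= abs(q[0][0]) + abs(q[0][4]) + abs(q[4][0]) + abs(q[4][4]);
    sum8 -= abs(f[0][0]);
    return ((uint64_t)sum8 << 32) | sum4;
}
```

With these definitions a flat 8x8 block gives 0 for both sums, while a
block that is 0 everywhere except for a 1 in the top-left pixel gives
sum4 = 15 and sum8 = 63 (unnormalized).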

It may also be possible to interleave some of the hadamard_ac
computations for sizes larger than 8x8, since AltiVec gives us more
vector registers to play with (32) than x86 SSE (8 or 16).
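The interleaving idea can be sketched in scalar C: two independent
8-point Hadamard transforms (for example, one row from each 8x8 half of
a 16x8 block) are advanced butterfly stage by butterfly stage in the
same loop body. The helper names wht8 and wht8_x2_interleaved are
hypothetical; this only illustrates the data independence that a
register-rich AltiVec version could exploit, not how the patch is
written.

```c
/* One unnormalized 8-point Walsh-Hadamard transform: three butterfly stages. */
void wht8(int v[8])
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += 2 * len)
            for (int j = i; j < i + len; j++) {
                int a = v[j], b = v[j + len];
                v[j] = a + b;
                v[j + len] = a - b;
            }
}

/* Two independent transforms advanced in lockstep. Neither result reads the
 * other's data, so a SIMD version could keep both sets of intermediates in
 * registers and overlap their instruction latencies. */
void wht8_x2_interleaved(int u[8], int v[8])
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += 2 * len)
            for (int j = i; j < i + len; j++) {
                int a = u[j], b = u[j + len];
                int c = v[j], d = v[j + len];
                u[j] = a + b;
                u[j + len] = a - b;
                v[j] = c + d;
                v[j + len] = c - d;
            }
}
```

The interleaved version produces exactly the same results as two
separate wht8 calls; the trade-off is that twice as many intermediates
must stay live, which is where a 32-register file helps.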

This patch is therefore not meant to be committed to git just yet, but
to serve as a worldwide backup and, hopefully, to gather some
suggestions, even if it may be a bit early for the experts around here
to review my coding horror :-).

Good week-end to you all!

Guillaume
-- 
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hadamard_sum4_sum8.0.diff
Type: application/octet-stream
Size: 10208 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090131/507a8b8a/attachment-0001.obj