[vlc-devel] [PATCH] add ARM/NEON conversions for audio_filter/channel_mixer/simple
David Geldreich
david.geldreich at free.fr
Wed Apr 4 15:03:45 CEST 2012
Hi,
On 4 Apr 2012, at 14:36, Rémi Denis-Courmont wrote:
> We have a separate directory for NEON acceleration plugins, aptly named
> arm_neon. Please stick the code there in a dedicated plugin.
As j-b mentioned, this is a tradeoff between duplicating the code and "polluting" the original file.
Also, I am not an expert on the VLC code architecture, so all advice is welcome.
>> inline assembly is in separate functions for clarity and will be inlined
>> by the compiler
>
> Yes but inlined assembler is harder to read and it cannot be selected at
> run-time. The overhead of a function is negligible here. Inlining assembler
> makes sense if you want to mix C code, especially for branching. But you
> have already implemented branching in assembler anyway. So you might as
> well use a dedicated assembler source file, then.
There is some branching in the DoWork function to switch between all the input/output combinations.
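To illustrate the run-time selection point quoted above, here is a minimal sketch in plain C. All names (mix_fn, mix_c, mix_accel, select_mix) are hypothetical and are not taken from the patch or from VLC. The "accelerated" routine is only a stand-in written in C; in a real plugin it would live in a dedicated assembler source file.

```c
#include <stddef.h>

/* Hypothetical signature shared by all variants of one conversion. */
typedef void (*mix_fn)(const float *in, float *out, size_t frames);

/* Portable fallback: average interleaved L/R samples into mono.
 * The coefficients are an assumption chosen for illustration only. */
void mix_c(const float *in, float *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        out[i] = 0.5f * (in[2 * i] + in[2 * i + 1]);
}

/* Stand-in for an optimized routine. It computes the same result in C;
 * a real version would be implemented in a separate assembler file. */
void mix_accel(const float *in, float *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        out[i] = 0.5f * in[2 * i] + 0.5f * in[2 * i + 1];
}

/* Pick the implementation once, based on a capability flag that the
 * caller is assumed to have obtained from its own CPU detection. */
mix_fn select_mix(int has_neon)
{
    return has_neon ? mix_accel : mix_c;
}
```

The caller stores the returned pointer once and then invokes it per block, so the per-call cost is a single indirect function call.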
>> For example, 5.x->2 conversion gets an 8x speedup on iPad1 and 3x on
>> iPad2
>
> It looks like your code was hand-scheduled, was it not? What was the
> target CPU? A8? Do you have any remaining stall that could be eliminated with
> unrolling? If so, it would be nice to mention them in comments for future
> programmers. Otherwise, great.
Yes, I hand-scheduled some routines with the help of "Cortex-A8 cycle computation": http://pulsar.webshaker.net/ccc/
>
>> I could provide a test program that shows that these routines :
>> - give the same result (modulo epsilon) as the original one
>
> Do you mean some (negligible) maths "errors" are induced due to floating
> point?
Yes, that's it: the NEON results differ by 10e-6 or less.
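A check of this kind can be sketched as follows. This is not the test program mentioned above, only a minimal illustration; the helper names (max_abs_diff, buffers_match) are hypothetical, and the tolerance is a parameter the caller chooses.

```c
#include <stddef.h>

/* Largest absolute per-sample difference between two buffers. */
float max_abs_diff(const float *a, const float *b, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Nonzero when every sample of a and b agrees to within eps. */
int buffers_match(const float *a, const float *b, size_t n, float eps)
{
    return max_abs_diff(a, b, n) <= eps;
}
```

One would run the original C routine and the optimized routine on the same input and pass both output buffers to buffers_match with the chosen epsilon.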
Regards.