[vlc-devel] [PATCH] add ARM/NEON conversions for audio_filter/channel_mixer/simple
Rémi Denis-Courmont
remi at remlab.net
Wed Apr 4 14:36:36 CEST 2012
Hello,
On Wed, 4 Apr 2012 14:16:29 +0200, David Geldreich
<david.geldreich at free.fr> wrote:
> write a ARM/NEON inline assembly version of most of the conversion cases
> of audio_filter/channel_mixer/simple
We have a separate directory for NEON acceleration plugins, aptly named
arm_neon. Please stick the code there in a dedicated plugin.
> inline assembly is in separate functions for clarity and will be inlined
> by the compiler
Yes, but inline assembler is harder to read and it cannot be selected at
run-time. The overhead of a function call is negligible here. Inlining
assembler makes sense if you want to mix it with C code, especially for
branching. But you have already implemented branching in assembler anyway.
So you might as well use a dedicated assembler source file, then.
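
To illustrate what run-time selection could look like, here is a minimal sketch in C. It is not the actual plugin code: the names downmix_fn, downmix_5_to_2_c, downmix_5_to_2_neon and select_downmix are hypothetical, the channel order and the 0.5 coefficients are illustrative only, and the "NEON" routine is a stand-in that simply calls the C version so that the sketch builds on any target. In a real plugin that routine would live in a separate assembler source file, and the has_neon flag would come from the host's CPU capability check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical common signature shared by every implementation. */
typedef void (*downmix_fn)(float *dst, const float *src, size_t frames);

/* Portable C reference: 5 interleaved input channels to 2 output channels.
 * ASSUMPTION: channel order and the 0.5 coefficients are illustrative,
 * not the coefficients used by the real channel mixer. */
static void downmix_5_to_2_c(float *dst, const float *src, size_t frames)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < frames; i++) {
        const float *in = src + 5 * i;
        dst[2 * i]     = in[0] + 0.5f * (in[2] + in[3]);
        dst[2 * i + 1] = in[1] + 0.5f * (in[2] + in[4]);
    }
}

/* Stand-in for a routine that would be written in a dedicated .S file.
 * ASSUMPTION: it forwards to the C version here so the sketch is portable. */
static void downmix_5_to_2_neon(float *dst, const float *src, size_t frames)
{
    downmix_5_to_2_c(dst, src, frames);
}

/* Pick the implementation once, at run-time, from a capability flag.
 * The per-buffer cost afterwards is a single indirect call. */
static downmix_fn select_downmix(bool has_neon)
{
    return has_neon ? downmix_5_to_2_neon : downmix_5_to_2_c;
}
```

The point of the function pointer is that the same binary can run on CPUs with and without NEON, which inline assembler compiled into the generic C path does not allow.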
> For example, 5.x->2 conversion gets a 8x speedup on iPad1 and 3x on
> iPad2
It looks like your code was hand-scheduled, was it not? What was the
target CPU? A8? Do you have any remaining stalls that could be eliminated
with unrolling? If so, it would be nice to mention them in comments for
future programmers. Otherwise, great.
> I could provide a test program that shows that these routines :
> - give the same result (modulo epsilon) as the original one
Do you mean some (negligible) maths "errors" are induced due to floating
point?
> - work for any alignement of src/dst
> - work for any buffer size
Looks very nice overall, though I have not manually tested it.
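
For reference, a test of the kind described above could be sketched roughly as follows. This is not the test program offered in the patch mail: convert_fn, halve_ref, halve_unrolled and outputs_match are hypothetical names, and the two "halve" routines are trivial placeholders for the reference and optimised conversions. The harness runs both routines for every buffer size up to max_n and every starting offset up to max_offset (in floats, so in 4-byte steps relative to a 16-byte vector boundary) and compares the outputs within a tolerance eps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical signature for a sample conversion routine. */
typedef void (*convert_fn)(float *dst, const float *src, size_t n);

/* Placeholder reference implementation. */
static void halve_ref(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 0.5f;
}

/* Placeholder "optimised" implementation: unrolled by 4 with a scalar tail,
 * which is the shape where size and alignment bugs usually hide. */
static void halve_unrolled(float *dst, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i]     * 0.5f;
        dst[i + 1] = src[i + 1] * 0.5f;
        dst[i + 2] = src[i + 2] * 0.5f;
        dst[i + 3] = src[i + 3] * 0.5f;
    }
    for (; i < n; i++)
        dst[i] = src[i] * 0.5f;
}

/* Returns true if ref and opt agree within eps for all sizes 0..max_n
 * and all float offsets 0..max_offset into the buffers. */
static bool outputs_match(convert_fn ref, convert_fn opt,
                          size_t max_n, size_t max_offset, float eps)
{
    size_t cap = max_n + max_offset + 1;
    float *src = malloc(cap * sizeof *src);
    float *out_ref = malloc(cap * sizeof *out_ref);
    float *out_opt = malloc(cap * sizeof *out_opt);
    bool ok = src != NULL && out_ref != NULL && out_opt != NULL;

    /* Deterministic input covering negative and positive samples. */
    for (size_t i = 0; ok && i < cap; i++)
        src[i] = (float)i * 0.25f - 3.0f;

    for (size_t off = 0; ok && off <= max_offset; off++) {
        for (size_t n = 0; ok && n <= max_n; n++) {
            ref(out_ref + off, src + off, n);
            opt(out_opt + off, src + off, n);
            for (size_t i = 0; i < n; i++) {
                float d = out_ref[off + i] - out_opt[off + i];
                if (d < 0.0f)
                    d = -d;
                if (d > eps)
                    ok = false;
            }
        }
    }

    free(src);
    free(out_ref);
    free(out_opt);
    return ok;
}
```

A call such as outputs_match(halve_ref, halve_unrolled, 67, 4, 1e-6f) then covers sizes that are not multiples of the unroll factor as well as every float-granular misalignment of a 16-byte vector.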
--
Rémi Denis-Courmont
Sent from my collocated server