[vlc-devel] [PATCH] add ARM/NEON conversions for audio_filter/channel_mixer/simple

Rémi Denis-Courmont remi at remlab.net
Wed Apr 4 14:36:36 CEST 2012


On Wed,  4 Apr 2012 14:16:29 +0200, David Geldreich
<david.geldreich at free.fr> wrote:
> write an ARM/NEON inline assembly version of most of the conversion cases
> of audio_filter/channel_mixer/simple

We have a separate directory for NEON acceleration plugins, aptly named
arm_neon. Please stick the code there in a dedicated plugin.

> inline assembly is in separate functions for clarity and will be inlined
> by the compiler

Yes, but inlined assembler is harder to read and it cannot be selected at
run-time. The overhead of a function call is negligible here. Inlining
assembler makes sense if you want to mix it with C code, especially for
branching. But you have already implemented branching in assembler anyway.
So you might as well use a dedicated assembler source file, then.
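
To illustrate what run-time selection could look like, here is a minimal
sketch in C. All names (downmix_fn, stereo_to_mono_c, stereo_to_mono_neon,
select_downmix) are hypothetical and not taken from the patch or from VLC;
the "NEON" routine is only a stand-in that forwards to the C code so the
sketch builds anywhere. In a real plugin it would presumably live in a
separate assembler source file.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature: downmix interleaved stereo to mono. */
typedef void (*downmix_fn)(float *dst, const float *src, size_t frames);

/* Portable C fallback. */
static void stereo_to_mono_c(float *dst, const float *src, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        dst[i] = 0.5f * (src[2 * i] + src[2 * i + 1]);
}

/* Stand-in for a routine that would be written in a dedicated assembler
 * file. It forwards to the C version here (assumption for illustration). */
static void stereo_to_mono_neon(float *dst, const float *src, size_t frames)
{
    stereo_to_mono_c(dst, src, frames);
}

/* Pick the implementation once, when the filter is opened. The caller is
 * assumed to obtain cpu_has_neon from whatever CPU detection is available. */
static downmix_fn select_downmix(bool cpu_has_neon)
{
    return cpu_has_neon ? stereo_to_mono_neon : stereo_to_mono_c;
}
```

The per-block cost is then one indirect call, which is small compared to
the work done on a whole audio buffer.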

> For example, 5.x->2 conversion gets a 8x speedup on iPad1 and 3x on

It looks like your code was hand-scheduled, was it not? What was the
target CPU? A8? Do you have any remaining stalls that could be eliminated
with unrolling? If so, it would be nice to mention them in comments for
future programmers. Otherwise, great.

> I could provide a test program that shows that these routines :
> - give the same result (modulo epsilon) as the original one

Do you mean some (negligible) maths "errors" are induced due to floating

> - work for any alignment of src/dst
> - work for any buffer size

Looks very nice overall, though I have not manually tested it.
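
For reference, a test along the lines described above could be sketched
roughly as follows. This is not the test program mentioned in the quoted
mail; mix_fn, outputs_match and the two stereo-to-mono routines are
hypothetical names, and the second routine merely stands in for an
optimized version. It sweeps source and destination offsets and every
buffer size up to a small maximum, comparing the outputs within epsilon.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*mix_fn)(float *dst, const float *src, size_t frames);

#define MAX_FRAMES 67   /* deliberately not a multiple of the vector width */
#define MAX_OFFSET 4    /* float offsets 0..3 cover every 16-byte phase */

/* Reference implementation. */
static void stereo_to_mono_ref(float *dst, const float *src, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        dst[i] = 0.5f * (src[2 * i] + src[2 * i + 1]);
}

/* Stand-in for the optimized routine (different operation order). */
static void stereo_to_mono_alt(float *dst, const float *src, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        dst[i] = 0.5f * src[2 * i] + 0.5f * src[2 * i + 1];
}

/* Returns true if ref and opt agree within epsilon for all tested
 * alignments and buffer sizes (including zero frames). */
static bool outputs_match(mix_fn ref, mix_fn opt, float epsilon)
{
    static float src[2 * MAX_FRAMES + MAX_OFFSET];
    static float out_ref[MAX_FRAMES + MAX_OFFSET];
    static float out_opt[MAX_FRAMES + MAX_OFFSET];

    /* Deterministic test signal in [-1, 1]. */
    for (size_t i = 0; i < sizeof(src) / sizeof(src[0]); i++)
        src[i] = (float)((int)(i % 17) - 8) / 8.0f;

    for (size_t soff = 0; soff < MAX_OFFSET; soff++)
        for (size_t doff = 0; doff < MAX_OFFSET; doff++)
            for (size_t frames = 0; frames <= MAX_FRAMES; frames++) {
                ref(out_ref + doff, src + soff, frames);
                opt(out_opt + doff, src + soff, frames);
                for (size_t i = 0; i < frames; i++) {
                    float d = out_ref[doff + i] - out_opt[doff + i];
                    if (d < 0.0f)
                        d = -d;
                    if (d > epsilon)
                        return false;
                }
            }
    return true;
}
```

A call such as outputs_match(stereo_to_mono_ref, stereo_to_mono_alt, 1e-6f)
would then be run once per conversion case.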

Rémi Denis-Courmont
Sent from my collocated server
