[vlc-devel] [PATCH] add ARM/NEON conversions for audio_filter/channel_mixer/simple
david.geldreich at free.fr
Tue May 29 21:05:03 CEST 2012
Thanks, Sebastien, for integrating "my" code.
And Remi is right: there is room for improvement in these assembly routines.
To get better scheduling, fewer pipeline bubbles, etc., I think we need to unroll these loops. Without unrolling, there are not enough independent operations to reorder and "optimize" the scheduling.
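To illustrate the unrolling idea, here is a minimal C sketch, not the actual VLC routine: a hypothetical 4.0-to-stereo downmix (`downmix_4_0_to_2_0`, with an assumed rear gain of 0.5) written once as a plain loop and once unrolled by four frames. In the unrolled body the eight outputs do not depend on each other, which gives a scheduler (the compiler, or someone writing the NEON by hand) room to interleave loads, multiplies and stores.

```c
#include <stddef.h>

/* Assumed rear-channel weight, for illustration only; VLC's real
 * coefficients may differ. */
#define REAR_GAIN 0.5f

/* Reference loop: one frame per iteration.
 * Input order per frame: FL FR RL RR. Output order per frame: L R. */
void downmix_4_0_to_2_0(float *restrict dst, const float *restrict src,
                        size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        dst[2 * i]     = src[4 * i]     + REAR_GAIN * src[4 * i + 2];
        dst[2 * i + 1] = src[4 * i + 1] + REAR_GAIN * src[4 * i + 3];
    }
}

/* Same computation unrolled by four frames. The eight results in each
 * iteration are independent, so their loads, multiplies and stores can be
 * interleaved instead of each one waiting on the previous one. */
void downmix_4_0_to_2_0_unrolled(float *restrict dst,
                                 const float *restrict src, size_t frames)
{
    size_t i = 0;
    for (; i + 4 <= frames; i += 4) {
        const float *s = src + 4 * i;
        float *d = dst + 2 * i;
        float l0 = s[0]  + REAR_GAIN * s[2];
        float r0 = s[1]  + REAR_GAIN * s[3];
        float l1 = s[4]  + REAR_GAIN * s[6];
        float r1 = s[5]  + REAR_GAIN * s[7];
        float l2 = s[8]  + REAR_GAIN * s[10];
        float r2 = s[9]  + REAR_GAIN * s[11];
        float l3 = s[12] + REAR_GAIN * s[14];
        float r3 = s[13] + REAR_GAIN * s[15];
        d[0] = l0; d[1] = r0; d[2] = l1; d[3] = r1;
        d[4] = l2; d[5] = r2; d[6] = l3; d[7] = r3;
    }
    /* Tail: the 0 to 3 frames left over when frames is not a multiple of 4. */
    downmix_4_0_to_2_0(dst + 2 * i, src + 4 * i, frames - i);
}
```

The leftover frames are handed to the plain loop, so the unrolled version accepts any frame count.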
Other possible improvements: using the VMLA instruction, better loads from aligned memory, ...
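VMLA.F32 computes `d = d + n * m` lane by lane, so it pays off when a routine can be written as accumulate-into-destination. The C sketch below assumes a hypothetical `mix_accumulate` helper in that shape, plus hypothetical helpers that allocate buffers on a 16-byte boundary and check alignment, which is the precondition for using 128-bit aligned loads. All names are illustrative; this is plain C, not the NEON code under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero when p sits on a 16-byte (128-bit) boundary, i.e. the width of
 * one q register. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* dst[i] += gain * src[i]: the accumulate-into-destination shape that a
 * hand-written NEON loop can express as one VMLA.F32 per four floats,
 * instead of a VMUL followed by a VADD. */
void mix_accumulate(float *restrict dst, const float *restrict src,
                    float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += gain * src[i];
}

/* Allocates n floats on a 16-byte boundary. The byte count is rounded up
 * to a multiple of 16 because C11 aligned_alloc expects the size to be a
 * multiple of the alignment. Release the buffer with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = (n * sizeof(float) + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes ? bytes : 16);
}
```

VMLA.F32 is not fused (the product is rounded before the add), unlike VFMA on VFPv4-capable cores.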
However, even this naive assembly rewrite already gives some speedup.
Before putting more work into these routines, we have to measure their real impact on total CPU time when decoding media. If they represent only 0.1% of the CPU time spent decoding, I think our time would be better spent optimizing other parts of VLC.
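One rough way to get that number, sketched in C under the assumption that the routines share a hypothetical `mix_fn` signature: time the routine in isolation with `clock()` over as much audio as the decode produced, then compare that against the CPU time of the full decode. Both helpers are illustrative and not part of VLC.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature shared by the mixing routines under test. */
typedef void (*mix_fn)(float *dst, const float *src, size_t frames);

/* Processor time, in seconds, spent calling fn `iterations` times on one
 * buffer. clock() reports CPU time for the whole process, so the
 * measurement is only meaningful if nothing else runs meanwhile. */
double cpu_seconds(mix_fn fn, float *dst, const float *src, size_t frames,
                   unsigned iterations)
{
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++)
        fn(dst, src, frames);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Share of the total decoding CPU time attributable to the routine, in
 * percent; returns 0 when no total is available. */
double share_percent(double routine_seconds, double total_seconds)
{
    return total_seconds > 0.0 ? 100.0 * routine_seconds / total_seconds
                               : 0.0;
}
```

For example, 1 ms of mixing in a decode that uses 1 s of CPU is 0.1%. A system profiler such as Linux `perf` can report per-function shares directly, without a harness.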
On 20 May 2012 at 19:02, XilasZ wrote:
> I don't understand why you push q0-q2. If I recall correctly, only q4-q7 are callee-saved.
> Ah, you are right, I didn't remember that. I'll remove all the vpush/vpop then :p
> Also, this badly lacks scheduling of ARM vs NEON and NEON load/store vs NEON
> arithmetic, but I suppose you know that.
> Apparently I do not.
> I didn't change the asm code; it's David's code.
> What do you mean exactly?