[vlc-devel] [PATCH] add ARM/NEON version of simple channel mixer

Fri Oct 5 09:05:20 CEST 2012

On Thu, 04 Oct 2012 23:29:07 +0100, Måns Rullgård <mans at mansr.com> wrote:
> Jean-Baptiste Kempf <jb at videolan.org> writes:
> 
>> On Thu, Oct 04, 2012 at 11:06:42PM +0300, Rémi Denis-Courmont wrote :
>>> Le jeudi 4 octobre 2012 22:56:08, Jean-Baptiste Kempf a écrit :
>>> > On Thu, Oct 04, 2012 at 10:46:44PM +0300, Rémi Denis-Courmont wrote:
>>> > > This seems to lack any sort of unrolling, so the speed will be much
>>> > > worse than it could be. If we just want lame optimizations, I would
>>> > > argue for intrinsics rather than ASM.
>>> > > 
>>> > > No hard objections though.
>>> > 
>>> > Can we commit it for now, with a warning about the missing unrolling?
>>> > Or is it not good enough?
>>> 
>>> I don't know. Is it significantly faster than GCC code?
>>
>> It seems to me that when I benchmarked the functions, the speedup was
>> between 3x and 8x depending on the mode. But maybe I misremember.
> 
> Almost anything will be much faster than GCC on a Cortex-A8 due to the
> lame scalar floating-point on that core.
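To make the unrolling point above concrete, here is a minimal C sketch. It is not the code from the patch (which is ARM assembly); the function name `downmix_2_0_to_1_0_unrolled`, the interleaved stereo-to-mono layout and the 0.5 averaging coefficient are assumptions chosen purely for illustration. The main loop handles four frames per iteration, one 128-bit NEON q-register's worth of mono output, and a scalar tail handles whatever is left.

```c
#include <stddef.h>

/* Hypothetical helper (not VLC's API): downmix interleaved stereo float
 * samples to mono by averaging left and right. */
static void downmix_2_0_to_1_0_unrolled(float *restrict out,
                                        const float *restrict in,
                                        size_t frames)
{
    size_t i = 0;

    /* Unrolled body: four output frames per iteration (4 x 32-bit floats,
     * i.e. one 128-bit q-register), which amortizes loop overhead and gives
     * the scheduler independent operations to interleave. */
    for (; i + 4 <= frames; i += 4) {
        const float *p = in + 2 * i;
        out[i + 0] = (p[0] + p[1]) * 0.5f;
        out[i + 1] = (p[2] + p[3]) * 0.5f;
        out[i + 2] = (p[4] + p[5]) * 0.5f;
        out[i + 3] = (p[6] + p[7]) * 0.5f;
    }

    /* Scalar tail for frame counts that are not a multiple of 4. */
    for (; i < frames; i++)
        out[i] = (in[2 * i] + in[2 * i + 1]) * 0.5f;
}
```

On an in-order core such as the Cortex-A8, this kind of unrolling matters because there is no out-of-order hardware to hide instruction latency for you.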

Why doesn't GCC emulate scalar (non-SIMD) floating point with NEON, filling
the unused half of each d-register with dummy content? Even that would be
much faster than VFP.
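A minimal sketch of the idea in the question, using ARM NEON intrinsics from `<arm_neon.h>`: the scalar operands go into lane 0 of a 64-bit d-register (`float32x2_t`), and lane 1 carries a dummy copy that is computed and then thrown away. The helper name `scalar_mla` is hypothetical, and the non-NEON branch is only a portable fallback so the sketch also compiles on other targets.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: scalar multiply-accumulate, returns c + a * b. */
static float scalar_mla(float a, float b, float c)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Broadcast each scalar into both lanes of a d-register; only lane 0
     * matters, lane 1 is dummy content. */
    float32x2_t va = vdup_n_f32(a);
    float32x2_t vb = vdup_n_f32(b);
    float32x2_t vc = vdup_n_f32(c);
    float32x2_t r = vmla_f32(vc, va, vb); /* c + a * b in both lanes */
    return vget_lane_f32(r, 0);           /* keep lane 0, drop the dummy */
#else
    /* Portable fallback for non-NEON builds. */
    return a * b + c;
#endif
}
```

One caveat worth keeping in mind: on ARMv7, NEON single-precision arithmetic always flushes denormals to zero and rounds to nearest, so this substitution is not bit-exact with VFP for every input.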

-- 
Rémi Denis-Courmont
Sent from my collocated server