[vlc-devel] [PATCH] add ARM/NEON version of simple channel mixer
Måns Rullgård
mans at mansr.com
Fri Oct 5 10:38:29 CEST 2012
Rémi Denis-Courmont <remi at remlab.net> writes:
> On Thu, 04 Oct 2012 23:29:07 +0100, Måns Rullgård <mans at mansr.com> wrote:
>> Jean-Baptiste Kempf <jb at videolan.org> writes:
>>
>>> On Thu, Oct 04, 2012 at 11:06:42PM +0300, Rémi Denis-Courmont wrote :
>>>> On Thursday 4 October 2012 22:56:08, Jean-Baptiste Kempf wrote:
>>>> > On Thu, Oct 04, 2012 at 10:46:44PM +0300, Rémi Denis-Courmont wrote :
>>>> > > This seems to lack any sort of unrolling, so the speed will be much
>>>> > > worse than it could be. If we just want lame optimizations, I would
>>>> > > argue for intrinsics rather than ASM.
>>>> > >
>>>> > > No hard objections though.
>>>> >
>>>> > Can we commit with a warning about this unrolling, for now?
>>>> > Or is it not good enough ?
>>>>
>>>> I don't know. Is it significantly faster than GCC code?
>>>
>>> As I recall, when I benchmarked the functions, the speedup was between
>>> 3x and 8x depending on the mode. But maybe I misremember.
>>
>> Almost anything will be much faster than gcc on a cortex-a8 due to the
>> lame scalar floating-point on that core.
>
> Why is GCC not emulating non-SIMD floating point using NEON and dummy
> content for half the d-register, since even that would be much faster than
> VFP?
Because gcc. Also because NEON floating-point isn't always strictly
IEEE754-compliant.
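
One commonly cited difference, not spelled out in this message, is that
ARMv7 NEON arithmetic flushes denormal (subnormal) values to zero, whereas
IEEE 754 requires them to be preserved. The sketch below only imitates that
behaviour in portable C so the effect can be seen on any machine. The helper
name flush_to_zero is hypothetical and is not part of VLC, GCC, or any ARM
header.

```c
#include <float.h>

/* Hypothetical helper: imitates a flush-to-zero floating-point unit by
 * replacing any subnormal value with a zero of the same sign.
 * Normal values, zeros, infinities and NaNs are returned unchanged. */
static float flush_to_zero(float x)
{
    float mag = (x < 0.0f) ? -x : x;

    /* A subnormal is nonzero but smaller than the smallest normal float. */
    if (mag != 0.0f && mag < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}

/* Hypothetical helper: halves the smallest normal float. Under strict
 * IEEE 754 rules the result is a nonzero subnormal. The volatile
 * qualifier keeps the compiler from folding the division at build time. */
static float half_of_smallest_normal(void)
{
    volatile float smallest_normal = FLT_MIN;
    return smallest_normal / 2.0f;
}
```

Passing half_of_smallest_normal() through flush_to_zero() yields zero, while
a strictly compliant unit would keep the tiny nonzero value. For audio
mixing the difference is rarely audible, but a compiler cannot make that
assumption on the programmer's behalf.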
--
Måns Rullgård
mans at mansr.com