[vlc-devel] commit: NEON float to fixed point vectorized conversion ( Rémi Denis-Courmont )
Måns Rullgård
mans at mansr.com
Sun Sep 6 20:02:37 CEST 2009
git at videolan.org (git version control) writes:
> vlc | branch: master | Rémi Denis-Courmont <remi at remlab.net> | Sat Sep 5 17:14:09 2009 +0300| [7b51769579f4b5a83641c5e93e957f902467f71e] | committer: Rémi Denis-Courmont
>
> NEON float to fixed point vectorized conversion
>
> +/**
> + * Half-precision floating point to signed fixed point conversion.
> + */
I think you mean single-precision. Half-precision is a rather
uncommon 16-bit floating point format.
> + while (inp != endp)
> + asm volatile (
> + "vld4.f32 {q0-q1}, [%[inp]]!\n"
> + "vcvt.s32.f32 q2, q0, #28\n"
> + "vcvt.s32.f32 q3, q1, #28\n"
> + "vst4.s32 {q2-q3}, [%[outp]]!\n"
> + : [outp] "+r" (outp), [inp] "+r" (inp)
> + :
> + : "q0", "q1", "q2", "q3", "memory");
This is very inefficient for a couple of reasons:
- VLD1 is faster and works just as well here.
- The VST4 instruction will stall four cycles waiting for the result
of VCVT.
Simply switching to VLD1/VST1 will save one cycle in each of the load
and the store, and the store will also stall for one cycle less, saving
a total of three cycles per iteration. More could be saved by unrolling
the loop a few times.
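For illustration, the VLD1/VST1 variant could look roughly like the
sketch below. This is untested on real hardware and not the actual VLC
code: the function name float_to_q28, the size_t count parameter, and
the scalar fallback for non-NEON builds are all assumptions added for
the example. The count is assumed to be a multiple of 8, matching the
eight floats the quoted loop handles per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions, not VLC's):
 * converts n floats to signed fixed point with 28 fractional bits.
 * n must be a multiple of 8. */
void float_to_q28(int32_t *outp, const float *inp, size_t n)
{
    const float *endp = inp + n;

    assert(n % 8 == 0);
#if defined(__arm__) && defined(__ARM_NEON__)
    while (inp != endp)
        __asm__ volatile (
            "vld1.32 {q0-q1}, [%[inp]]!\n"   /* plain load, no de-interleaving */
            "vcvt.s32.f32 q0, q0, #28\n"     /* float -> Q28 fixed point */
            "vcvt.s32.f32 q1, q1, #28\n"
            "vst1.32 {q0-q1}, [%[outp]]!\n"  /* plain store, no interleaving */
            : [outp] "+r" (outp), [inp] "+r" (inp)
            :
            : "q0", "q1", "memory");
#else
    /* Portable fallback: scale by 2^28, truncate toward zero, saturate. */
    while (inp != endp) {
        float scaled = *inp++ * 268435456.0f;   /* 2^28 */

        if (scaled >= 2147483648.0f)            /* >= 2^31 */
            *outp++ = INT32_MAX;
        else if (scaled <= -2147483648.0f)      /* <= -2^31 */
            *outp++ = INT32_MIN;
        else if (scaled != scaled)              /* NaN maps to 0 */
            *outp++ = 0;
        else
            *outp++ = (int32_t)scaled;
    }
#endif
}
```

Unrolling would mean loading more registers per iteration (say q0-q3
with two VLD1s) so that each store is issued further away from the VCVT
that produces its data.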
FFmpeg has some highly optimised NEON code for float to int16
conversion. Maybe that could be adapted and used here.
--
Måns Rullgård
mans at mansr.com