[vlc-devel] commit: NEON float to fixed point vectorized conversion ( Rémi Denis-Courmont )

Sun Sep 6 20:02:37 CEST 2009

git at videolan.org (git version control) writes:

> vlc | branch: master | Rémi Denis-Courmont <remi at remlab.net> | Sat Sep  5 17:14:09 2009 +0300| [7b51769579f4b5a83641c5e93e957f902467f71e] | committer: Rémi Denis-Courmont 
>
> NEON float to fixed point vectorized conversion
>
> +/**
> + * Half-precision floating point to signed fixed point conversion.
> + */

I think you mean single-precision: the code operates on 32-bit (.f32)
values.  Half-precision is a rather uncommon 16-bit floating-point
format.

> +    while (inp != endp)
> +        asm volatile (
> +            "vld4.f32 {q0-q1}, [%[inp]]!\n"
> +            "vcvt.s32.f32 q2, q0, #28\n"
> +            "vcvt.s32.f32 q3, q1, #28\n"
> +            "vst4.s32 {q2-q3}, [%[outp]]!\n"
> +            : [outp] "+r" (outp), [inp] "+r" (inp)
> +            :
> +            : "q0", "q1", "q2", "q3", "memory");

This is very inefficient for a couple of reasons:

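- VLD1 is faster and works just as well here: the conversion is
  element-wise, so the de-interleaving that VLD4 does and VST4 undoes
  has no effect on the result.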
- The VST4 instruction will stall four cycles waiting for the result
  of VCVT.

Simply switching to VLD1/VST1 will save one cycle on the load and one
on the store, and the store will also stall for one cycle less, saving
a total of three cycles per iteration.  More could be saved by
unrolling the loop a few times.
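
Something along these lines is what I have in mind.  It is only a
rough, untimed sketch: the function names (fi32_convert, float_to_q28)
and the signature are made up for illustration and are not from the
patch, and unrolling by two is an arbitrary choice.  The NEON path is
only built for 32-bit ARM with NEON enabled; elsewhere the scalar
loop, which models what VCVT with #28 does (scale by 2^28, truncate
toward zero, saturate), handles everything.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of "vcvt.s32.f32 qd, qm, #28": scale by 2^28, truncate
 * toward zero, saturate to the int32_t range, and map NaN to 0. */
static int32_t float_to_q28(float x)
{
    if (x != x) /* NaN */
        return 0;
    double scaled = (double)x * 268435456.0; /* 2^28, exact in double */
    if (scaled >= 2147483647.0)
        return INT32_MAX;
    if (scaled <= -2147483648.0)
        return INT32_MIN;
    return (int32_t)scaled; /* C conversion truncates toward zero */
}

/* Hypothetical helper (name and signature are assumptions): converts
 * n floats at inp to fixed point with 28 fractional bits at outp. */
static void fi32_convert(int32_t *outp, const float *inp, size_t n)
{
#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    /* NEON path: 16 samples per iteration, using VLD1/VST1.  The four
     * independent VCVTs give the first results time to complete
     * before the first store needs them. */
    const float *endp = inp + (n & ~(size_t)15);
    n &= 15;
    while (inp != endp)
        __asm__ volatile (
            "vld1.32 {q0-q1}, [%[inp]]!\n"
            "vld1.32 {q2-q3}, [%[inp]]!\n"
            "vcvt.s32.f32 q0, q0, #28\n"
            "vcvt.s32.f32 q1, q1, #28\n"
            "vcvt.s32.f32 q2, q2, #28\n"
            "vcvt.s32.f32 q3, q3, #28\n"
            "vst1.32 {q0-q1}, [%[outp]]!\n"
            "vst1.32 {q2-q3}, [%[outp]]!\n"
            : [outp] "+r" (outp), [inp] "+r" (inp)
            :
            : "q0", "q1", "q2", "q3", "memory");
#endif
    /* Scalar tail (and the whole buffer on builds without NEON). */
    for (size_t i = 0; i < n; i++)
        outp[i] = float_to_q28(inp[i]);
}
```

The scalar tail also removes the need for the sample count to be a
multiple of the vector width.  Whether two-way unrolling is enough to
hide the whole stall would have to be measured on the target core.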

FFmpeg has some highly optimised NEON code for float to int16
conversion.  Maybe that could be adapted and used here.

-- 
Måns Rullgård
mans at mansr.com