[vlc-devel] [PATCH] arm_neon: Add an optimized routine for deinterleaving chroma
Rémi Denis-Courmont
remi at remlab.net
Mon Oct 7 21:26:10 CEST 2013
On Monday 7 October 2013 14:27:19 Martin Storsjö wrote:
> The unrolling didn't seem to give any measurable speedup in this
> particular case on an A8.
In this case, if I read the TRM right, simply unrolling indeed makes no
difference. However, if I read the TRM right again, half a cycle per 8 pixels
could be saved by doubling the size of the store operations, assuming the U/V
plane is aligned to 16 bytes.
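
For illustration only, here is a rough C sketch of what "doubling the size of
the store operations" could look like with NEON intrinsics. It is not the
routine from the patch: the function name deinterleave_chroma_row and its
parameters are hypothetical, and the intrinsics carry no alignment hints, so
the 16-byte alignment assumption above is not expressed here. Each iteration
does one 32-byte deinterleaving load and two 16-byte stores; a scalar tail
handles the remainder, so nothing is read past the end of the row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not the code from the patch: split one interleaved
 * UVUV... row into separate U and V rows of `width` samples each. */
void deinterleave_chroma_row(uint8_t *u, uint8_t *v,
                             const uint8_t *uv, size_t width)
{
    size_t i = 0;

    assert(u != NULL && v != NULL && uv != NULL);

#if defined(__ARM_NEON)
    /* 16 chroma pairs per iteration: one 32-byte deinterleaving load,
     * then one 16-byte store per output plane. */
    for (; i + 16 <= width; i += 16) {
        uint8x16x2_t px = vld2q_u8(uv + 2 * i);
        vst1q_u8(u + i, px.val[0]);
        vst1q_u8(v + i, px.val[1]);
    }
#endif

    /* Scalar tail (and the whole row on non-NEON builds). */
    for (; i < width; i++) {
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

Whether this is actually any faster than 8-byte stores would have to be
measured on the target core.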
> So what's the verdict on this case then, keep it simple (which also avoids
> overreads or avoids requiring having the interleaved UV-plane aligned to
> 32 bytes) or keep the unrolling?
KISS^H.
--
Rémi Denis-Courmont
http://www.remlab.net/