[vlc-devel] [PATCH] arm_neon: Add an optimized routine for deinterleaving chroma

Rémi Denis-Courmont remi at remlab.net
Mon Oct 7 21:26:10 CEST 2013


On Monday 7 October 2013 14:27:19 Martin Storsjö wrote:
> The unrolling didn't seem to give any measurable speedup in this
> particular case on an A8.

In this case, if I read the TRM right, simply unrolling indeed makes no
difference. However, again if I read the TRM right, half a cycle per 8 pixels
could be saved by doubling the size of the store operations, assuming the U/V
plane is aligned to 16 bytes.
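
For reference, a minimal scalar sketch in C of the operation under discussion:
splitting an interleaved UV plane into separate U and V planes. This is not the
NEON routine from the patch; the function name and signature are hypothetical,
and it makes no alignment assumptions and never reads past the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: split `n` interleaved
 * chroma pairs (U0 V0 U1 V1 ...) into separate U and V planes.
 * `uv` must hold 2*n bytes; `u` and `v` must each hold n bytes. */
static void deinterleave_chroma(uint8_t *u, uint8_t *v,
                                const uint8_t *uv, size_t n)
{
    assert(u != NULL && v != NULL && uv != NULL);

    for (size_t i = 0; i < n; i++) {
        u[i] = uv[2 * i];     /* even bytes are U samples */
        v[i] = uv[2 * i + 1]; /* odd bytes are V samples */
    }
}
```

A vectorized version does the same thing several pairs at a time, which is
where the load/store sizes and the alignment of the planes start to matter.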

> So what's the verdict on this case then, keep it simple (which also avoids
> overreads or avoids requiring having the interleaved UV-plane aligned to
> 32 bytes) or keep the unrolling?

KISS^H.

-- 
Rémi Denis-Courmont
http://www.remlab.net/



