[vlc-devel] commit: NEON accelerated I420/YV12 -> YUYV/UYVY chroma conversion ( Rémi Denis-Courmont )
Måns Rullgård
mans at mansr.com
Sun Sep 20 15:06:53 CEST 2009
git at videolan.org (git version control) writes:
> vlc | branch: master | Rémi Denis-Courmont <remi at remlab.net> | Sun Sep 20 11:29:47 2009 +0300| [d4a730bbabc16f80392ae36995865c92e36ac66e] | committer: Rémi Denis-Courmont
>
> NEON accelerated I420/YV12 -> YUYV/UYVY chroma conversion
>
> + .align
> + .global i420_uyvy_neon
> + .type i420_uyvy_neon, %function
> +i420_uyvy_neon:
> + push {r4-r8}
> + add r8, pc, #(indexes+64-.-8)
> + b i420_pack_neon
> +
> + .global i420_yuyv_neon
> + .type i420_yuyv_neon, %function
> +i420_yuyv_neon:
> + push {r4-r8}
> + add r8, pc, #(indexes-.-8)
The "adr r8, indexes" pseudo-instruction is more readable.
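Here is what the hand-written immediate encodes. In ARM state, an instruction that reads pc sees its own address plus 8, so "add r8, pc, #(indexes-.-8)" leaves r8 equal to indexes. "adr" makes the assembler do that arithmetic itself. The C below only models the arithmetic, and pc_rel_imm() and add_pc_imm() are hypothetical names that are not part of the patch.

```c
#include <stdint.h>

/* Immediate written by hand in "add rD, pc, #(target-.-8)":
 * '.' is the address of the add itself, and in ARM state an
 * instruction that reads pc sees its own address plus 8.
 * Hypothetical helper, for illustration only. */
static uint32_t pc_rel_imm(uint32_t insn_addr, uint32_t target)
{
    return target - insn_addr - 8;
}

/* What the CPU computes when it executes "add rD, pc, #imm"
 * in ARM state. Hypothetical helper, for illustration only. */
static uint32_t add_pc_imm(uint32_t insn_addr, uint32_t imm)
{
    uint32_t pc_value = insn_addr + 8; /* ARM-state pc read offset */
    return pc_value + imm;
}
```

With "adr r8, indexes" the assembler computes this offset and emits the equivalent add/sub. The "-.-8" bookkeeping then disappears from the source.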
> + .hidden i420_pack_neon
> +i420_pack_neon:
> + vld1.u8 {d24-d27}, [r8]!
> + ldmia r1, {r4, r6, r7}
> + vld1.u8 {d28-d31}, [r8]
> + add O2, O1, PITCH, lsl #1
> + add Y2, Y1, PITCH
> +1:
> + mov END_O1, O2
> +2:
> + vld1.u8 {d0-d1}, [Y1,:128]!
> + vld1.u8 {d2}, [U,:64]!
> + vld1.u8 {d3}, [V,:64]!
> + vld1.u8 {d4-d5}, [Y2,:128]!
> + vtbl.u8 d16, {d0-d3}, d24
> + vtbl.u8 d17, {d0-d3}, d25
> + vtbl.u8 d18, {d0-d3}, d26
> + vtbl.u8 d19, {d0-d3}, d27
> + vtbl.u8 d20, {d2-d5}, d28
> + vtbl.u8 d21, {d2-d5}, d29
> + vtbl.u8 d22, {d2-d5}, d30
> + vtbl.u8 d23, {d2-d5}, d31
I suspect a few cascaded vzip instructions would be faster than the eight
vtbl lookups. First vzip the U and V vectors, then vzip the resulting UV
vector with the Y vector.
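The following rough C model of the cascaded-vzip idea assumes the same 16 Y, 8 U and 8 V bytes per row that the quoted loop loads. vzip8() emulates "vzip.8 dA, dB" on two 8-byte D registers. The function names are hypothetical, and the model only checks that the byte shuffle comes out right, not how fast it runs. Swapping the operand order of the second vzip gives UYVY instead of YUYV.

```c
#include <stdint.h>
#include <string.h>

/* Emulates NEON "vzip.8 dA, dB": interleave the two 8-byte vectors,
 * then the low 8 bytes of the result go to a, the high 8 bytes to b. */
static void vzip8(uint8_t a[8], uint8_t b[8])
{
    uint8_t tmp[16];
    for (int i = 0; i < 8; i++) {
        tmp[2 * i] = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(a, tmp, 8);
    memcpy(b, tmp + 8, 8);
}

/* 16 Y + 8 U + 8 V -> 32 bytes of YUYV (hypothetical helper). */
static void pack_yuyv_16(const uint8_t y[16], const uint8_t u[8],
                         const uint8_t v[8], uint8_t out[32])
{
    uint8_t ylo[8], yhi[8], uvlo[8], uvhi[8];
    memcpy(ylo, y, 8);
    memcpy(yhi, y + 8, 8);
    memcpy(uvlo, u, 8);
    memcpy(uvhi, v, 8);
    vzip8(uvlo, uvhi); /* uvlo = u0 v0 .. u3 v3, uvhi = u4 v4 .. u7 v7 */
    vzip8(ylo, uvlo);  /* ylo = y0 u0 y1 v0 .., uvlo = y4 u2 y5 v2 .. */
    vzip8(yhi, uvhi);  /* yhi = y8 u4 y9 v4 .., uvhi = y12 u6 y13 v6 .. */
    memcpy(out, ylo, 8);
    memcpy(out + 8, uvlo, 8);
    memcpy(out + 16, yhi, 8);
    memcpy(out + 24, uvhi, 8);
}

/* Same shuffle with the second vzip's operands swapped -> UYVY. */
static void pack_uyvy_16(const uint8_t y[16], const uint8_t u[8],
                         const uint8_t v[8], uint8_t out[32])
{
    uint8_t ylo[8], yhi[8], uvlo[8], uvhi[8];
    memcpy(ylo, y, 8);
    memcpy(yhi, y + 8, 8);
    memcpy(uvlo, u, 8);
    memcpy(uvhi, v, 8);
    vzip8(uvlo, uvhi); /* uvlo = u0 v0 .. u3 v3, uvhi = u4 v4 .. u7 v7 */
    vzip8(uvlo, ylo);  /* uvlo = u0 y0 v0 y1 .., ylo = u2 y4 v2 y5 .. */
    vzip8(uvhi, yhi);  /* uvhi = u4 y8 v4 y9 .., yhi = u6 y12 v6 y13 .. */
    memcpy(out, uvlo, 8);
    memcpy(out + 8, ylo, 8);
    memcpy(out + 16, uvhi, 8);
    memcpy(out + 24, yhi, 8);
}
```

In real NEON code, vzip overwrites both of its operands. The zipped UV vector would therefore need to be copied (or kept in a spare register) before it is consumed by the first row, so that the second row can reuse it.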
> + vst1.u8 {d16-d19}, [O1,:128]!
> + vst1.u8 {d20-d23}, [O2,:128]!
> +
> + cmp O1, END_O1
> + bne 2b
> +
> + sub HEIGHT, #2
> + mov O1, O2
> + add O2, PITCH, lsl #1
> + mov Y1, Y2
> + add Y2, PITCH
> +
> + cmp HEIGHT, #0
> + bne 1b
> +
> + pop {r4-r8}
> + bx lr
If you need to push/pop any registers at all, it is faster to include
lr in the list (push {regs,lr}) and pop directly into pc (pop {regs,pc}),
which returns without a separate bx lr.
Also remember that r12 is a call-clobbered register, so you can use
it freely.
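For reference, the following is a plain C sketch of what the quoted i420_pack_neon loop computes for YUYV, based on a reading of the loop. It processes two output rows per outer iteration and both rows share one chroma row. The output pitch is twice the luma pitch ("add O2, O1, PITCH, lsl #1"), and the U/V pointers simply keep advancing, which implies a chroma pitch of PITCH/2. The function name and parameter list are hypothetical, because the real arguments are set up through register aliases (O1, Y1, U, V, PITCH, HEIGHT) defined outside the quoted hunk. Unlike the NEON code, the sketch does not need the pitch to be a multiple of 16 or the buffers to be 16-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for the quoted loop (YUYV variant).
 * Assumes: output pitch == 2 * pitch, U/V planes packed with a pitch
 * of pitch / 2, even pitch and even height. */
static void i420_to_yuyv_ref(uint8_t *out, const uint8_t *y,
                             const uint8_t *u, const uint8_t *v,
                             size_t pitch, unsigned height)
{
    for (unsigned row = 0; row < height; row += 2) {
        /* One chroma row serves two luma rows (4:2:0 subsampling). */
        for (unsigned k = 0; k < 2; k++) {
            const uint8_t *yrow = y + (size_t)(row + k) * pitch;
            uint8_t *orow = out + (size_t)(row + k) * 2 * pitch;
            for (size_t x = 0; x < pitch / 2; x++) {
                orow[4 * x + 0] = yrow[2 * x];
                orow[4 * x + 1] = u[x];
                orow[4 * x + 2] = yrow[2 * x + 1];
                orow[4 * x + 3] = v[x];
            }
        }
        /* Advance to the next chroma row, as the NEON code does by
         * leaving U and V post-incremented. */
        u += pitch / 2;
        v += pitch / 2;
    }
}
```

A reference like this is mainly useful for checking a vzip-based rewrite against the vtbl version, byte for byte.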
--
Måns Rullgård
mans at mansr.com