[vlc-devel] commit: NEON accelerated I420/YV12 -> YUYV/UYVY chroma conversion ( Rémi Denis-Courmont )

Måns Rullgård mans at mansr.com
Sun Sep 20 15:06:53 CEST 2009


git at videolan.org (git version control) writes:

> vlc | branch: master | Rémi Denis-Courmont <remi at remlab.net> | Sun Sep 20 11:29:47 2009 +0300| [d4a730bbabc16f80392ae36995865c92e36ac66e] | committer: Rémi Denis-Courmont 
>
> NEON accelerated I420/YV12 -> YUYV/UYVY chroma conversion
>
> +	.align
> +	.global i420_uyvy_neon
> +	.type	i420_uyvy_neon, %function
> +i420_uyvy_neon:
> +	push		{r4-r8}
> +	add		r8,	pc,	#(indexes+64-.-8)
> +	b		i420_pack_neon
> +
> +	.global i420_yuyv_neon
> +	.type	i420_yuyv_neon, %function
> +i420_yuyv_neon:
> +	push		{r4-r8}
> +	add		r8,	pc,	#(indexes-.-8)

The "adr r8, indexes" pseudo-instruction is more readable.
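
For reference, the arithmetic behind the hand-written form can be modelled in a few lines of C. This is only a sketch: the function names are made up for illustration, and it assumes ARM (not Thumb) state, where reading pc yields the address of the current instruction plus 8. That offset is why the "-8" appears in the expression. adr lets the assembler do the same computation and pick an add or a sub as needed.

```c
#include <stdint.h>

/* Hypothetical helper: in ARM state, reading pc as an operand yields the
 * address of the executing instruction plus 8. */
static uint32_t arm_read_pc(uint32_t insn_addr)
{
    return insn_addr + 8;
}

/* Models "add rd, pc, #(label - . - 8)" for an instruction located at
 * insn_addr. The assembler evaluates "." to insn_addr. */
static uint32_t add_pc_relative(uint32_t insn_addr, uint32_t label_addr)
{
    uint32_t imm = label_addr - insn_addr - 8; /* value of (label - . - 8) */
    return arm_read_pc(insn_addr) + imm;       /* the +8 and -8 cancel */
}
```

Whatever the instruction address, add_pc_relative() returns label_addr, which is exactly what "adr r8, indexes" expresses directly.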

> +	.hidden	i420_pack_neon
> +i420_pack_neon:
> +	vld1.u8		{d24-d27},	[r8]!
> +	ldmia		r1,	{r4, r6, r7}
> +	vld1.u8		{d28-d31},	[r8]
> +	add		O2,	O1,	PITCH, lsl #1
> +	add		Y2,	Y1,	PITCH
> +1:
> +	mov		END_O1,	O2
> +2:
> +	vld1.u8		{d0-d1},	[Y1,:128]!
> +	vld1.u8		{d2},		[U,:64]!
> +	vld1.u8		{d3},		[V,:64]!
> +	vld1.u8		{d4-d5},	[Y2,:128]!
> +	vtbl.u8		d16,	{d0-d3},	d24
> +	vtbl.u8		d17,	{d0-d3},	d25
> +	vtbl.u8		d18,	{d0-d3},	d26
> +	vtbl.u8		d19,	{d0-d3},	d27
> +	vtbl.u8		d20,	{d2-d5},	d28
> +	vtbl.u8		d21,	{d2-d5},	d29
> +	vtbl.u8		d22,	{d2-d5},	d30
> +	vtbl.u8		d23,	{d2-d5},	d31

I suspect a few cascaded vzip instructions would be faster.  First
vzip the U and V vectors, then vzip the resulting UV vector with the Y
vector.
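
To make the idea concrete, here is a rough scalar model in C of what two cascaded vzip.8 steps do to one block of 16 pixels. The helper names (vzip8, pack_yuyv16, pack_uyvy16) are made up for illustration and this is not code from the commit. The matching NEON sequence would presumably be "vzip.8 d2, d3" followed by "vzip.8 q0, q1" (operands swapped for UYVY); I have not benchmarked it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of VZIP.8 on two n-byte registers (n = 8 for D, 16 for Q):
 * interleave a and b; the low half of the result is written back to a,
 * the high half to b. */
static void vzip8(uint8_t *a, uint8_t *b, size_t n)
{
    uint8_t tmp[32];

    assert(n <= 16);
    for (size_t i = 0; i < n; i++) {
        tmp[2 * i]     = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(a, tmp, n);
    memcpy(b, tmp + n, n);
}

/* Load 16 Y, 8 U and 8 V bytes the way the loop does (q0 = d0-d1,
 * q1 = d2-d3), then zip U with V so that q1 = U0 V0 U1 V1 ... U7 V7. */
static void load_and_zip_uv(const uint8_t y[16], const uint8_t u[8],
                            const uint8_t v[8],
                            uint8_t q0[16], uint8_t q1[16])
{
    memcpy(q0, y, 16);
    memcpy(q1, u, 8);        /* d2 */
    memcpy(q1 + 8, v, 8);    /* d3 */
    vzip8(q1, q1 + 8, 8);    /* vzip.8 d2, d3 */
}

/* YUYV: zip Y with UV, giving Y0 U0 Y1 V0 Y2 U1 Y3 V1 ... */
static void pack_yuyv16(const uint8_t y[16], const uint8_t u[8],
                        const uint8_t v[8], uint8_t out[32])
{
    uint8_t q0[16], q1[16];

    load_and_zip_uv(y, u, v, q0, q1);
    vzip8(q0, q1, 16);       /* vzip.8 q0, q1 */
    memcpy(out, q0, 16);
    memcpy(out + 16, q1, 16);
}

/* UYVY: same thing with the operands swapped, giving U0 Y0 V0 Y1 ... */
static void pack_uyvy16(const uint8_t y[16], const uint8_t u[8],
                        const uint8_t v[8], uint8_t out[32])
{
    uint8_t q0[16], q1[16];

    load_and_zip_uv(y, u, v, q0, q1);
    vzip8(q1, q0, 16);       /* vzip.8 q1, q0 */
    memcpy(out, q1, 16);
    memcpy(out + 16, q0, 16);
}
```

Note that vzip overwrites both operands, so the zipped UV vector has to be copied (or zipped again from a saved copy) before it is combined with the second luma row.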

> +	vst1.u8		{d16-d19},	[O1,:128]!
> +	vst1.u8		{d20-d23},	[O2,:128]!
> +
> +	cmp		O1,	END_O1
> +	bne		2b
> +
> +	sub		HEIGHT,	#2
> +	mov		O1,	O2
> +	add		O2,	PITCH,	lsl #1
> +	mov		Y1,	Y2
> +	add		Y2,	PITCH
> +
> +	cmp		HEIGHT,	#0
> +	bne		1b
> +
> +	pop		{r4-r8}
> +	bx		lr

If you need to push/pop any registers at all, it is faster to include
lr in the list (push {regs,lr}) and pop directly to pc (pop {regs,pc}).
Also remember that r12 is a call-clobbered register so you can use
that freely.

-- 
Måns Rullgård
mans at mansr.com



