[x265] [arm64] port LUMA_VPP_4xN
Pop, Sebastian
spop at amazon.com
Fri Jul 2 18:43:07 UTC 2021
Hi,
Thanks for your review.
> +#ifdef __MACH__
> +# define MACH
> +#else
> +# define MACH #
> It is not a good idea to bypass .const_data
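Mach-O uses the ".const_data" directive, which is invalid for ELF; the ELF equivalent is ".section .rodata".
MACH expands to nothing on Mach-O and to "#" elsewhere, so on other targets the MACH-prefixed line becomes an assembler comment and is not assembled: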
> ELF .section .rodata
> MACH .const_data
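As a rough illustration of this compile-time selection, the C sketch below picks the same directive with the same `__MACH__` test. It is a hypothetical model, not x265 code: `const_section_directive()` and `is_known_directive()` are illustrative names, and it assumes the target is either Mach-O or ELF (other object formats are not modelled).

```c
#include <string.h>

/* Hypothetical helper mirroring the MACH/ELF line prefixes in the
 * assembly macros: on Mach-O (__MACH__ defined) only the ".const_data"
 * line survives preprocessing; on an ELF target the MACH line turns into
 * a '#' comment and the ".section .rodata" line is used instead. */
static const char *const_section_directive(void)
{
#ifdef __MACH__
    return ".const_data";
#else
    return ".section .rodata";
#endif
}

/* Returns 1 if d is one of the two read-only data directives above. */
static int is_known_directive(const char *d)
{
    return strcmp(d, ".const_data") == 0 ||
           strcmp(d, ".section .rodata") == 0;
}
```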
> + ushll v0.8h, v0.8b, #0
> ...
> + mul v16.8h, v0.8h, v24.8h
> Why not MULL?
MULL would not work for the rest of the computation.
Part of the widened data in v0 is reused in the next computation,
so I would have to split the MLA into a MULL + ADD.
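For reference, here is a scalar C model of what a 4-wide vertical 8-tap luma filter computes. It is a sketch under stated assumptions: 8-bit pixels, the HEVC luma filter taps, and 6-bit filter precision with a rounding offset of 32; `luma_vpp_ref` and `luma_filter` are hypothetical names, not the x265 C primitive. Each source pixel is widened once (as `ushll` does) and products are accumulated in 16 bits (as `mul`/`mla` on `.8h` do). For 8-bit input every partial sum stays within the int16 range (at most 88 * 255 = 22440 and at least -24 * 255 = -6120), which is why the 16-bit accumulator is sufficient.

```c
#include <stddef.h>
#include <stdint.h>

/* HEVC 8-tap luma interpolation filters, indexed by quarter-sample position.
 * Each row sums to 64 (6 bits of filter precision). */
static const int16_t luma_filter[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Hypothetical scalar reference for a 4-wide, 8-bit vertical pixel-to-pixel
 * luma filter. Output row y reads source rows y-3 .. y+4, so the caller must
 * provide 3 rows above and 4 rows below the block. */
static void luma_vpp_ref(const uint8_t *src, ptrdiff_t src_stride,
                         uint8_t *dst, ptrdiff_t dst_stride,
                         int height, int coeff_idx)
{
    const int16_t *c = luma_filter[coeff_idx];

    src -= 3 * src_stride;               /* first tap is 3 rows above */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 4; x++) {
            int16_t sum = 0;             /* 16-bit accumulator, as with mul/mla on .8h */
            for (int k = 0; k < 8; k++) {
                /* widen the u8 pixel once, as ushll does */
                int16_t px = (int16_t)src[(y + k) * src_stride + x];
                sum = (int16_t)(sum + px * c[k]);
            }
            int v = sum + 32;            /* rounding offset: 1 << (6 - 1) */
            int val = v < 0 ? 0 : v >> 6; /* clip negatives before shifting */
            dst[y * dst_stride + x] = (uint8_t)(val > 255 ? 255 : val);
        }
    }
}
```

With `coeff_idx` 0 the filter is the identity (64 on the center tap), which makes it a convenient first check against an assembly version.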
> + orr v0.16b, v1.16b, v1.16b
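> This is equivalent to MOV; I guess the compiler will replace it with the right instruction on ARM64
I replaced the orr instructions with mov. MOV Vd.16B, Vn.16B is an alias of ORR Vd.16B, Vn.16B, Vn.16B, so the encoding is unchanged.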
> + // sum row[0-7]
> + dup v18.2d, v16.d[1]
> + dup v19.2d, v17.d[1]
> + add v16.4h, v16.4h, v18.4h
> + add v17.4h, v17.4h, v19.4h
> + trn1 v16.2d, v16.2d, v17.2d
> How about ADDP?
I replaced the five instructions above with the following three, and performance improved:
trn1 v20.2d, v16.2d, v17.2d
trn2 v21.2d, v16.2d, v17.2d
add v16.8h, v20.8h, v21.8h
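To check that the three-instruction replacement computes the same result as the original five, here is a lane-level C model of both sequences. It is a sketch: the `vreg` type and helper names are hypothetical, and each helper models only the `.2d`/`.4h`/`.8h` forms used here, including the rule that writing a 64-bit (`.4h`) destination zeroes the upper half of the register.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit NEON register viewed as 8 x 16-bit lanes
 * (.8h). Lanes 0-3 form d[0] and lanes 4-7 form d[1]. Additions wrap
 * modulo 2^16, like the vector ADD instruction. */
typedef struct { uint16_t h[8]; } vreg;

/* trn1 Vd.2d, Vn.2d, Vm.2d  ->  { Vn.d[0], Vm.d[0] } */
static vreg trn1_2d(vreg n, vreg m)
{
    vreg d;
    for (int i = 0; i < 4; i++) {
        d.h[i] = n.h[i];
        d.h[i + 4] = m.h[i];
    }
    return d;
}

/* trn2 Vd.2d, Vn.2d, Vm.2d  ->  { Vn.d[1], Vm.d[1] } */
static vreg trn2_2d(vreg n, vreg m)
{
    vreg d;
    for (int i = 0; i < 4; i++) {
        d.h[i] = n.h[i + 4];
        d.h[i + 4] = m.h[i + 4];
    }
    return d;
}

/* dup Vd.2d, Vn.d[1]  ->  { Vn.d[1], Vn.d[1] } */
static vreg dup_2d_hi(vreg n)
{
    vreg d;
    for (int i = 0; i < 4; i++)
        d.h[i] = d.h[i + 4] = n.h[i + 4];
    return d;
}

/* add Vd.8h, Vn.8h, Vm.8h: all eight lanes */
static vreg add_8h(vreg a, vreg b)
{
    vreg d;
    for (int i = 0; i < 8; i++)
        d.h[i] = (uint16_t)(a.h[i] + b.h[i]);
    return d;
}

/* add Vd.4h, Vn.4h, Vm.4h: low four lanes; upper 64 bits of Vd are zeroed */
static vreg add_4h(vreg a, vreg b)
{
    vreg d = {{0}};
    for (int i = 0; i < 4; i++)
        d.h[i] = (uint16_t)(a.h[i] + b.h[i]);
    return d;
}

/* Original five-instruction sequence (dup, dup, add, add, trn1). */
static vreg reduce_old(vreg v16, vreg v17)
{
    vreg v18 = dup_2d_hi(v16);
    vreg v19 = dup_2d_hi(v17);
    v16 = add_4h(v16, v18);
    v17 = add_4h(v17, v19);
    return trn1_2d(v16, v17);
}

/* Replacement three-instruction sequence (trn1, trn2, add). */
static vreg reduce_new(vreg v16, vreg v17)
{
    vreg v20 = trn1_2d(v16, v17);
    vreg v21 = trn2_2d(v16, v17);
    return add_8h(v20, v21);
}

static int vreg_equal(vreg a, vreg b)
{
    for (int i = 0; i < 8; i++)
        if (a.h[i] != b.h[i])
            return 0;
    return 1;
}
```

Both sequences leave lane i = v16.h[i] + v16.h[i+4] and lane 4+i = v17.h[i] + v17.h[i+4] for i = 0..3. ADDP instead adds adjacent lanes (h[0]+h[1], h[2]+h[3], ...), so it produces a differently paired set of partial sums.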
Please find the amended patch attached.
Thanks,
Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arm64-port-luma_vpp.patch
Type: application/octet-stream
Size: 19195 bytes
Desc: 0001-arm64-port-luma_vpp.patch
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20210702/430c1ace/attachment-0001.obj>
More information about the x265-devel mailing list