[x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations

Hari Limaye hari.limaye at arm.com
Wed May 29 11:24:16 UTC 2024


Hi Chen,

Thank you for clarifying.

From the Arm CPU Software Optimisation Guides, LD1R requires an extra micro-op for the broadcast compared to a regular load (LDR). Benchmarking shows that using LD1R in the width-4 SAD functions is ~20% slower than using the LDR, ADD sequence.
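
For anyone reading along, here is a rough sketch of the two load forms for a 4-pixel row, written as C++ with Neon intrinsics rather than the assembly in the patch. The function names (sad4_ref, sad4_neon, load4, sad4) are hypothetical and not taken from x265, and the instruction sequences in the comments are my reading of "LDR, ADD sequence", so please treat them as assumptions. The compiler is also free to pick different instructions for the intrinsics than the ones shown.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Scalar reference: sum of absolute differences over a 4-pixel-wide block.
static int sad4_ref(const uint8_t *a, intptr_t a_stride,
                    const uint8_t *b, intptr_t b_stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < 4; x++)
            sum += std::abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    return sum;
}

#if defined(__aarch64__)
// Assumed assembly forms for loading one 4-byte row and stepping to the next:
//   broadcast load:  ld1r {v0.2s}, [x0], x1    // load, replicate, post-increment
//   plain load:      ldr  s0, [x0]             // load into lane 0, upper bits zeroed
//                    add  x0, x0, x1           // advance by the stride
// load4 models the plain load: 4 bytes in lane 0, upper lane zero.
static inline uint8x8_t load4(const uint8_t *p)
{
    uint32_t w;
    std::memcpy(&w, p, sizeof(w)); // avoids unaligned/aliasing pointer casts
    return vreinterpret_u8_u32(vset_lane_u32(w, vdup_n_u32(0), 0));
}

static int sad4_neon(const uint8_t *a, intptr_t a_stride,
                     const uint8_t *b, intptr_t b_stride, int h)
{
    assert(h <= 256); // 255 * h must fit in each 16-bit accumulator lane
    uint16x8_t acc = vdupq_n_u16(0);
    for (int y = 0; y < h; y++)
    {
        // Upper four bytes are zero in both operands, so they add nothing.
        acc = vabal_u8(acc, load4(a), load4(b));
        a += a_stride;
        b += b_stride;
    }
    return (int)vaddlvq_u16(acc);
}
#endif

// Uses the Neon path on AArch64 and the scalar reference elsewhere.
static int sad4(const uint8_t *a, intptr_t a_stride,
                const uint8_t *b, intptr_t b_stride, int h)
{
#if defined(__aarch64__)
    return sad4_neon(a, a_stride, b, b_stride, h);
#else
    return sad4_ref(a, a_stride, b, b_stride, h);
#endif
}
```

Because the upper lane is zero in both operands, the broadcast is not needed for a correct result in this sketch; a broadcast variant would load with vld1_dup_u32 instead. This illustrates the shape of the two options only and makes no claim about their relative speed.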

Many thanks,

Hari

