[x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations

Hari Limaye hari.limaye at arm.com
Wed May 29 11:24:16 UTC 2024

Hi Chen,

Thank you for clarifying.

According to the Arm CPU Software Optimisation Guides, LD1R requires an extra micro-op for the broadcast compared to a regular load (LDR). Our benchmarking shows that using LD1R in the SAD functions of width 4 is ~20% slower than using the LDR, ADD sequence.
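
For context, a width-4 SAD touches only four bytes per row, so each row amounts to one small load from each buffer followed by a pointer advance by the stride; that per-row load-and-advance step is where the choice between LD1R and LDR, ADD applies. Below is a minimal portable C sketch of that shape. It is not the x265 Neon code: the name sad_4xh_ref and its signature are hypothetical and used purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not x265 code): sum of absolute
 * differences over a block 4 pixels wide and `height` rows tall. */
static int sad_4xh_ref(const uint8_t *src, intptr_t src_stride,
                       const uint8_t *ref, intptr_t ref_stride, int height)
{
    assert(height > 0);

    int sum = 0;
    for (int y = 0; y < height; y++) {
        /* One row is 4 bytes from each buffer. */
        for (int x = 0; x < 4; x++)
            sum += abs(src[x] - ref[x]);

        /* Advance both pointers to the next row by their strides. */
        src += src_stride;
        ref += ref_stride;
    }
    return sum;
}
```

A vectorised version performs the same row load and pointer advance with Neon instructions. The sketch only shows the access pattern and says nothing about the relative cost of the two instruction sequences.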

Many thanks,

Hari
