[x265] [PATCH 07/12] AArch64: Fix sad_x4_neon for X265_DEPTH > 10
Hari Limaye
hari.limaye at arm.com
Thu May 2 21:19:42 UTC 2024
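Fix an overflow in sad_x4_neon for block widths greater than 32 when
HIGH_BIT_DEPTH=1 and X265_DEPTH > 10.

The per-row reduction performed two 16-bit pairwise additions, using
the signed _s16 intrinsics on unsigned vectors, before widening into
the 32-bit result. Widen to 32 bits after the first pairwise addition
and use the unsigned intrinsics throughout.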
---
source/common/aarch64/pixel-prim.cpp | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
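
Note for reviewers (not part of the commit): the sketch below is a scalar
model of the two reductions, intended only to illustrate where the 16-bit
wrap-around occurs. It is not code from x265. The type and function names
(U16x8, U32x4, splat, reduce_old, reduce_new) are hypothetical, and the test
value assumes each 16-bit lane has accumulated eight maximal 12-bit absolute
differences, as a 64-pixel-wide row spread over 8 lanes would. The helper
functions mimic the documented lane semantics of vpaddq_u16, vpaddlq_u16,
vpaddq_u32 and vpadalq_s16.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar stand-ins for the NEON vector types (hypothetical names).
using U16x8 = std::array<uint16_t, 8>;
using U32x4 = std::array<uint32_t, 4>;

// Fill every lane with the same value.
static U16x8 splat(uint16_t v)
{
    U16x8 r{};
    r.fill(v);
    return r;
}

// Models vpaddq_u16 / vpaddq_s16: add adjacent lanes of a, then of b.
// Each sum is truncated to 16 bits, so it can wrap.
static U16x8 paddq_u16(const U16x8 &a, const U16x8 &b)
{
    U16x8 r{};
    for (int i = 0; i < 4; i++)
    {
        r[i]     = static_cast<uint16_t>(a[2 * i] + a[2 * i + 1]);
        r[i + 4] = static_cast<uint16_t>(b[2 * i] + b[2 * i + 1]);
    }
    return r;
}

// Models vpaddlq_u16: add adjacent lanes, widening each sum to 32 bits.
static U32x4 paddlq_u16(const U16x8 &a)
{
    U32x4 r{};
    for (int i = 0; i < 4; i++)
        r[i] = static_cast<uint32_t>(a[2 * i]) + a[2 * i + 1];
    return r;
}

// Models vpaddq_u32: add adjacent 32-bit lanes of a, then of b.
static U32x4 paddq_u32(const U32x4 &a, const U32x4 &b)
{
    return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Reinterpret a 16-bit pattern as a signed value, as the _s16 intrinsics do.
static int32_t as_signed16(uint16_t v)
{
    return v >= 0x8000 ? static_cast<int32_t>(v) - 0x10000 : v;
}

// Old reduction: two 16-bit pairwise adds, then a signed widening add
// (the vpadalq_s16 step), starting from a zero result.
static std::array<int32_t, 4> reduce_old(const U16x8 &v0, const U16x8 &v1,
                                         const U16x8 &v2, const U16x8 &v3)
{
    U16x8 a = paddq_u16(v0, v1);
    U16x8 b = paddq_u16(v2, v3);
    U16x8 c = paddq_u16(a, b); // second 16-bit add: may wrap
    std::array<int32_t, 4> r{};
    for (int i = 0; i < 4; i++)
        r[i] = as_signed16(c[2 * i]) + as_signed16(c[2 * i + 1]);
    return r;
}

// New reduction: one 16-bit pairwise add, then widen before adding further.
static U32x4 reduce_new(const U16x8 &v0, const U16x8 &v1,
                        const U16x8 &v2, const U16x8 &v3)
{
    U32x4 sum01 = paddlq_u16(paddq_u16(v0, v1));
    U32x4 sum23 = paddlq_u16(paddq_u16(v2, v3));
    return paddq_u32(sum01, sum23);
}
```

With every lane set to 8 * 4095 = 32760, the true per-reference sum is
8 * 32760 = 262080. In this model reduce_new returns that value, while
reduce_old wraps in its second 16-bit addition and returns -64. With
10-bit input (lanes of 8 * 1023 = 8184) both reductions agree on 65472,
which is consistent with the failure only appearing for X265_DEPTH > 10.
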
diff --git a/source/common/aarch64/pixel-prim.cpp b/source/common/aarch64/pixel-prim.cpp
index 7164cd99f..f073251d3 100644
--- a/source/common/aarch64/pixel-prim.cpp
+++ b/source/common/aarch64/pixel-prim.cpp
@@ -1069,10 +1069,9 @@ void sad_x4_neon(const pixel *pix1, const pixel *pix2, const pixel *pix3, const
{
/* This is equivalent to adding across each of the sum vectors and then adding
* to result. */
- uint16x8_t a = vpaddq_s16(vsum16_0, vsum16_1);
- uint16x8_t b = vpaddq_s16(vsum16_2, vsum16_3);
- uint16x8_t c = vpaddq_s16(a, b);
- result = vpadalq_s16(result, c);
+ uint32x4_t sum01 = vpaddlq_u16(vpaddq_u16(vsum16_0, vsum16_1));
+ uint32x4_t sum23 = vpaddlq_u16(vpaddq_u16(vsum16_2, vsum16_3));
+ result = vaddq_u32(result, vpaddq_u32(sum01, sum23));
}
#else
--
2.42.1