[x265] [PATCH 00/18] AArch64: Enable building with -flax-vector-conversions=none

Thu Aug 15 04:05:36 UTC 2024

Hi Hari,

Thanks for the new patches. I have some comments:

What is the reason for renaming these parameters? In the AArch64 ABI, only the first 8 parameters are passed through SIMD registers, and the compiler will take care of these in/out parameters.

source/common/aarch64/pixel-prim.cpp
-static inline void _sa8d_8x8_neon_end(int16x8_t &v0, int16x8_t &v1, int16x8_t v2, int16x8_t v3,
-                                      int16x8_t v20, int16x8_t v21, int16x8_t v22, int16x8_t v23)
+static inline void _sa8d_8x8_neon_end(int16x8_t v0, int16x8_t v1, int16x8_t v2,
+                                      int16x8_t v3, int16x8_t v20,
+                                      int16x8_t v21, int16x8_t v22,
+                                      int16x8_t v23, sa8d_out_type &out0,
+                                      sa8d_out_type &out1)

source/common/aarch64/filter-prim.cpp
*) If you want to improve these interpolation functions, how about also improving the algorithm?
For example, in interp_vert_pp_neon, if we swap the order of loop_row and loop_col, we can reuse most of input[*] for the next row.
*) Some lines do not need to change, for example:
-            vsum = vmlal_lane_s16(vsum, vget_low_u16(input[0]), vget_low_s16(vc3), 0);
+            vsum = vmlal_lane_s16(vsum, vget_low_s16(input[0]),
+                                  vget_low_s16(vc3), 0);
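
To illustrate the loop-order suggestion above, here is a minimal scalar sketch. It is not the x265 code: the names kTaps, kBlock, Block, load_block, vert_filter_rows_outer and vert_filter_cols_outer are hypothetical, a Block stands in for one 8-lane Neon vector, rounding and clipping are omitted, and width is assumed to be a multiple of kBlock. With columns as the outer loop, the input[] window slides down by one row per iteration, so each output row needs only one new load instead of kTaps loads.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kTaps = 8;   // assumed filter length (hypothetical)
constexpr int kBlock = 8;  // stand-in for one 8-lane Neon vector

using Block = std::array<int16_t, kBlock>;

// Widen kBlock pixels into a block, as a vector load would.
inline Block load_block(const uint8_t *p)
{
    Block b;
    for (int i = 0; i < kBlock; i++)
        b[i] = p[i];
    return b;
}

// Apply the kTaps-tap filter across the window and store one output block.
inline void filter_block(const Block (&input)[kTaps], const int16_t *coeff,
                         int32_t *dst)
{
    for (int i = 0; i < kBlock; i++)
    {
        int32_t sum = 0;
        for (int k = 0; k < kTaps; k++)
            sum += coeff[k] * input[k][i];
        dst[i] = sum;
    }
}

// Rows outer, columns inner: every output block reloads all kTaps rows.
void vert_filter_rows_outer(const uint8_t *src, ptrdiff_t stride, int32_t *dst,
                            ptrdiff_t dstStride, int width, int height,
                            const int16_t *coeff, size_t &loads)
{
    for (int row = 0; row < height; row++)
        for (int col = 0; col < width; col += kBlock)
        {
            Block input[kTaps];
            for (int k = 0; k < kTaps; k++, loads++)
                input[k] = load_block(src + (row + k) * stride + col);
            filter_block(input, coeff, dst + row * dstStride + col);
        }
}

// Columns outer, rows inner: the window is kept across rows, so each
// output row loads only the one input row that newly enters the window.
void vert_filter_cols_outer(const uint8_t *src, ptrdiff_t stride, int32_t *dst,
                            ptrdiff_t dstStride, int width, int height,
                            const int16_t *coeff, size_t &loads)
{
    for (int col = 0; col < width; col += kBlock)
    {
        Block input[kTaps];
        // Prime the window with the first kTaps - 1 rows.
        for (int k = 0; k < kTaps - 1; k++, loads++)
            input[k] = load_block(src + k * stride + col);

        for (int row = 0; row < height; row++)
        {
            input[kTaps - 1] = load_block(src + (row + kTaps - 1) * stride + col);
            loads++;
            filter_block(input, coeff, dst + row * dstStride + col);

            // Slide the window down by one row for the next iteration.
            for (int k = 0; k < kTaps - 1; k++)
                input[k] = input[k + 1];
        }
    }
}
```

Both functions read height + kTaps - 1 source rows and produce the same sums. The load count drops from height * (width / kBlock) * kTaps to (width / kBlock) * (height + kTaps - 1); whether that translates into a speed-up in the real Neon code would need to be measured.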

Regards,

Chen

At 2024-08-13 23:18:41, "Hari Limaye" <hari.limaye at arm.com> wrote:
>This patch series performs some refactoring to AArch64 intrinsics code
>to use correct vector types and conversions for Neon vector operations,
>in order to enable building with -flax-vector-conversions=none.
>
>These patches are intended to be primarily refactoring only and are not
>intended to have any performance impact.
>
>The changes are based on the SAD series, as [PATCH 18/18] here makes
>changes to source/CMakeLists.txt which depends on CMake refactoring in:
>    https://mailman.videolan.org/pipermail/x265-devel/2024-July/013740.html
>
>Many thanks,
>Hari
>
>Hari Limaye (18):
>  AArch64: Use proper load/store intrinsics in pixel primitives
>  AArch64: Refactor output variables in Neon sa8d helper
>  AArch64: Use transpose helpers in pixel-prim.cpp
>  AArch64: Refactor types and conversions in pixel-prim.cpp
>  AArch64: Add missing include in arm64-utils.h
>  AArch64: Use proper load/store intrinsics in arm64-utils.cpp
>  AArch64: Refactor types and conversions in arm64-utils.cpp
>  AArch64: Optimise shifts in filter-prim.cpp
>  AArch64: Use proper load/store intrinsics in filter-prim.cpp
>  AArch64: Refactor types and conversions in filter-prim.cpp
>  AArch64: Use proper load/store intrinsics in intrapred-prim.cpp
>  AArch64: Refactor types and conversions in intrapred-prim.cpp
>  AArch64: Refactor narrowing in loopfilter-prim.cpp
>  AArch64: Use proper load/store intrinsics in loopfilter-prim.cpp
>  AArch64: Refactor types and conversions in loopfilter-prim.cpp
>  AArch64: Use proper load/store intrinsics in dct-prim.cpp
>  AArch64: Refactor types and conversions in dct-prim.cpp
>  AArch64: Build with -flax-vector-conversions=none
>
> source/CMakeLists.txt                     |   8 +-
> source/common/CMakeLists.txt              |   2 +-
> source/common/aarch64/arm64-utils.cpp     | 478 ++++++-----
> source/common/aarch64/arm64-utils.h       |   1 +
> source/common/aarch64/dct-prim.cpp        | 132 +--
> source/common/aarch64/filter-prim.cpp     | 168 ++--
> source/common/aarch64/intrapred-prim.cpp  |  44 +-
> source/common/aarch64/loopfilter-prim.cpp | 113 +--
> source/common/aarch64/mem-neon.h          |  59 ++
> source/common/aarch64/pixel-prim.cpp      | 992 +++++++++++-----------
> 10 files changed, 1098 insertions(+), 899 deletions(-)
> create mode 100644 source/common/aarch64/mem-neon.h
>
>-- 
>2.42.1