<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div id="spnEditorContent"><p style="margin: 0;">Hi Hari,</p><p style="margin: 0;"><br></p><p style="margin: 0;">Thank for the new patches, I have some comments,</p><p style="margin: 0;"><br></p><p style="margin: 0;"><br></p><p style="margin: 0;">What's reason rename these parameters? in the AArch64 ABI, only first 8 parameters passthrough SIMD registers, compiler will take care these inout parameters.</p><pre style="width: 1298.64px; word-break: break-word !important;">source/common/aarch64/<b>pixel-prim.cpp</b>
-static inline void _sa8d_8x8_neon_end(int16x8_t &v0, int16x8_t &v1, int16x8_t v2, int16x8_t v3,
-                                      int16x8_t v20, int16x8_t v21, int16x8_t v22, int16x8_t v23)
+static inline void _sa8d_8x8_neon_end(int16x8_t v0, int16x8_t v1, int16x8_t v2,
+                                      int16x8_t v3, int16x8_t v20,
+                                      int16x8_t v21, int16x8_t v22,
+                                      int16x8_t v23, sa8d_out_type &out0,
+                                      sa8d_out_type &out1)
</pre><div><br></div><div><pre style="width: 1298.64px; word-break: break-word !important;">source/common/aarch64/<b>filter-prim.cpp</b>
*) if you want to improve these interpolate functions, how about also improve algorithm?
For example, in the interp_vert_pp_neon, if we swap order of loop_row and loop_col, we can reuse most of input[*] at next row</pre><pre style="width: 1298.64px; word-break: break-word !important;">*) Some of line need not change, for example,
-            vsum = vmlal_lane_s16(vsum, vget_low_u16(input[0]), vget_low_s16(vc3), 0);
+            vsum = vmlal_lane_s16(vsum, vget_low_s16(input[0]),
+                                  vget_low_s16(vc3), 0);

</pre></div><p style="margin: 0;">Regards,</p><p style="margin: 0;">Chen</p><p style="margin: 0;"><br></p></div><pre>At 2024-08-13 23:18:41, "Hari Limaye" <hari.limaye@arm.com> wrote:
>This patch series performs some refactoring to AArch64 intrinsics code
>to use correct vector types and conversions for Neon vector operations,
>in order to enable building with -flax-vector-conversions=none.
>
>These patches are intended to be primarily refactoring only and are not
>intended to have any performance impact.
>
>The changes are based on the SAD series, as [PATCH 18/18] here makes
>changes to source/CMakeLists.txt which depends on CMake refactoring in:
>    https://mailman.videolan.org/pipermail/x265-devel/2024-July/013740.html
>
>Many thanks,
>Hari
>
>Hari Limaye (18):
>  AArch64: Use proper load/store intrinsics in pixel primitives
>  AArch64: Refactor output variables in Neon sa8d helper
>  AArch64: Use transpose helpers in pixel-prim.cpp
>  AArch64: Refactor types and conversions in pixel-prim.cpp
>  AArch64: Add missing include in arm64-utils.h
>  AArch64: Use proper load/store intrinsics in arm64-utils.cpp
>  AArch64: Refactor types and conversions in arm64-utils.cpp
>  AArch64: Optimise shifts in filter-prim.cpp
>  AArch64: Use proper load/store intrinsics in filter-prim.cpp
>  AArch64: Refactor types and conversions in filter-prim.cpp
>  AArch64: Use proper load/store intrinsics in intrapred-prim.cpp
>  AArch64: Refactor types and conversions in intrapred-prim.cpp
>  AArch64: Refactor narrowing in loopfilter-prim.cpp
>  AArch64: Use proper load/store intrinsics in loopfilter-prim.cpp
>  AArch64: Refactor types and conversions in loopfilter-prim.cpp
>  AArch64: Use proper load/store intrinsics in dct-prim.cpp
>  AArch64: Refactor types and conversions in dct-prim.cpp
>  AArch64: Build with -flax-vector-conversions=none
>
> source/CMakeLists.txt                     |   8 +-
> source/common/CMakeLists.txt              |   2 +-
> source/common/aarch64/arm64-utils.cpp     | 478 ++++++-----
> source/common/aarch64/arm64-utils.h       |   1 +
> source/common/aarch64/dct-prim.cpp        | 132 +--
> source/common/aarch64/filter-prim.cpp     | 168 ++--
> source/common/aarch64/intrapred-prim.cpp  |  44 +-
> source/common/aarch64/loopfilter-prim.cpp | 113 +--
> source/common/aarch64/mem-neon.h          |  59 ++
> source/common/aarch64/pixel-prim.cpp      | 992 +++++++++++-----------
> 10 files changed, 1098 insertions(+), 899 deletions(-)
> create mode 100644 source/common/aarch64/mem-neon.h
>
>-- 
>2.42.1
>
>_______________________________________________
>x265-devel mailing list
>x265-devel@videolan.org
>https://mailman.videolan.org/listinfo/x265-devel
</pre></div>