[x265] [PATCH 00/18] AArch64: Enable building with -flax-vector-conversions=none
chen
chenm003 at 163.com
Thu Aug 15 04:05:36 UTC 2024
Hi Hari,
Thank for the new patches, I have some comments,
What's reason rename these parameters? in the AArch64 ABI, only first 8 parameters passthrough SIMD registers, compiler will take care these inout parameters.
source/common/aarch64/pixel-prim.cpp
-static inline void _sa8d_8x8_neon_end(int16x8_t &v0, int16x8_t &v1, int16x8_t v2, int16x8_t v3,
- int16x8_t v20, int16x8_t v21, int16x8_t v22, int16x8_t v23)
+static inline void _sa8d_8x8_neon_end(int16x8_t v0, int16x8_t v1, int16x8_t v2,
+ int16x8_t v3, int16x8_t v20,
+ int16x8_t v21, int16x8_t v22,
+ int16x8_t v23, sa8d_out_type &out0,
+ sa8d_out_type &out1)
source/common/aarch64/filter-prim.cpp
*) if you want to improve these interpolate functions, how about also improve algorithm?
For example, in the interp_vert_pp_neon, if we swap order of loop_row and loop_col, we can reuse most of input[*] at next row
*) Some of line need not change, for example,
- vsum = vmlal_lane_s16(vsum, vget_low_u16(input[0]), vget_low_s16(vc3), 0);
+ vsum = vmlal_lane_s16(vsum, vget_low_s16(input[0]),
+ vget_low_s16(vc3), 0);
Regards,
Chen
At 2024-08-13 23:18:41, "Hari Limaye" <hari.limaye at arm.com> wrote:
>This patch series performs some refactoring to AArch64 intrinsics code
>to use correct vector types and conversions for Neon vector operations,
>in order to enable building with -flax-vector-conversions=none.
>
>These patches are intended to be primarily refactoring only and are not
>intended to have any performance impact.
>
>The changes are based on the SAD series, as [PATCH 18/18] here makes
>changes to source/CMakeLists.txt which depends on CMake refactoring in:
> https://mailman.videolan.org/pipermail/x265-devel/2024-July/013740.html
>
>Many thanks,
>Hari
>
>Hari Limaye (18):
> AArch64: Use proper load/store intrinsics in pixel primitives
> AArch64: Refactor output variables in Neon sa8d helper
> AArch64: Use transpose helpers in pixel-prim.cpp
> AArch64: Refactor types and conversions in pixel-prim.cpp
> AArch64: Add missing include in arm64-utils.h
> AArch64: Use proper load/store intrinsics in arm64-utils.cpp
> AArch64: Refactor types and conversions in arm64-utils.cpp
> AArch64: Optimise shifts in filter-prim.cpp
> AArch64: Use proper load/store intrinsics in filter-prim.cpp
> AArch64: Refactor types and conversions in filter-prim.cpp
> AArch64: Use proper load/store intrinsics in intrapred-prim.cpp
> AArch64: Refactor types and conversions in intrapred-prim.cpp
> AArch64: Refactor narrowing in loopfilter-prim.cpp
> AArch64: Use proper load/store intrinsics in loopfilter-prim.cpp
> AArch64: Refactor types and conversions in loopfilter-prim.cpp
> AArch64: Use proper load/store intrinsics in dct-prim.cpp
> AArch64: Refactor types and conversions in dct-prim.cpp
> AArch64: Build with -flax-vector-conversions=none
>
> source/CMakeLists.txt | 8 +-
> source/common/CMakeLists.txt | 2 +-
> source/common/aarch64/arm64-utils.cpp | 478 ++++++-----
> source/common/aarch64/arm64-utils.h | 1 +
> source/common/aarch64/dct-prim.cpp | 132 +--
> source/common/aarch64/filter-prim.cpp | 168 ++--
> source/common/aarch64/intrapred-prim.cpp | 44 +-
> source/common/aarch64/loopfilter-prim.cpp | 113 +--
> source/common/aarch64/mem-neon.h | 59 ++
> source/common/aarch64/pixel-prim.cpp | 992 +++++++++++-----------
> 10 files changed, 1098 insertions(+), 899 deletions(-)
> create mode 100644 source/common/aarch64/mem-neon.h
>
>--
>2.42.1
>
>_______________________________________________
>x265-devel mailing list
>x265-devel at videolan.org
>https://mailman.videolan.org/listinfo/x265-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20240815/c48c905c/attachment.htm>
More information about the x265-devel
mailing list