[x265] [PATCH 1/9] Move C DCT implementations into X265_NS
chen
chenm003 at 163.com
Fri Aug 23 04:41:47 UTC 2024
Hi Hari & Jonathan,
Thanks for the patches. I have some comments:
[PATCH 1/9] Move C DCT implementations into X265_NS
1. These functions are shared across the 8/10/12 bpp builds; if they move into X265_NS, each bit depth will get its own duplicated copy.
2. Adding a new "namespace X265_NS" section before these functions is better than moving them, because moving them affects the code history record.
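A minimal sketch of point 2, assuming a hypothetical SKETCH_NS macro standing in for X265_NS and made-up function names. C++ allows a namespace to be reopened any number of times in one file, so a block can be opened around definitions where they already sit, and the lines themselves do not move:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for X265_NS; not the real x265 macro.
#define SKETCH_NS sketch_codec

// File-local helper: stays where it is and keeps internal linkage.
static void helperDouble(const int16_t* src, int16_t* dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = static_cast<int16_t>(src[i] * 2);
}

namespace SKETCH_NS {
// New block opened in place around an existing definition.
// Dropping 'static' here gives the function external linkage.
void transform4(const int16_t* src, int16_t* dst)
{
    helperDouble(src, dst, 4);
}
}

// ... other file-local code can remain between the two blocks ...

namespace SKETCH_NS {
// The block that already existed further down the file.
// This reopens the same namespace; it does not redefine it.
int setupCount()
{
    return 1;
}
}
```

Both functions end up in the same namespace, and only the added "namespace" and brace lines show up in the history of the file.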
[PATCH 3/9] AArch64: Optimise partialButterfly8_neon
[PATCH 4/9] AArch64: Optimise partialButterfly16_neon
[PATCH 5/9] AArch64: Optimise partialButterfly32_neon
[PATCH 7/9] AArch64: Add SVE implementation of 8x8 DCT
[PATCH 8/9] AArch64: Add SVE implementation of 16x16 DCT
[PATCH 9/9] AArch64: Add SVE implementation of 32x32 DCT
partialButterfly8_neon
For size 8, the E & O butterfly is necessary, but EE/EO is not a good idea. The Odd part spends 8 operations per line; the Even part spends 4 operations plus 1 temporary store and 2 preparation operations, 7 operations in total with a dependency chain, so there looks to be no performance benefit. SVE SVDOT especially may get more performance with the Odd method.
Code style mismatch between different code sections; one line is better.
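To illustrate the even-row comparison above, here is a scalar sketch (not the Neon code from the patch) of one 8-point row computed both ways. The function names and kT8 are illustrative; the coefficients are the standard HEVC 8-point DCT values, and the rounding add and shift are omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>

// First four columns of each row of the HEVC 8-point DCT matrix.
// Even rows are symmetric and odd rows antisymmetric, so four taps suffice.
static const int16_t kT8[8][4] = {
    {64,  64,  64,  64},
    {89,  75,  50,  18},
    {83,  36, -36, -83},
    {75, -18, -89, -50},
    {64, -64, -64,  64},
    {50, -89,  18,  75},
    {36, -83,  83, -36},
    {18, -50,  75, -89},
};

// "Odd method" applied to every row: one E/O stage, then a 4-tap dot product.
void dct8RowDirect(const int16_t src[8], int32_t out[8])
{
    int32_t E[4], O[4];
    for (int k = 0; k < 4; k++)
    {
        E[k] = src[k] + src[7 - k];
        O[k] = src[k] - src[7 - k];
    }
    for (int r = 0; r < 8; r++)
    {
        const int32_t* v = (r & 1) ? O : E; // odd rows use O, even rows use E
        out[r] = kT8[r][0] * v[0] + kT8[r][1] * v[1] +
                 kT8[r][2] * v[2] + kT8[r][3] * v[3];
    }
}

// Two-stage variant: even rows go through an extra EE/EO butterfly,
// which adds a dependent step before the multiplies can start.
void dct8RowTwoStage(const int16_t src[8], int32_t out[8])
{
    int32_t E[4], O[4];
    for (int k = 0; k < 4; k++)
    {
        E[k] = src[k] + src[7 - k];
        O[k] = src[k] - src[7 - k];
    }
    const int32_t EE0 = E[0] + E[3], EE1 = E[1] + E[2];
    const int32_t EO0 = E[0] - E[3], EO1 = E[1] - E[2];

    out[0] = kT8[0][0] * EE0 + kT8[0][1] * EE1;
    out[4] = kT8[4][0] * EE0 + kT8[4][1] * EE1;
    out[2] = kT8[2][0] * EO0 + kT8[2][1] * EO1;
    out[6] = kT8[6][0] * EO0 + kT8[6][1] * EO1;
    for (int r = 1; r < 8; r += 2)
        out[r] = kT8[r][0] * O[0] + kT8[r][1] * O[1] +
                 kT8[r][2] * O[2] + kT8[r][3] * O[3];
}
```

Both functions produce identical outputs; they differ only in how the even rows are scheduled. Whether the direct form is actually faster in the Neon or SVE kernels would need to be measured on the target hardware.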
+ int32x4_t t01 = vpaddq_s32(vmull_s16(c1, O[j + 0]),
+ vmull_s16(c1, O[j + 1]));
*** vs
+ t01 = vpaddq_s32(vmull_s16(c3, O[j + 0]), vmull_s16(c3, O[j + 1]));
dct8_neon
Better to inline the two partialButterfly8_neon calls; it removes some operations, such as int32x4_t c0 = vld1q_s32(t8_even[0]);
16x16 and 32x32 are similar
The const tables may be shared between the Neon and SVE code
Regards,
Chen
At 2024-08-22 23:17:50, "Hari Limaye" <hari.limaye at arm.com> wrote:
>Move C implementations of DCT functions into the X265_NS namespace, and
>remove the static modifier from their declarations, so that they can be
>referenced from external code when linking to libx265.
>---
> source/common/dct.cpp | 340 +++++++++++++++++++++---------------------
> 1 file changed, 170 insertions(+), 170 deletions(-)
>
>diff --git a/source/common/dct.cpp b/source/common/dct.cpp
>index b102b6e31..d318b2c64 100644
>--- a/source/common/dct.cpp
>+++ b/source/common/dct.cpp
>@@ -439,176 +439,6 @@ static void partialButterfly4(const int16_t* src, int16_t* dst, int shift, int l
> }
> }
>
>-static void dst4_c(const int16_t* src, int16_t* dst, intptr_t srcStride)
>-{
>- const int shift_1st = 1 + X265_DEPTH - 8;
>- const int shift_2nd = 8;
>-
>- ALIGN_VAR_32(int16_t, coef[4 * 4]);
>- ALIGN_VAR_32(int16_t, block[4 * 4]);
>-
>- for (int i = 0; i < 4; i++)
>- {
>- memcpy(&block[i * 4], &src[i * srcStride], 4 * sizeof(int16_t));
>- }
>-
>- fastForwardDst(block, coef, shift_1st);
>- fastForwardDst(coef, dst, shift_2nd);
>-}