[x265] [PATCH 1/9] Move C DCT implementations into X265_NS

chen chenm003 at 163.com
Fri Aug 23 04:41:47 UTC 2024


Hi Hari & Jonathan,




Thanks for the patches. I have some comments:




[PATCH 1/9] Move C DCT implementations into X265_NS

1. These functions are shared across the 8/10/12 bpp builds. If they are moved into X265_NS, each build will carry its own duplicated copy.

2. Adding a new "namespace X265_NS" section before these functions is better than moving them, because moving the code affects the code history record.
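
A rough sketch of what point 2 could look like, not taken from the patch: the namespace is opened directly above the existing function, so the body lines stay where they are and only the `static` keyword changes. The function `halveBlock_c`, the helper `halve`, and the fallback definition of X265_NS are hypothetical and exist only so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a real build defines X265_NS itself; this fallback is only
// here so the sketch is self-contained.
#ifndef X265_NS
#define X265_NS x265_sketch
#endif

// File-local helper (hypothetical): left untouched, internal linkage.
static int16_t halve(int16_t v)
{
    return static_cast<int16_t>(v >> 1);
}

// New namespace section opened in place, above the existing function.
// The body lines are not moved; only `static` is dropped so the symbol
// can be referenced from other translation units.
namespace X265_NS {

void halveBlock_c(const int16_t* src, int16_t* dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = halve(src[i]);
}

} // namespace X265_NS
```

External code would then refer to the function as `X265_NS::halveBlock_c`.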




[PATCH 3/9] AArch64: Optimise partialButterfly8_neon

[PATCH 4/9] AArch64: Optimise partialButterfly16_neon

[PATCH 5/9] AArch64: Optimise partialButterfly32_neon

[PATCH 7/9] AArch64: Add SVE implementation of 8x8 DCT

[PATCH 8/9] AArch64: Add SVE implementation of 16x16 DCT

[PATCH 9/9] AArch64: Add SVE implementation of 32x32 DCT

partialButterfly8_neon
For size 8, the E and O butterfly is necessary, but the further EE/EO split is not a good idea. The Odd part spends 8 operations per line. With EE/EO, the Even part spends 4 operations plus 1 temporary store and 2 preparation operations, 7 operations in total with a dependency chain between them, so there looks to be no performance benefit. With SVE especially, SVDOT may get more performance by handling the Even part the same way as the Odd part.
The code style is inconsistent between code sections; keeping the call on one line is better.
+        int32x4_t t01 = vpaddq_s32(vmull_s16(c1, O[j + 0]),
+                                   vmull_s16(c1, O[j + 1]));
*** vs
+        t01 = vpaddq_s32(vmull_s16(c3, O[j + 0]), vmull_s16(c3, O[j + 1]));
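
To make the E/O versus EE/EO comment above concrete, here is a scalar sketch of one 8-point line computed both ways. It is an illustration only: it uses the standard HEVC 8-point coefficients, omits the rounding and shift, and says nothing about Neon or SVE instruction counts. The function names are hypothetical and are not the ones in the patch.

```cpp
#include <cassert>
#include <cstdint>

// HEVC 8-point forward DCT coefficients (row = output index).
static const int16_t T8[8][8] = {
    { 64,  64,  64,  64,  64,  64,  64,  64 },
    { 89,  75,  50,  18, -18, -50, -75, -89 },
    { 83,  36, -36, -83, -83, -36,  36,  83 },
    { 75, -18, -89, -50,  50,  89,  18, -75 },
    { 64, -64, -64,  64,  64, -64, -64,  64 },
    { 50, -89,  18,  75, -75, -18,  89, -50 },
    { 36, -83,  83, -36, -36,  83, -83,  36 },
    { 18, -50,  75, -89,  89, -75,  50, -18 },
};

// First stage shared by both variants:
// E[k] = src[k] + src[7-k], O[k] = src[k] - src[7-k].
static void splitEO(const int16_t* src, int32_t E[4], int32_t O[4])
{
    for (int k = 0; k < 4; k++)
    {
        E[k] = src[k] + src[7 - k];
        O[k] = src[k] - src[7 - k];
    }
}

// 4-term dot product of the first half of coefficient row r with v.
static int32_t dot4(int r, const int32_t v[4])
{
    return T8[r][0] * v[0] + T8[r][1] * v[1] + T8[r][2] * v[2] + T8[r][3] * v[3];
}

// Variant A: E/O only. Even rows use E, odd rows use O, and every
// output is the same kind of 4-term dot product.
void butterfly8_EO(const int16_t* src, int32_t out[8])
{
    int32_t E[4], O[4];
    splitEO(src, E, O);
    for (int r = 0; r < 8; r++)
        out[r] = dot4(r, (r & 1) ? O : E);
}

// Variant B: the Even part is split again into EE/EO, which adds
// temporaries that the even outputs depend on.
void butterfly8_EEEO(const int16_t* src, int32_t out[8])
{
    int32_t E[4], O[4];
    splitEO(src, E, O);
    const int32_t EE0 = E[0] + E[3], EE1 = E[1] + E[2];
    const int32_t EO0 = E[0] - E[3], EO1 = E[1] - E[2];
    out[0] = 64 * EE0 + 64 * EE1;
    out[4] = 64 * EE0 - 64 * EE1;
    out[2] = 83 * EO0 + 36 * EO1;
    out[6] = 36 * EO0 - 83 * EO1;
    for (int r = 1; r < 8; r += 2)
        out[r] = dot4(r, O);
}

// Returns true when both variants produce identical outputs for src.
bool variantsAgree(const int16_t* src)
{
    int32_t a[8], b[8];
    butterfly8_EO(src, a);
    butterfly8_EEEO(src, b);
    for (int r = 0; r < 8; r++)
        if (a[r] != b[r])
            return false;
    return true;
}
```

Both variants give the same result; the difference is only in how the even outputs are reached, which is where the operation count and dependency argument applies.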


dct8_neon
It would be better to inline the two partialButterfly8_neon calls; this saves some operations, such as int32x4_t c0 = vld1q_s32(t8_even[0]);
The 16x16 and 32x32 cases are similar.
The const tables could be shared between the Neon and SVE code.
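
A scalar stand-in for the structure suggested above, under stated assumptions: both passes are inlined into one dct8 function and read the same constant table, which a Neon and an SVE file could both include. In a vector version the point would be that coefficient vectors are loaded once for both passes. The names `kDct8Table`, `pass8`, and `dct8_fused` are hypothetical, and the shifts (2 and 9) assume 8-bit depth.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shared table: one definition for every implementation.
static const int16_t kDct8Table[8][8] = {
    { 64,  64,  64,  64,  64,  64,  64,  64 },
    { 89,  75,  50,  18, -18, -50, -75, -89 },
    { 83,  36, -36, -83, -83, -36,  36,  83 },
    { 75, -18, -89, -50,  50,  89,  18, -75 },
    { 64, -64, -64,  64,  64, -64, -64,  64 },
    { 50, -89,  18,  75, -75, -18,  89, -50 },
    { 36, -83,  83, -36, -36,  83, -83,  36 },
    { 18, -50,  75, -89,  89, -75,  50, -18 },
};

// One pass over 8 lines of 8 samples. The output is written transposed,
// so running the pass twice transforms rows and then columns.
template<int shift>
static inline void pass8(const int16_t* src, int16_t* dst,
                         const int16_t (&c)[8][8])
{
    const int32_t add = 1 << (shift - 1); // rounding offset
    for (int j = 0; j < 8; j++)
    {
        for (int r = 0; r < 8; r++)
        {
            int32_t sum = 0;
            for (int k = 0; k < 8; k++)
                sum += c[r][k] * src[j * 8 + k];
            dst[r * 8 + j] = static_cast<int16_t>((sum + add) >> shift);
        }
    }
}

// Both passes inlined into one function and fed from the same table.
// Assumption: contiguous 8x8 input, 8-bit depth (shifts 2 and 9).
void dct8_fused(const int16_t* src, int16_t* dst)
{
    int16_t tmp[64];
    pass8<2>(src, tmp, kDct8Table);
    pass8<9>(tmp, dst, kDct8Table);
}
```

The same layout would carry over to the 16x16 and 32x32 cases, each with its own table and shifts.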




Regards,
Chen

At 2024-08-22 23:17:50, "Hari Limaye" <hari.limaye at arm.com> wrote:
>Move C implementations of DCT functions into the X265_NS namespace, and
>remove the static modifier from their declarations, so that they can be
>referenced from external code when linking to libx265.
>---
> source/common/dct.cpp | 340 +++++++++++++++++++++---------------------
> 1 file changed, 170 insertions(+), 170 deletions(-)
>
>diff --git a/source/common/dct.cpp b/source/common/dct.cpp
>index b102b6e31..d318b2c64 100644
>--- a/source/common/dct.cpp
>+++ b/source/common/dct.cpp
>@@ -439,176 +439,6 @@ static void partialButterfly4(const int16_t* src, int16_t* dst, int shift, int l
>     }
> }
> 
>-static void dst4_c(const int16_t* src, int16_t* dst, intptr_t srcStride)
>-{
>-    const int shift_1st = 1 + X265_DEPTH - 8;
>-    const int shift_2nd = 8;
>-
>-    ALIGN_VAR_32(int16_t, coef[4 * 4]);
>-    ALIGN_VAR_32(int16_t, block[4 * 4]);
>-
>-    for (int i = 0; i < 4; i++)
>-    {
>-        memcpy(&block[i * 4], &src[i * srcStride], 4 * sizeof(int16_t));
>-    }
>-
>-    fastForwardDst(block, coef, shift_1st);
>-    fastForwardDst(coef, dst, shift_2nd);
>-}
