[x265] [PATCH] RISC-V: Add RVV optimized DCT32x32
chen
chenm003 at 163.com
Sun Apr 12 04:59:28 UTC 2026
Thanks for the contribution.
Most of it looks good to me; some comments:
+.macro lx rd, addr
+#if (__riscv_xlen == 32)
+ lw \rd, \addr
+#elif (__riscv_xlen == 64)
+ ld \rd, \addr
+#else
+ lq \rd, \addr
+#endif
RV128I is still a draft, so we could replace the lq branch with #error here.
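The .S file goes through the C preprocessor, so the suggested #error carries over unchanged. A minimal sketch of the same XLEN dispatch, written as plain preprocessor logic. LX_BYTES and lx_width_bytes are hypothetical names, and the off-target fallback exists only so the snippet compiles on a non-RISC-V host:

```c
#include <assert.h>

/* Mirror of the lx macro's XLEN dispatch: RV32 -> lw (4 bytes),
 * RV64 -> ld (8 bytes). RV128I (lq) is still a draft, so any other
 * XLEN is rejected at build time instead of emitting lq. */
#if defined(__riscv_xlen)
#  if __riscv_xlen == 32
#    define LX_BYTES 4
#  elif __riscv_xlen == 64
#    define LX_BYTES 8
#  else
#    error "lx: unsupported __riscv_xlen (RV128I is still a draft)"
#  endif
#else
/* Off-target build (not RISC-V): use the native pointer width so the
 * sketch still compiles; the real .S file would not need this branch. */
#  define LX_BYTES ((int)sizeof(void *))
#endif

/* Width in bytes of one lx load. */
static int lx_width_bytes(void)
{
    return LX_BYTES;
}
```

In the assembly macro itself, only the #else branch changes: the lq line is replaced by the #error directive.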
+ li t0, 4096
+ // temp stack address
+ sub t5, sp, t0
+ li t0, 2048
+ sub sp, t5, t0
I don't recommend allocating 6 KB of stack in this function without checking that the pages are available. The allocation is larger than the 4 KB page size, which is a potential memory risk.
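The prologue moves sp down by 4096 + 2048 = 6144 bytes in one step, so with 4 KB pages the new frame can land beyond a guard page without touching it. A common mitigation (the idea behind GCC's -fstack-clash-protection) is to touch the stack at least once per page while descending. A minimal sketch, assuming a fixed 4096-byte page rather than querying it, that only computes where such probes would land; probe_offsets is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Compute the offsets below the incoming sp that a prologue would touch
 * while growing the frame: one probe per page step, then a final probe
 * at the new sp (offset == frame). Consecutive probes are therefore at
 * most one page apart. Returns the number of offsets written (<= max). */
static size_t probe_offsets(size_t frame, size_t page,
                            size_t *offsets, size_t max)
{
    size_t n = 0;

    assert(page > 0);
    for (size_t off = page; off < frame && n < max; off += page)
        offsets[n++] = off;
    if (frame > 0 && n < max)
        offsets[n++] = frame;
    return n;
}
```

For the 6144-byte frame this yields two touches, at sp - 4096 and sp - 6144; a 2048-byte frame needs only the final one.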
Another risk: a large VLEN may overflow the temporary buffer. Please add a comment indicating the safe VLEN range (VLEN <= 1024?).
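Besides a comment, the range could also be enforced where the primitive is registered. A minimal sketch, assuming a supported window of 128 <= VLEN <= 1024 bits (the upper bound is only the tentative value above and would need to be confirmed against the temporary-buffer layout). It takes vlenb, the RVV CSR that holds VLEN in bytes, as input; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed safe VLEN window for the DCT32 temp buffer, in bits.
 * DCT32_RVV_MAX_VLEN is the tentative bound from review and must be
 * checked against the actual buffer size. */
#define DCT32_RVV_MIN_VLEN 128u
#define DCT32_RVV_MAX_VLEN 1024u

/* vlenb is VLEN in bytes, as read from the vlenb CSR on RISC-V. */
static bool dct32_rvv_vlen_ok(unsigned vlenb)
{
    unsigned vlen_bits = vlenb * 8u;

    return vlen_bits >= DCT32_RVV_MIN_VLEN &&
           vlen_bits <= DCT32_RVV_MAX_VLEN;
}
```

On target, vlenb can be read with csrr (for example csrr a0, vlenb) before deciding whether to install the RVV kernel.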
+ li t1, 32
+ vsetvli t4, t1, e16, m1, ta, ma <-- m1
...
+function func_tr_32xN_\name\()_rvv
+ .option arch, +zba
+ // E saved from tmp stack
+ mv a7, t5
+ // one vector bytes after widen
+ slli t2, t4, 2
This potentially depends on m1. I suggest adding a comment reminding that if the vsetvli changes, this line needs to be updated as well.
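The dependency can be seen by modelling vsetvli t4, t1, e16, m1 in C. VLMAX = LMUL * VLEN / SEW, so with AVL = 32 the returned vl (t4) depends on both VLEN and LMUL, and so does the byte count from slli t2, t4, 2 (vl elements of e16 data widened to 4 bytes each). This is a simplified model: for VLMAX < AVL < 2 * VLMAX the spec lets an implementation pick any vl in [ceil(AVL/2), VLMAX], and the sketch assumes VLMAX. The function names are illustrative:

```c
#include <assert.h>

/* Model of vsetvli for integer LMUL: VLMAX = LMUL * VLEN / SEW.
 * AVL <= VLMAX gives vl = AVL; AVL >= 2*VLMAX gives vl = VLMAX.
 * In between the spec allows any vl in [ceil(AVL/2), VLMAX]; this
 * model assumes the implementation picks VLMAX. */
static unsigned model_vsetvli(unsigned avl, unsigned vlen,
                              unsigned sew, unsigned lmul)
{
    assert(sew > 0 && lmul > 0);
    unsigned vlmax = lmul * vlen / sew;

    return avl <= vlmax ? avl : vlmax;
}

/* Bytes of one widened vector, as computed by `slli t2, t4, 2`:
 * vl elements of 16-bit data widened to 32 bits (4 bytes each). */
static unsigned widened_vector_bytes(unsigned vl)
{
    return vl << 2;
}
```

At VLEN = 128, switching from m1 to m2 raises vl from 8 to 16 and t2 from 32 to 64 bytes, so any offsets or buffer sizes derived from t2 move with it.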
Other comments:
Some indentation mismatches on the DCT32_4_DST_ADD_1_MEMBER lines.
At 2026-02-06 16:14:53, "daichengrong" <daichengrong at iscas.ac.cn> wrote:
>This patch adds an RVV-optimized implementation of DCT 32x32 for RISC-V.
>
>The current implementation in the repository is written with the assumption of a 128-bit VLEN and does not account for wider vector lengths. Therefore, initial testing was performed on a 128-bit platform, allowing the results to directly reflect the advantages of the optimized code over the existing implementation.
>
>**SG2044 (128-bit VLEN):**
>
>```
>dct32x32 | 5.14x | 1800.12 | 9247.73
>dct32x32 | 9.85x | 935.26 | 9214.26
>```
>
>Building on this, the new implementation adopts a Vector-Length Agnostic (VLA) design. Additional testing on a 256-bit platform demonstrates good scalability and further performance gains.
>
>**Banana Pi F3 (256-bit VLEN):**
>
>```
>dct32x32 | 5.59x | 2222.48 | 12420.64
>dct32x32 | 13.28x | 935.97 | 12431.17
>```
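Reading each row as primitive | speedup | optimized time | C reference time (an assumption; the columns are unlabeled here), the speedup column is the C time divided by the optimized time. Taking the first row as the existing kernel and the second as this patch (also an assumption), the gain of the new kernel follows from the two optimized times. A small sketch reproducing that arithmetic for the figures quoted above:

```c
#include <assert.h>

/* Speedup as reported by the benchmark: C time / optimized time. */
static double speedup(double opt_time, double c_time)
{
    assert(opt_time > 0.0);
    return c_time / opt_time;
}

/* Gain of the new optimized kernel over the old one on one platform. */
static double relative_gain(double old_opt_time, double new_opt_time)
{
    assert(new_opt_time > 0.0);
    return old_opt_time / new_opt_time;
}
```

Under that reading, the new kernel is about 1.92x faster than the existing one on the SG2044 (1800.12 / 935.26) and about 2.37x faster on the Banana Pi F3 (2222.48 / 935.97).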
>
>To simplify comparison with the existing implementation, this patch introduces an `RVV_DCT32_OPT` compile-time option. The optimization can be disabled using:
>
>```
>-DRVV_DCT32_OPT=0
>```
>
>allowing straightforward A/B performance testing.
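A minimal sketch of how such a switch is commonly wired, assuming the CMake option ends up as a preprocessor definition of the same name that defaults to 1. The kernel names and select_dct32 are placeholders, not the patch's actual symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default: the new VLA kernel is enabled unless the build
 * passes RVV_DCT32_OPT=0. */
#ifndef RVV_DCT32_OPT
#define RVV_DCT32_OPT 1
#endif

typedef void (*dct_fn)(const int16_t *src, int16_t *dst, intptr_t srcStride);

/* Placeholder bodies standing in for the two assembly kernels. */
static void dct32_rvv_existing(const int16_t *src, int16_t *dst, intptr_t srcStride)
{
    (void)src; (void)dst; (void)srcStride;
}

static void dct32_rvv_vla(const int16_t *src, int16_t *dst, intptr_t srcStride)
{
    (void)src; (void)dst; (void)srcStride;
}

/* Pick the kernel to register for the 32x32 forward DCT. */
static dct_fn select_dct32(void)
{
#if RVV_DCT32_OPT
    return dct32_rvv_vla;
#else
    return dct32_rvv_existing;
#endif
}
```

Keeping the choice in one place means an A/B run only needs a rebuild with the option flipped; both kernels remain linked either way.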
>
>Signed-off-by: daichengrong <daichengrong at iscas.ac.cn>
>---
> source/CMakeLists.txt | 6 +
> source/common/CMakeLists.txt | 2 +-
> source/common/riscv64/asm-primitives.cpp | 3 +
> source/common/riscv64/dct-32dct.S | 714 +++++++++++++++++++++++
> source/common/riscv64/fun-decls.h | 1 +
> 5 files changed, 725 insertions(+), 1 deletion(-)
> mode change 100755 => 100644 source/CMakeLists.txt
> create mode 100644 source/common/riscv64/dct-32dct.S
>
More information about the x265-devel mailing list