<div data-ntes="ntes_mail_body_root" style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div id="spnEditorContent"><div style="margin: 0;">Thanks for the contribution.</div><div style="margin: 0;">Most of it looks good to me; some comments:</div><div style="margin: 0;"><br></div><div style="margin: 0;">+.macro lx rd, addr
</div><div style="margin: 0;">+#if (__riscv_xlen == 32)
</div><div style="margin: 0;">+ lw \rd, \addr
</div><div style="margin: 0;">+#elif (__riscv_xlen == 64)
</div><div style="margin: 0;">+ ld \rd, \addr
</div><div style="margin: 0;">+#else
</div><div style="margin: 0;">+ lq \rd, \addr
</div><div style="margin: 0;">+#endif</div><div style="margin: 0;">RV128I is still a draft, so we could replace the lq branch with #error here.</div><div style="margin: 0;"><br></div><div style="margin: 0;">+ li t0, 4096
</div><div style="margin: 0;">+ // temp stack address
</div><div style="margin: 0;">+ sub t5, sp, t0
</div><div style="margin: 0;">+ li t0, 2048
</div><div style="margin: 0;">+ sub sp, t5, t0
</div><div style="margin: 0;">I don't recommend allocating 6KB of stack in this function without checking that the pages are available: it is larger than a 4KB page, which is a potential memory risk.<br>Another risk is that a large VLEN may overflow the temporary buffer; please add a comment stating the safe VLEN range (VLEN&lt;=1024?).</div><div style="margin: 0;"><pre style="text-wrap-mode: wrap;">+ li t1, 32
<br><span style="font-family: Arial;">+ vsetvli t4, t1, e16, m1, ta, ma <-- m1
<br>...<br>+function func_tr_32xN_\name\()_rvv<br></span>+ .option arch, +zba<font face="Arial"><br></font>+ // E saved from tmp stack<br>+ mv a7, t5<br>+ // one vector bytes after widen<br>+ slli t2, t4, 2</pre></div><div style="margin: 0;">This shift potentially depends on m1. I suggest adding a comment as a reminder that if the vsetvli setting changes, this code needs to be updated as well.</div><div style="margin: 0;"><br></div><div style="margin: 0;"><br></div><div style="margin: 0;">Other:</div><div style="margin: 0;">There is an indentation mismatch on the DCT32_4_DST_ADD_1_MEMBER line.</div><div style="margin: 0;"><br></div></div><pre>At 2026-02-06 16:14:53, "daichengrong" &lt;daichengrong@iscas.ac.cn&gt; wrote:
>This patch adds an RVV-optimized implementation of DCT 32x32 for RISC-V.
>
>The current implementation in the repository is written with the assumption of a 128-bit VLEN and does not account for wider vector lengths. Therefore, initial testing was performed on a 128-bit platform, allowing the results to directly reflect the advantages of the optimized code over the existing implementation.
>
>**SG2044 (128-bit VLEN):**
>
>```
>dct32x32 | 5.14x | 1800.12 | 9247.73
>dct32x32 | 9.85x | 935.26 | 9214.26
>```
>
>Building on this, the new implementation adopts a Vector-Length Agnostic (VLA) design. Additional testing on a 256-bit platform demonstrates good scalability and further performance gains.
>
>**Banana Pi F3 (256-bit VLEN):**
>
>```
>dct32x32 | 5.59x | 2222.48 | 12420.64
>dct32x32 | 13.28x | 935.97 | 12431.17
>```
>
>To simplify comparison with the existing implementation, this patch introduces an `RVV_DCT32_OPT` compile-time option. The optimization can be disabled using:
>
>```
>-DRVV_DCT32_OPT=0
>```
>
>allowing straightforward A/B performance testing.
>
>Signed-off-by: daichengrong &lt;daichengrong@iscas.ac.cn&gt;
>---
> source/CMakeLists.txt | 6 +
> source/common/CMakeLists.txt | 2 +-
> source/common/riscv64/asm-primitives.cpp | 3 +
> source/common/riscv64/dct-32dct.S | 714 +++++++++++++++++++++++
> source/common/riscv64/fun-decls.h | 1 +
> 5 files changed, 725 insertions(+), 1 deletion(-)
> mode change 100755 => 100644 source/CMakeLists.txt
> create mode 100644 source/common/riscv64/dct-32dct.S
>
</pre></div>
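<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin: 0;">For the lx macro comment, here is a minimal sketch of what I have in mind. It assumes the file keeps going through the C preprocessor, as the existing #if already implies. The error message text is only a placeholder of mine, and .endm is added just to close the macro, since the quoted hunk stops at #endif.</div><pre style="text-wrap-mode: wrap;">.macro lx rd, addr
#if (__riscv_xlen == 32)
    lw \rd, \addr
#elif (__riscv_xlen == 64)
    ld \rd, \addr
#else
    // RV128I is still a draft: fail the build instead of emitting lq
#error "unsupported __riscv_xlen"   // placeholder message
#endif
.endm</pre><div style="margin: 0;">This way any XLEN other than 32 or 64 fails at build time rather than silently assembling an lq.</div></div>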