<div dir="ltr">All the patches of this series have been pushed to the master branch.<div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><b>__________________________</b></div><div><b>Karam Singh</b></div><div><b>Ph.D. IIT Guwahati</b></div><div><font size="1">Senior Software (Video Coding) Engineer </font></div><div><font size="1">Mobile: +91 8011279030</font></div><div><font size="1">Block 9A, 6th floor, DLF Cyber City</font></div><div><font size="1">Manapakkam, Chennai 600 089</font></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 28, 2024 at 10:48 AM chen <<a href="mailto:chenm003@163.com">chenm003@163.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="line-height:1.7;color:rgb(0,0,0);font-size:14px;font-family:Arial"><div id="m_-2030105860082732471spnEditorContent"><p style="margin:0px">Hi Hari,</p><p style="margin:0px"><br></p><p style="margin:0px">Thank for the update, it looks good to me now.</p><p style="margin:0px"><br></p><p style="margin:0px">For the DCT optimize, we may keep your version, if we write hand-write version future, we may try again.</p><p style="margin:0px"><br></p></div><div style="zoom:1"></div><div id="m_-2030105860082732471divNeteaseMailCard"></div><div style="margin:0px">Regards,</div><div style="margin:0px">Chen</div><pre><br>At 2024-08-27 23:10:18, "Hari Limaye" <<a href="mailto:hari.limaye@arm.com" target="_blank">hari.limaye@arm.com</a>> wrote:
>Hi Chen,
>
>Thank you for reviewing the patches. Please see replies below - we have pushed this updated series to address your comments.
>
>>1. These function will share for 8/10/12 bpp, if move into X265_NS, it will make duplicated copy
>
>If I understand correctly, you are referring to the duplication that will occur when building the "Multi-library interface"? I think this duplication exists already in the current multi bitdepth build, its just that the symbols are defined with internal linkage. By moving things into the X265_NS we don't add any duplicated copies, although we do at least make the functions reference-able from other code. Also if I understand correctly, this should not be an issue for the C implementations - as when building the Multi-library interface you should set EXPORT_C_API=OFF for all but one of the builds.
>
>>2. add new section "namespace X265_NS" before these functions are better than move, it affects code history record.
>
>Updated patch to do this.
>
>>partialButterfly8_neon
>For size 8, butterfly E & O is necessary, but EE/EO is not a good idea, Odd spends 8 operators per line, Even spends 4 operators plus 1 temporary store and 2 prepare operators, totally 7 operators with dependency link, looks no performance benefits, especally SVE SVDOT may get more performance with Odd method.
>
>Regarding EE/EO not being a good idea in 8x8: register pressure is also a consideration here, as well as how the end of the first pass interacts with the start of the second (in terms of data positioning once they have been inlined - since everything stays in registers rather than being stored to/loaded from memory.) The current implementation performs better than the suggested implementations as computing EE and EO reduces register pressure (an issue 16-bit dot product cannot mitigate) and also allows the compiler to optimize more effectively across both passes after they have been inlined.
>
>>Code style mismatch in different code section, one line is better.
>>+ int32x4_t t01 = vpaddq_s32(vmull_s16(c1, O[j + 0]),
>>+ vmull_s16(c1, O[j + 1]));
>>*** vs
>>+ t01 = vpaddq_s32(vmull_s16(c3, O[j + 0]), vmull_s16(c3, O[j + 1]));
>
>The reason for the mismatch here is because the second line goes over the character limit of 80. This formatting is produced automatic formatting software (uncrustify).
>
>>dct8_neon
>>Better inline two of partialButterfly8_neon, it reduce some operators, such as int32x4_t c0 = vld1q_s32(t8_even[0]);
>>16x16 and 32x2 are similar
>
>The compiler currently inlines these - we have updated patches to be more explicit by using the `inline` keyword for the butterfly helper functions.
>
>>const table may share in between Neon and Sve code
>
>Updated with this done.
>
>Many thanks,
>Hari
>
>Hari Limaye (5):
> Move C DCT implementations into X265_NS
> AArch64: Move Neon DCT implementations into X265_NS
> AArch64: Optimise partialButterfly8_neon
> AArch64: Optimise partialButterfly16_neon
> AArch64: Optimise partialButterfly32_neon
>
>Jonathan Wright (4):
> AArch64: Move Neon-SVE bridge helpers into dedicated header
> AArch64: Add SVE implementation of 8x8 DCT
> AArch64: Add SVE implementation of 16x16 DCT
> AArch64: Add SVE implementation of 32x32 DCT
>
> source/common/CMakeLists.txt | 2 +-
> source/common/aarch64/asm-primitives.cpp | 1 +
> source/common/aarch64/dct-prim-sve.cpp | 491 ++++++++++++++++++++
> source/common/aarch64/dct-prim.cpp | 557 +++++++++++++++--------
> source/common/aarch64/dct-prim.h | 40 ++
> source/common/aarch64/neon-sve-bridge.h | 67 +++
> source/common/aarch64/sao-prim.h | 32 +-
> source/common/dct.cpp | 22 +-
> 8 files changed, 980 insertions(+), 232 deletions(-)
> create mode 100644 source/common/aarch64/dct-prim-sve.cpp
> create mode 100644 source/common/aarch64/neon-sve-bridge.h
>
>--
>2.42.1
</pre></div>_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</blockquote></div>