<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"\@SimSun";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:10.0pt;
font-family:"Aptos",sans-serif;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Consolas",serif;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi Chen,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks for the comment.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">LDP+STP is recommended in the optimization guide for memory copy loops.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Older compilers sometimes struggle to generate optimal code from the vld1q_&lt;x&gt;_x2 intrinsics.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Using two separate </span><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">vld1q_&lt;x&gt;
</span><span style="font-size:11.0pt">loads is the approach most likely to get most compilers to generate optimal code (LDP + STP).<o:p></o:p></span></p>
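<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">As a rough illustration, the two forms being compared look something like the sketch below. This is not code from the patch: the function names copy_row_two_loads and copy_row_x2 are hypothetical, the width is assumed to be a multiple of 16, and the scalar fallback exists only so the sketch also builds on non-AArch64 targets. Which instructions a given compiler emits for each form still has to be checked in the generated assembly.<o:p></o:p></span></p>
<pre>
```c
#include "stdint.h"
#include "string.h"

#if defined(__aarch64__)
#include "arm_neon.h"
#endif

/* Hypothetical helper (not from the patch): copy `width` int16_t values,
 * 16 per iteration, using two separate 128-bit loads and two stores.
 * Assumes width is a multiple of 16. */
static void copy_row_two_loads(int16_t *dst, const int16_t *src, int width)
{
    for (int w = 0; w + 16 <= width; w += 16)
    {
#if defined(__aarch64__)
        int16x8_t a0 = vld1q_s16(src + w + 0);
        int16x8_t a1 = vld1q_s16(src + w + 8);
        vst1q_s16(dst + w + 0, a0);
        vst1q_s16(dst + w + 8, a1);
#else
        /* Scalar fallback so the sketch still builds off AArch64. */
        memcpy(dst + w, src + w, 16 * sizeof(int16_t));
#endif
    }
}

/* Hypothetical helper (not from the patch): the same copy written with
 * the _x2 intrinsics, which load and store a pair of vectors at once.
 * Assumes width is a multiple of 16. */
static void copy_row_x2(int16_t *dst, const int16_t *src, int width)
{
    for (int w = 0; w + 16 <= width; w += 16)
    {
#if defined(__aarch64__)
        int16x8x2_t a = vld1q_s16_x2(src + w);
        vst1q_s16_x2(dst + w, a);
#else
        /* Scalar fallback so the sketch still builds off AArch64. */
        memcpy(dst + w, src + w, 16 * sizeof(int16_t));
#endif
    }
}
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">Both functions copy the same data, so the only difference is in the load and store instructions the compiler selects. Comparing the disassembly of the two (for example with objdump -d) on the compilers of interest shows whether LDP + STP is generated.<o:p></o:p></span></p>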
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Li<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">chen &lt;chenm003@163.com&gt;<br>
<b>Date: </b>Tuesday, May 20, 2025 at 5:18<br>
<b>To: </b>Development for x265 &lt;x265-devel@videolan.org&gt;<br>
<b>Cc: </b>nd &lt;nd@arm.com&gt;, Li Zhang &lt;Li.Zhang2@arm.com&gt;<br>
<b>Subject: </b>Re:[x265] [PATCH 0/8] AArch64: Clean up and optimize block copy primitives<o:p></o:p></span></p>
</div>
<div>
<div id="spnEditorContent">
<p style="margin:0in"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">Hi Li,<o:p></o:p></span></p>
<p style="margin:0in"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">Thanks for the improvement patches.<o:p></o:p></span></p>
<p style="margin:0in"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">They look good to me; just one small comment below.<o:p></o:p></span></p>
<p style="margin:0in"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">In most of the functions:<br>
+ int16x8_t a0 = vld1q_s16(src + w + 0);<br>
+ int16x8_t a1 = vld1q_s16(src + w + 8);<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">How does the performance compare to vld1q_s16_x2?<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<pre><span style="color:black">Regards,<o:p></o:p></span></pre>
<pre><span style="color:black">Chen<o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
</div>
<pre><span style="color:black">At 2025-05-20 00:41:39, "Li Zhang" &lt;li.zhang2@arm.com&gt; wrote:<o:p></o:p></span></pre>
<pre><span style="color:black">>Hello,<o:p></o:p></span></pre>
<pre><span style="color:black">><o:p> </o:p></span></pre>
<pre><span style="color:black">>This patch series optimizes and implements several AArch64 block copy<o:p></o:p></span></pre>
<pre><span style="color:black">>primitives using Neon intrinsics. It also cleans up and removes the Neon<o:p></o:p></span></pre>
<pre><span style="color:black">>and SVE assembly implementations that are either slower or offer no<o:p></o:p></span></pre>
<pre><span style="color:black">>performance benefit.<o:p></o:p></span></pre>
<pre><span style="color:black">><o:p> </o:p></span></pre>
<pre><span style="color:black">>Many thanks,<o:p></o:p></span></pre>
<pre><span style="color:black">>Li<o:p></o:p></span></pre>
<pre><span style="color:black">><o:p> </o:p></span></pre>
<pre><span style="color:black">>Li Zhang (8):<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Optimize blockcopy_pp_neon intrinsics implementation<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Optimize blockcopy_ps Neon intrinsics implementation<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Implement blockcopy_ss primitives using Neon intrinsics<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Implement blockcopy_sp primitives using Neon intrinsics<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Optimize cpy1Dto2D_shl Neon intrinsics implementation<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Optimize cpy2Dto1D_shl Neon intrinsics implementation<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Implement cpy2Dto1D_shr using Neon intrinsics<o:p></o:p></span></pre>
<pre><span style="color:black">> AArch64: Implement cpy1Dto2D_shr using Neon intrinsics<o:p></o:p></span></pre>
<pre><span style="color:black">><o:p> </o:p></span></pre>
<pre><span style="color:black">> source/common/CMakeLists.txt | 2 +-<o:p></o:p></span></pre>
<pre><span style="color:black">> source/common/aarch64/asm-primitives.cpp | 180 ---<o:p></o:p></span></pre>
<pre><span style="color:black">> source/common/aarch64/blockcopy8-common.S | 54 -<o:p></o:p></span></pre>
<pre><span style="color:black">> source/common/aarch64/blockcopy8-sve.S | 1346 ---------------------<o:p></o:p></span></pre>
<pre><span style="color:black">> source/common/aarch64/blockcopy8.S | 1049 ----------------<o:p></o:p></span></pre>
<pre><span style="color:black">> source/common/aarch64/pixel-prim.cpp | 358 +++++-<o:p></o:p></span></pre>
<pre><span style="color:black">> 6 files changed, 305 insertions(+), 2684 deletions(-)<o:p></o:p></span></pre>
<pre><span style="color:black">> delete mode 100644 source/common/aarch64/blockcopy8-common.S<o:p></o:p></span></pre>
<pre><span style="color:black">><o:p> </o:p></span></pre>
<pre><span style="color:black">>--<o:p></o:p></span></pre>
<pre><span style="color:black">>2.39.5 (Apple Git-154)<o:p></o:p></span></pre>
<pre><span style="color:black">><o:p> </o:p></span></pre>
<pre><span style="color:black">>_______________________________________________<o:p></o:p></span></pre>
<pre><span style="color:black">>x265-devel mailing list<o:p></o:p></span></pre>
<pre><span style="color:black">>x265-devel@videolan.org<o:p></o:p></span></pre>
<pre><span style="color:black">>https://mailman.videolan.org/listinfo/x265-devel<o:p></o:p></span></pre>
</div>
</div>
</div>
</div>
</div>
</body>
</html>