<html><head></head><body>I considered not operating in place, but that would mean one extra allocation per block, which may be outright slower.<br>

<br>

The implementation of swab is similar to existing ones: glibc's (<a href="https://github.com/lattera/glibc/blob/master/string/swab.c">https://github.com/lattera/glibc/blob/master/string/swab.c</a>) does it in reverse with two temporary variables, and FreeBSD's (<a href="https://github.com/ddeville/libc/blob/master/src/string/FreeBSD/swab.c">https://github.com/ddeville/libc/blob/master/src/string/FreeBSD/swab.c</a>) does it 8 steps at a time, but they all follow the same algorithm to a certain extent.<br>
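
<br>
For reference, the pairwise swap that all of these share can be sketched roughly as below. This is only a minimal in-place illustration, not the code from the patch: the name swab_inplace and its signature are made up for this example, and I am assuming a trailing odd byte should be left untouched.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the code from the patch): swaps each pair of
 * adjacent bytes of buf in place. With an odd len, the last byte is
 * left as it is. */
static void swab_inplace(void *buf, size_t len)
{
    uint8_t *p = buf;

    assert(p != NULL || len == 0);

    for (size_t i = 0; i + 1 < len; i += 2) {
        uint8_t tmp = p[i];   /* a single temporary is enough */

        p[i] = p[i + 1];
        p[i + 1] = tmp;
    }
}
```

Calling swab_inplace() on the bytes 01 02 03 04 05 leaves 02 01 04 03 05 in the same buffer, with no second allocation.<br>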

<br>

If anything, the compiler may replace swab with a vectorized version (it would be a good use case for the byte shuffle instruction from SSSE3, pshufb), but I couldn't find any evidence of that from a quick Google search.<br><br><div class="gmail_quote">On February 19, 2018 9:43:48 AM GMT+01:00, "Rémi Denis-Courmont" &lt;remi@remlab.net&gt; wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<pre class="k9mail">Hello,<br><br>I don't see the point in operating in place in this case? Meanwhile reimplemented swab() looks much slower than swab() itself can be.</pre></blockquote></div></body></html>