On Tue, Nov 1, 2016 at 11:16 PM, Luca Barbato <lu_zero at gentoo.org> wrote:
> Doubles the speedup from the function (from being slower to be over
> twice as fast than C).

Out of curiosity, did you try anything in between "not unrolled" and "fully unrolled", like unrolling by 4 per iteration, for example?
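To make the three options concrete, here is a minimal sketch in plain C. It uses a hypothetical element-wise add. `add_rolled`, `add_unroll4` and `add_full8` are illustrative names that do not come from the patch. The sketch is scalar rather than SIMD, so it only shows the loop structure and not the real kernel. "Fully unrolled" assumes a fixed block size, taken here to be 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, for illustrating loop shape only. */

/* Not unrolled: one element per iteration. */
static void add_rolled(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}

/* Partially unrolled by 4: four independent operations per iteration,
 * then a scalar tail for the remaining n % 4 elements. */
static void add_unroll4(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = (int16_t)(a[i + 0] + b[i + 0]);
        dst[i + 1] = (int16_t)(a[i + 1] + b[i + 1]);
        dst[i + 2] = (int16_t)(a[i + 2] + b[i + 2]);
        dst[i + 3] = (int16_t)(a[i + 3] + b[i + 3]);
    }
    for (; i < n; i++) /* tail */
        dst[i] = (int16_t)(a[i] + b[i]);
}

/* Fully unrolled: only possible when the size is fixed (assumed 8 here),
 * so there is no loop counter or branch at all. */
static void add_full8(int16_t *dst, const int16_t *a, const int16_t *b)
{
    dst[0] = (int16_t)(a[0] + b[0]);
    dst[1] = (int16_t)(a[1] + b[1]);
    dst[2] = (int16_t)(a[2] + b[2]);
    dst[3] = (int16_t)(a[3] + b[3]);
    dst[4] = (int16_t)(a[4] + b[4]);
    dst[5] = (int16_t)(a[5] + b[5]);
    dst[6] = (int16_t)(a[6] + b[6]);
    dst[7] = (int16_t)(a[7] + b[7]);
}
```

The by-4 variant keeps most of the latency-hiding benefit of independent operations per iteration. It also keeps the code size and register pressure closer to the rolled version, which is why it can sometimes match a full unroll.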