<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div>My goal is to cover most of the functions first; after that, we can evaluate size</div><div>and speed to find a balance point for each asm function.</div><div> </div><div>Thanks,</div><div>Min</div><pre><br>At 2015-06-04 05:23:22,dave <dtyx265@gmail.com> wrote:
>Steve and Min or anyone else with an opinion,
>
>For many of the primitives that I have submitted I unrolled loops with
>%rep to improve performance but for nested loops I only unrolled the
>inner loop leaving the outer loop intact. For my last submission,
>interp_4tap_horiz_pp/s, it is entirely unrolled with %rep, as is the
>sse4 version, though in a different way. This probably generates
>considerably larger executables, especially for the larger sizes. Is
>there any preference on this? Are x265's goals purely performance
>related over memory usage?
>
>thanks,
>Dave
</pre></div>