[x265] [PATCH] asm: interp_8tap_vert_pX sse2
chen
chenm003 at 163.com
Fri May 29 21:14:31 CEST 2015
You may try to port your new algorithm to SSE4; if the performance is better, you may send a new patch (I guess SSE4 will perform better than SSE2 with the new algorithm).
At 2015-05-30 02:53:54, dave <dtyx265 at gmail.com> wrote:
FYI, if what I submitted performs better than the SSE4 code, then I suggest either improving the SSE4 code with SSSE3 and SSE4 instructions or removing it.
On 05/29/2015 10:12 AM, chen wrote:
Right, thanks.
At 2015-05-30 01:01:15, dtyx265 at gmail.com wrote:
># HG changeset patch
># User David T Yuen <dtyx265 at gmail.com>
># Date 1432917446 25200
># Node ID 2d5efe979f6b9c8db275ecb53767e4bcff1da659
># Parent 12f0ed28ba0eb29f2df0bb8adbc5f3cfb40a6361
>asm: interp_8tap_vert_pX sse2
>
>This code replaces the C code used for SSE2. It combines the SSE4 macros into
>one for smaller code size, with no loss of functionality, plus a few tweaks for performance.
>The original SSE4 macros only use instructions up to SSE2, so this code may perform better
>thanks to the tweaks: unrolling the inner loop, which eliminated the need to keep the
>counter for one of the loops on the stack, and replacing source register increments
>with address offsets.
>
_______________________________________________
x265-devel mailing list
x265-devel at videolan.org
https://mailman.videolan.org/listinfo/x265-devel