[x265] [PATCH] asm: interp_8tap_horiz pp and ps sse2

dave dtyx265 at gmail.com
Wed Apr 29 03:30:36 CEST 2015


On 04/28/2015 06:13 PM, chen wrote:
>
> At 2015-04-29 07:49:46, dave <dtyx265 at gmail.com> wrote:
>
>     On 04/28/2015 03:32 PM, chen wrote:
>>     Most parts are fine now; just one change regarding r5, see the comment below.
>>
>>     At 2015-04-29 06:27:27,dtyx265 at gmail.com  wrote:
>>     ># HG changeset patch
>>     ># User David T Yuen <dtyx265 at gmail.com>
>>     ># Date 1430259967 25200
>>     ># Node ID 6108fbda1be654a481a78f7ef593518033919674
>>     ># Parent  e9df93f380664932e7d6c7e85b2cae16cd5e1dcd
>>     >asm: interp_8tap_horiz pp and ps sse2
>>     >
>>     >This replaces c code and covers
>>     >
>>     >+;----------------------------------------------------------------------------------------------------------------------------
>>     >+; void interp_8tap_horiz_%3_%1x%2(pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx, int isRowExt)
>>     >+;----------------------------------------------------------------------------------------------------------------------------
>>     >+%macro IPFILTER_LUMA_sse2 3
>>     >+INIT_XMM sse2
>>     >+cglobal interp_8tap_horiz_%3_%1x%2, 4,7,8
>>     >+
>>     >+    mov       r4d, r4m
>>     >+    add       r4d, r4d
>>     >+    pxor      m6, m6
>>     >+%ifdef PIC
>>     >+    lea       r6, [tabw_LumaCoeff]
>>     >+    movu      m3, [r6 + r4 * 8]
>>     >+%else
>>     >+    movu      m3, [tabw_LumaCoeff + r4 * 8]
>>     >+%endif
>>     >+
>>     >+    mov       r4d, %2
>>     >+%ifidn %3, pp
>>     >+    mova      m2, [pw_32]
>>     >+%else
>>     >+    mova      m2, [pw_2000]
>>     >+    add       r3d, r3d
>>     >+    cmp       r5m, byte 0
>>     If we move the above 2 lines up, we can drop r6 and reuse r5.
>     I am not sure this can be done.  r4 is used to set m3, then it is
>     reused and modified depending on r5, and r5 can't be used for
>     anything else before it has been compared.
>>     'mov' and 'lea' do not affect the EFLAGS register.
>
> but r4 is needed for the lea instruction and r4 is later reused and 
> modified depending on r5.  Only one of them can be reused, not both, 
> so r6 is needed.
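
For reference, a minimal standalone sketch (not part of the patch) of the point that 'mov' and 'lea' leave EFLAGS untouched, so the result of an earlier cmp is still valid after them. The function name flags_survive_mov_lea is hypothetical, and the sketch assumes GCC or Clang extended inline asm on x86-64; on any other target it only reports "not applicable".

```c
/* Hypothetical helper, not from x265: checks that the zero flag set by
 * cmp is still intact after an intervening mov and lea.
 * Returns 1 if ZF survived, 0 if it did not, -1 if the target is not
 * x86-64 with GCC/Clang inline asm (assumption of this sketch). */
static int flags_survive_mov_lea(int is_row_ext)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    unsigned char zf;
    long tmp;
    __asm__ volatile(
        "cmpl $0, %2\n\t"            /* sets ZF when is_row_ext == 0      */
        "movq $64, %1\n\t"           /* mov does not modify EFLAGS        */
        "leaq 8(%1,%1,2), %1\n\t"    /* lea does not modify EFLAGS either */
        "sete %0\n\t"                /* read ZF back after mov and lea    */
        : "=q"(zf), "=&r"(tmp)
        : "r"(is_row_ext)
        : "cc");
    /* tmp == 8 + 64 + 2*64 confirms that mov and lea actually executed. */
    return (zf == (is_row_ext == 0)) && (tmp == 200);
#else
    (void)is_row_ext;
    return -1;
#endif
}
```

Calling flags_survive_mov_lea(0) and flags_survive_mov_lea(1) covers both outcomes of the compare. This only illustrates the flag behaviour; it does not settle whether r6 can be dropped in the macro, which depends on how r4 and r5 are used afterwards.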
