[x265] [PATCH Review Only] ASM routine for interp_8tap_vert_pp_8xN function, (N=4, 8, 16, 32)

Jason Garrett-Glaser jason at x264.com
Wed Oct 30 20:58:37 CET 2013


> +    pmulhrsw    m7,        [tab_c_512]
> +    pmulhrsw    m6,        [tab_c_512]
> +    pmulhrsw    m5,        [tab_c_512]
> +    pmulhrsw    m4,        [tab_c_512]

Could we load this into a temp instead of loading it 4 times?
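
For reference, here is a rough scalar sketch in C of what each of those pmulhrsw instructions computes per 16-bit lane. It assumes tab_c_512 holds the value 512 in every lane (the table is not shown in this hunk), and the function names are made up for illustration. The multiplier is identical in all four instructions, so a single load into a spare register would be enough.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of SSSE3 pmulhrsw:
 * result = (((a * b) >> 14) + 1) >> 1, evaluated in 32 bits.
 * Assumes >> on negative values is an arithmetic shift, which is
 * implementation-defined in C but true on common compilers. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * (int32_t)b) >> 14;
    return (int16_t)((t + 1) >> 1);
}

/* With b = 512 the operation reduces to a rounding right shift by 6
 * (hypothetical helper, only here for comparison). */
static int16_t round_shift6(int16_t a)
{
    return (int16_t)(((int32_t)a + 32) >> 6);
}
```

Both functions agree for every 16-bit input when b is 512, so the constant acts purely as a fixed rounding shift and does not change between the four rows.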

> +cglobal interp_8tap_vert_pp_%1x%2, 4, 7, 7
> +    mov         r4d,       r4m

Is this the same as just doing cglobal interp_8tap_vert_pp_%1x%2, 5, 7, 7?

> +    lea         r5,        [r1 + 2 * r1]
> +    sub         r0,        r5
> +
> +    shl         r4,        6

I think this should be r4d (general coding suggestion: use 32-bit
operations unless 64-bit/native size is necessary, e.g. for pointers).

> +xor         r4,        r4

Same here (xor r4d, r4d should be equivalent, since a 32-bit write
zeroes the upper half of the register on x86-64).

> +add         r4d,       %2
> +
> +.loopH
> +    FILTER_VL_W8_4R
> +
> +    lea         r5,        [4 * r1]
> +    sub         r0,        r5
> +    lea         r5,        [4 * r3]
> +    add         r2,        r5

The last two instructions can be a single one:

lea r2, [r2+4*r3]

Jason

