[x265] [PATCH Review Only] ASM routine for interp_8tap_vert_pp_8xN function, (N=4, 8, 16, 32)
Jason Garrett-Glaser
jason at x264.com
Wed Oct 30 20:58:37 CET 2013
> + pmulhrsw m7, [tab_c_512]
> + pmulhrsw m6, [tab_c_512]
> + pmulhrsw m5, [tab_c_512]
> + pmulhrsw m4, [tab_c_512]
Could we load this into a temp instead of loading it 4 times?
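For reference, here is a scalar C sketch of what each 16-bit lane of those pmulhrsw instructions computes, assuming tab_c_512 holds 512 in every lane (the name suggests this, but the table is not shown in the quoted hunk). The helper names are made up for illustration. With a multiplier of 512 the operation reduces to a rounding shift right by 6, and because all four instructions use the same constant, reading it from a register instead of memory would not change the result.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of PMULHRSW: (((a * b) >> 14) + 1) >> 1.
 * Assumes >> on a negative int is an arithmetic shift, as it is on
 * mainstream compilers. */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (int16_t)(((prod >> 14) + 1) >> 1);
}

/* Round-to-nearest shift right by 6: (a + 32) >> 6. */
static int16_t round_shift6(int16_t a)
{
    return (int16_t)(((int32_t)a + 32) >> 6);
}

/* Returns 1 if multiplying by 512 matches the rounding shift
 * for every possible 16-bit input, 0 otherwise. */
static int mulhrs_512_matches_round_shift6(void)
{
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a++)
        if (mulhrs_lane((int16_t)a, 512) != round_shift6((int16_t)a))
            return 0;
    return 1;
}
```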
> +cglobal interp_8tap_vert_pp_%1x%2, 4, 7, 7
> + mov r4d, r4m
Is this the same as just doing cglobal interp_8tap_vert_pp_%1x%2, 5, 7, 7?
> + lea r5, [r1 + 2 * r1]
> + sub r0, r5
> +
> + shl r4, 6
I think this should be r4d (general coding suggestion: use 32-bit
unless 64-bit/native-size is necessary, e.g. pointers).
> +xor r4, r4
Same here (xor r4d, r4d should be equivalent).
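The reason the 32-bit forms are safe here: on x86-64, a write to a 32-bit register clears the upper 32 bits of the full 64-bit register. A rough C model of that rule is below. The function names are hypothetical and the register is modelled as a plain uint64_t. The shift forms agree as long as the shifted value still fits in 32 bits, which should hold for a small index like the one loaded from r4m.

```c
#include <stdint.h>

/* Model of a 32-bit register write on x86-64: the upper half of the
 * 64-bit register is cleared, so the result is the zero-extended value. */
static uint64_t write32(uint32_t value)
{
    return (uint64_t)value;
}

/* "shl r4, n": shift the full 64-bit register. */
static uint64_t shl_64(uint64_t reg, unsigned n)
{
    return reg << n;
}

/* "shl r4d, n": shift the low 32 bits, then zero-extend. */
static uint64_t shl_32(uint64_t reg, unsigned n)
{
    return write32((uint32_t)reg << n);
}

/* "xor r4, r4": clears all 64 bits. */
static uint64_t xor_self_64(uint64_t reg)
{
    return reg ^ reg;
}

/* "xor r4d, r4d": clears the low 32 bits; the write also clears the rest. */
static uint64_t xor_self_32(uint64_t reg)
{
    return write32((uint32_t)reg ^ (uint32_t)reg);
}
```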
> +add r4d, %2
> +
> +.loopH
> + FILTER_VL_W8_4R
> +
> + lea r5, [4 * r1]
> + sub r0, r5
> + lea r5, [4 * r3]
> + add r2, r5
These last two instructions can be a single one: lea r2, [r2+4*r3]
Jason