Small performance improvement: adjust register addressing to reduce the number of lea instructions. I tried this type of tweak on the other interp_4tap_vert_pX primitives but found mixed results; I might submit more tweaks after further investigation.