At 2014-08-31 15:46:53, "Fiona Glaser" <fiona at x264.com> wrote:
>+ movq xm0, [r0]
>+ movq xm1, [r0 + r2]
>+ punpcklqdq m0, m1
>
>These can be replaced by movq + movhps; it should be one less uop.
>
>Fiona

movhps generates 2 uops and uses port 5 on Haswell, so the replacement would only reduce code size.