[x265] Fwd: [PATCH Only Review, don't merge] Assembly routine for filterHorizontal_p_p() for 4 tap filter

Jason Garrett-Glaser jason at x264.com
Fri Sep 20 20:35:59 CEST 2013


> [MC] Excuse me, I think it is
> db -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0

Right, I messed up my endianness.

> To implement this change, we need to modify the HM code.
> [MC] We can define the table in the asm file, but we would have to modify
> HM. Of course, that is easy to do.

You don't have to, of course (you know the code better than I do, and
whether or not it's a good idea to change it).

>> +
>> +    mov         tmp,        offset2
>> +    movd        sumOffset,  tmp
>> +    pshufd      sumOffset,  sumOffset,  0
>
> You can movd directly from memory; going through a register is much
> slower, especially on AMD machines.
> [MC] Do you mean we put the constant into memory and load it once?

movd sumOffset, offset2
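
For readers more used to intrinsics than asm, the load-and-broadcast
pattern above can be sketched in C as follows. This is only an
illustration, assuming an x86 compiler with SSE2 and <emmintrin.h>;
the helper name broadcast_offset is hypothetical and not part of the
patch, and the compiler is free to pick the actual instructions, so
it does not guarantee the memory form of movd.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the patch): broadcast a 32-bit offset
 * that lives in memory to all four 32-bit lanes of an XMM register. */
static __m128i broadcast_offset(const int32_t *offset2)
{
    /* Load the scalar into the low lane; compilers typically emit
     * movd xmm, [mem] here rather than going through a GPR. */
    __m128i v = _mm_cvtsi32_si128(*offset2);

    /* Equivalent of: pshufd sumOffset, sumOffset, 0 */
    return _mm_shuffle_epi32(v, 0);
}
```

Storing the result with _mm_storeu_si128 into an int32_t[4] should
show the same value in every lane.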

> [MC] No way, the x264 macro has a bug here; you can remove reduce x2 and
> check the output. The xmm0 seems to be an Intel limit.

That makes sense; I don't think the x264 macro was ever designed to
support non-AVX pblendvb.  I don't recommend non-AVX pblendvb anyway,
as it's a lot slower because of the extra register dependency (it's
like 3 uops or something).
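
As an illustration of what the blend does with a mask like the db
table quoted at the top, here is a minimal C sketch that selects bytes
with plain SSE2 and/andnot/or instead of pblendvb. This is a
swapped-in technique, not something taken from the patch; it matches
pblendvb only when every mask byte is all ones or all zeros, and the
names blend_mask and blend_bytes are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Mask in memory order: the first seven bytes (lowest addresses, and so
 * the lowest bytes of the register) are -1, the rest are 0. */
static const int8_t blend_mask[16] = {
    -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Hypothetical helper: take bytes from b where mask is all ones,
 * and from a where mask is zero. */
static __m128i blend_bytes(__m128i a, __m128i b, __m128i mask)
{
    __m128i from_b = _mm_and_si128(mask, b);     /*  mask & b */
    __m128i from_a = _mm_andnot_si128(mask, a);  /* ~mask & a */
    return _mm_or_si128(from_b, from_a);
}
```

Loading blend_mask with _mm_loadu_si128 and blending two registers
takes the low seven bytes from b and the remaining nine from a.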

Jason

