[x265] Fwd: [PATCH Only Review, don't merge] Assembly routine for filterHorizontal_p_p() for 4 tap filter
Steve Borho
steve at borho.org
Sun Sep 22 03:29:42 CEST 2013
On Fri, Sep 20, 2013 at 8:55 PM, chen <chenm003 at 163.com> wrote:
> At 2013-09-21 02:35:59,"Jason Garrett-Glaser" <jason at x264.com> wrote:
>
> >
> >> To implement this change, we need to modify the HM code.
> >> [MC] We can define the table in the asm file, but we have to modify
> >> HM. Of course, that is an easy thing to do.
> >
> >You don't have to, of course (you know the code better than I and
> >whether or not it's a good idea to change it).
> If we don't modify the code, we can't know which coefficient group they
> want. HEVC has 4 groups for qpel, which is different from H.264.
>
> >>> +
> >>> + mov tmp, offset2
> >>> + movd sumOffset, tmp
> >>> + pshufd sumOffset, sumOffset, 0
> >>
> >> You can movd directly from memory; going through a register is much
> >> slower, especially on AMD machines.
> >> [MC] Do you mean we put the constant into memory and load it once?
> >
> >movd sumOffset, offset2
> I looked at the documentation before; I don't think any instruction
> supports 'movd reg, constant' on Intel CPUs.
>
>
> >> [MC] No way, the x264 macro has a bug here; you can remove reduce x2
> >> and check the output. The xmm0 seems to be an Intel limitation.
> >
> >That makes sense, I don't think the x264 macro was ever designed to
> >support non-AVX pblendvb. I don't recommend non-AVX pblendvb anyways
> >as it's a lot slower because of the extra register dependency (it's
> >like 3 uops or something).
> Replacing it with 'pand + pandn + por' is 3 uops but has fewer
> dependencies. Agner's documents say pblendvb is 2 uops, 2-cycle latency
> and 1-cycle throughput on my Sandy Bridge, so I selected it.
>
> Of course, this is a bad branch; the code is for the testbench only.
> In the real world, the minimum block is 4x8, the width is 4, and movd
> is enough.
>
This sounds like a testbench bug then. Let's not keep dead code in the
primitive just because the testbench covers unrealistic block sizes.
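
For reference, here is a minimal scalar C sketch of the two blend
strategies quoted above: the 'pand + pandn + por' sequence and a
pblendvb-style select. It only models the per-byte semantics of one
16-byte register; it is not the patch's code, and the function names
(blend_and_or, blend_pblendvb) and the LANES constant are hypothetical,
chosen for illustration only.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16 /* hypothetical name: bytes in one 128-bit XMM register */

/* Models pand + pandn + por. Each mask byte is assumed to be 0x00 or 0xFF;
 * with any other value the result mixes bits from both sources. */
static void blend_and_or(uint8_t dst[LANES], const uint8_t a[LANES],
                         const uint8_t b[LANES], const uint8_t mask[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint8_t from_b = (uint8_t)(b[i] & mask[i]);   /* pand:  b AND mask     */
        uint8_t from_a = (uint8_t)(a[i] & ~mask[i]);  /* pandn: a AND NOT mask */
        dst[i] = (uint8_t)(from_a | from_b);          /* por:   merge halves   */
    }
}

/* Models pblendvb: only the top bit of each mask byte selects the source
 * (bit set takes b, bit clear keeps a). */
static void blend_pblendvb(uint8_t dst[LANES], const uint8_t a[LANES],
                           const uint8_t b[LANES], const uint8_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (mask[i] & 0x80) ? b[i] : a[i];
}
```

With a mask of 0xFF in the first four bytes and 0x00 elsewhere (the
width-4 case), both functions produce the same output, so the choice
between them is about uops and register dependencies rather than
results.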
Cheers
--
Steve Borho