[x265] [PATCH] Replace sad_12, sad_24, sad_32 vector class functions with intrinsics

Steve Borho steve at borho.org
Fri Oct 4 05:32:47 CEST 2013


On Thu, Oct 3, 2013 at 10:19 AM, Derek Buitenhuis <
derek.buitenhuis at gmail.com> wrote:

> On 10/1/2013 6:50 PM, Steve Borho wrote:
> > This is the effort to get rid of the GPL vector class headers.  We're
> replacing primitives that use those classes with intrinsics because the GPL
> headers have to be gone by November and we prefer not crippling what
> performance we have today.
> >
> > At our current pace of assembly development, it would take about two
> years to replace them all with assembly and we don't have that luxury.
> >
> > Assembly development will continue in parallel with this effort; but
> there will be a lot of patches in the coming weeks that replace vector
> class based intrinsic primitives with pure-intrinsic primitives; or just
> deleting vector primitives that we don't care enough about to keep.
>
> I'll translate:
>

> [14:37] < j-b> seriously, is that that hard to do .asm files instead of
> writing new intrinsincs?
>

These aren't new intrinsics, they're replacing existing intrinsics.  And
yes, anyone following the progress of the chroma 4-tap interpolation
assembly can clearly see this assembly effort has been slow to get off the
ground.


> [15:47] < Daemon404> j-b, intrinsics are faster because they dont get
> reviewed.
>

The first intrinsic patch was reviewed and rejected, though to be honest it
was reviewed by one of our guys.


> [15:47] < Daemon404> as opposed to Skyler_'s asm review
> [15:48] < Daemon404> i also love the arbitrary november date
>

x265 will have a limited commercial deployment in November; commercial
deployment means no non-dual-licensed code.  It is what it is; unless
someone is going to volunteer an army of assembly coders this is what we
have to do.


> [15:49] < Daemon404> oh holy sad intrinsics spam...
> [15:50] < Daemon404> there's no way that isn't just copy/pasted bullshit.
> [16:08] < Daemon404> also
> [16:08] < Daemon404> arent these (copy/pasted?) intrinsics possibly going
> to be super crappy
> [16:08] < Daemon404> due to register allocation


SAD routines are pretty boring, and there is some ugly hand-unrolling going
on, but our SAD intrinsics are in the neighborhood of x264 SAD assembly
perf.

These new routines are generally twice as fast as the vector class based
routines they're replacing.
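
For anyone who hasn't looked at one of these, here is a rough sketch of
what a pure-intrinsic 32-wide SAD looks like.  This is illustrative only
and is not the code from the patch: the function names (sad_32xh,
sad_32xh_c) and the argument layout are made up for this example, and it
assumes SSE2 with unaligned loads, falling back to the plain C loop when
SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: sum of absolute differences over a 32 x height block.
 * (Hypothetical name, not an x265 symbol.) */
int sad_32xh_c(const uint8_t *a, ptrdiff_t stride_a,
               const uint8_t *b, ptrdiff_t stride_b, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 32; x++) {
            int d = a[x] - b[x];
            sum += d < 0 ? -d : d;
        }
        a += stride_a;
        b += stride_b;
    }
    return sum;
}

/* Same result using SSE2 intrinsics: each row is two 16-byte loads per
 * input, i.e. the 32-pixel width is unrolled by hand. */
int sad_32xh(const uint8_t *a, ptrdiff_t stride_a,
             const uint8_t *b, ptrdiff_t stride_b, int height)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < height; y++) {
        __m128i a0 = _mm_loadu_si128((const __m128i *)a);
        __m128i a1 = _mm_loadu_si128((const __m128i *)(a + 16));
        __m128i b0 = _mm_loadu_si128((const __m128i *)b);
        __m128i b1 = _mm_loadu_si128((const __m128i *)(b + 16));
        /* _mm_sad_epu8 (psadbw) yields one partial sum per 64-bit half */
        acc = _mm_add_epi32(acc, _mm_sad_epu8(a0, b0));
        acc = _mm_add_epi32(acc, _mm_sad_epu8(a1, b1));
        a += stride_a;
        b += stride_b;
    }
    /* fold the upper 64-bit half onto the lower one and extract */
    acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
    return _mm_cvtsi128_si32(acc);
#else
    return sad_32xh_c(a, stride_a, b, stride_b, height);
#endif
}
```

The compiler is left to do register allocation and scheduling here, which
is the concern raised in the IRC log above.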

The 4-tap chroma interpolation assembly function we've just finished is
only a hair faster than the intrinsic-based primitive (16x vs 15x faster
than C).

-- 
Steve Borho