[x265] [PATCH] integrate assembly code for psyCost_pp

Steve Borho steve at borho.org
Thu Dec 11 22:38:26 CET 2014


On 12/12, chen wrote:
>  
> 
> At 2014-12-12 00:40:01,"Steve Borho" <steve at borho.org> wrote:
> >On 12/11, Divya Manivannan wrote:
> >> # HG changeset patch
> >> # User Divya Manivannan <divya at multicorewareinc.com>
> >> # Date 1418296477 -19800
> >> #      Thu Dec 11 16:44:37 2014 +0530
> >> # Node ID 440d264fcdf33889b665848f19e87ca3559d1b6c
> >> # Parent  667e4ea0899fcf026ee9df935381487d3148ed0c
> >> integrate assembly code for psyCost_pp
> >> 
> >> diff -r 667e4ea0899f -r 440d264fcdf3 source/common/pixel.cpp
> >> --- a/source/common/pixel.cpp	Thu Dec 11 09:36:16 2014 +0530
> >> +++ b/source/common/pixel.cpp	Thu Dec 11 16:44:37 2014 +0530
> >> @@ -815,10 +815,11 @@
> >>              for (int j = 0; j < dim; j+= 8)
> >>              {
> >>                  /* AC energy, measured by sa8d (AC + DC) minus SAD (DC) */
> >> -                int sourceEnergy = sa8d_8x8(source + i * sstride + j, sstride, zeroBuf, 0) - 
> >> -                                   (sad<8, 8>(source + i * sstride + j, sstride, zeroBuf, 0) >> 2);
> >> -                int reconEnergy =  sa8d_8x8(recon + i * rstride + j, rstride, zeroBuf, 0) - 
> >> -                                   (sad<8, 8>(recon + i * rstride + j, rstride, zeroBuf, 0) >> 2);
> >> +                // PartitionFromSizes(8, 8) = 1
> >> +                int sourceEnergy = primitives.sa8d[1](source + i * sstride + j, sstride, zeroBuf, 0) -
> >> +                                   (primitives.sad[1](source + i * sstride + j, sstride, zeroBuf, 0) >> 2);
> >> +                int reconEnergy = primitives.sa8d[1](recon + i * rstride + j, rstride, zeroBuf, 0) -
> >> +                                  (primitives.sad[1](recon + i * rstride + j, rstride, zeroBuf, 0) >> 2);
> >
> >This is an improvement over just C code, but it is still vastly slower
> >than writing new assembly functions for these. The function call
> >overhead is non-trivial.
> >
> It reuses the same input, so we can avoid many loads/stores when we write a new function.

Yes, and there's no reason to ever actually load zeros: with an all-zero
reference, the SAD is just the sum of the block's pixels.
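
To make the idea concrete, here is a rough C++ sketch of what a fused
8x8 AC-energy primitive could compute. It is only an illustration, not
the assembly that would actually be written: the name acEnergy8x8 is
hypothetical, pixels are assumed to be 8-bit, and the (sum + 2) >> 2
rounding is assumed to match what the sa8d primitive does. The block is
read once, the pixel sum stands in for SAD against zeroBuf, and the
Hadamard transform is applied to the block itself instead of to a
difference with zeros.

```cpp
#include <cstdint>
#include <cstdlib>

// In-place, unnormalized 8-point Walsh-Hadamard transform.
// 'step' is the distance between consecutive elements (1 = row, 8 = column).
static void hadamard8(int* v, int step)
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += len << 1)
            for (int j = i; j < i + len; j++)
            {
                int a = v[j * step];
                int b = v[(j + len) * step];
                v[j * step] = a + b;
                v[(j + len) * step] = a - b;
            }
}

// Hypothetical fused primitive: AC energy of one 8x8 block.
// Equivalent in spirit to sa8d(block, zeros) - (sad(block, zeros) >> 2),
// but the zero buffer is never read and the block is loaded only once.
int acEnergy8x8(const uint8_t* src, intptr_t stride)
{
    int blk[64];
    int dc = 0; // SAD against zeros == plain sum of (unsigned) pixels

    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
        {
            int p = src[y * stride + x];
            blk[y * 8 + x] = p;
            dc += p;
        }

    for (int y = 0; y < 8; y++)
        hadamard8(blk + y * 8, 1); // transform rows
    for (int x = 0; x < 8; x++)
        hadamard8(blk + x, 8);     // transform columns

    int sum = 0;
    for (int k = 0; k < 64; k++)
        sum += std::abs(blk[k]);

    int sa8d = (sum + 2) >> 2;     // assumed sa8d normalization
    return sa8d - (dc >> 2);
}
```

With this shape, the loop in pixel.cpp would make one call for source
and one for recon per 8x8 block instead of four primitive calls. A
perfectly flat block comes out with zero energy, which is a handy
sanity check for any assembly version.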

-- 
Steve Borho

