[x265] [PATCH] integrate assembly code for psyCost_pp

chen chenm003 at 163.com
Thu Dec 11 19:50:22 CET 2014


 

At 2014-12-12 00:40:01,"Steve Borho" <steve at borho.org> wrote:
>On 12/11, Divya Manivannan wrote:
>> # HG changeset patch
>> # User Divya Manivannan <divya at multicorewareinc.com>
>> # Date 1418296477 -19800
>> #      Thu Dec 11 16:44:37 2014 +0530
>> # Node ID 440d264fcdf33889b665848f19e87ca3559d1b6c
>> # Parent  667e4ea0899fcf026ee9df935381487d3148ed0c
>> integrate assembly code for psyCost_pp
>> 
>> diff -r 667e4ea0899f -r 440d264fcdf3 source/common/pixel.cpp
>> --- a/source/common/pixel.cpp	Thu Dec 11 09:36:16 2014 +0530
>> +++ b/source/common/pixel.cpp	Thu Dec 11 16:44:37 2014 +0530
>> @@ -815,10 +815,11 @@
>>              for (int j = 0; j < dim; j+= 8)
>>              {
>>                  /* AC energy, measured by sa8d (AC + DC) minus SAD (DC) */
>> -                int sourceEnergy = sa8d_8x8(source + i * sstride + j, sstride, zeroBuf, 0) - 
>> -                                   (sad<8, 8>(source + i * sstride + j, sstride, zeroBuf, 0) >> 2);
>> -                int reconEnergy =  sa8d_8x8(recon + i * rstride + j, rstride, zeroBuf, 0) - 
>> -                                   (sad<8, 8>(recon + i * rstride + j, rstride, zeroBuf, 0) >> 2);
>> +                // PartitionFromSizes(8, 8) = 1
>> +                int sourceEnergy = primitives.sa8d[1](source + i * sstride + j, sstride, zeroBuf, 0) -
>> +                                   (primitives.sad[1](source + i * sstride + j, sstride, zeroBuf, 0) >> 2);
>> +                int reconEnergy = primitives.sa8d[1](recon + i * rstride + j, rstride, zeroBuf, 0) -
>> +                                  (primitives.sad[1](recon + i * rstride + j, rstride, zeroBuf, 0) >> 2);
>
>This is an improvement over just C code, but it is still vastly slower
>than writing new assembly functions for these. The function call
>overhead is non-trivial.
>
Both calls reuse the same input, so a new combined function can avoid many of the redundant loads and stores.
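
To illustrate the point, here is a rough C++ sketch (not the x265 code, and not the assembly being proposed) of a combined 4x4 energy function. It assumes 8-bit pixels, and every function name in it is hypothetical. Since the second block is all zeros, the SAD is just the sum of the pixels, which is also the DC coefficient of the Hadamard transform, so one pass over the block yields both terms.

```cpp
#include <cstdint>
#include <cstdlib>

typedef uint8_t pixel; // assumption: 8-bit build

// Unfused reference: SAD of a 4x4 block against an all-zero block.
static int sad4x4_vs_zero(const pixel* src, intptr_t stride)
{
    int sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += src[y * stride + x];
    return sum;
}

// Unfused reference: 4x4 Hadamard SATD against an all-zero block, halved.
static int satd4x4_vs_zero(const pixel* src, intptr_t stride)
{
    int m[4][4];
    for (int y = 0; y < 4; y++) // horizontal butterflies
    {
        const pixel* r = src + y * stride;
        int s01 = r[0] + r[1], d01 = r[0] - r[1];
        int s23 = r[2] + r[3], d23 = r[2] - r[3];
        m[y][0] = s01 + s23; m[y][1] = s01 - s23;
        m[y][2] = d01 + d23; m[y][3] = d01 - d23;
    }
    int sum = 0;
    for (int x = 0; x < 4; x++) // vertical butterflies
    {
        int s01 = m[0][x] + m[1][x], d01 = m[0][x] - m[1][x];
        int s23 = m[2][x] + m[3][x], d23 = m[2][x] - m[3][x];
        sum += std::abs(s01 + s23) + std::abs(s01 - s23) +
               std::abs(d01 + d23) + std::abs(d01 - d23);
    }
    return sum >> 1;
}

// Hypothetical fused version: reads each pixel once and takes the
// SAD term from the DC coefficient instead of a second pass.
static int psyEnergy4x4_fused(const pixel* src, intptr_t stride)
{
    int m[4][4];
    for (int y = 0; y < 4; y++)
    {
        const pixel* r = src + y * stride;
        int s01 = r[0] + r[1], d01 = r[0] - r[1];
        int s23 = r[2] + r[3], d23 = r[2] - r[3];
        m[y][0] = s01 + s23; m[y][1] = s01 - s23;
        m[y][2] = d01 + d23; m[y][3] = d01 - d23;
    }
    int sum = 0, dc = 0;
    for (int x = 0; x < 4; x++)
    {
        int s01 = m[0][x] + m[1][x], d01 = m[0][x] - m[1][x];
        int s23 = m[2][x] + m[3][x], d23 = m[2][x] - m[3][x];
        int t0 = s01 + s23;
        if (x == 0)
            dc = t0; // DC coefficient == sum of all 16 pixels == SAD vs zero
        sum += std::abs(t0) + std::abs(s01 - s23) +
               std::abs(d01 + d23) + std::abs(d01 - d23);
    }
    return (sum >> 1) - (dc >> 2);
}

// Two passes per block, as in the patch.
int psyCost4x4_unfused(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride)
{
    int sourceEnergy = satd4x4_vs_zero(source, sstride) - (sad4x4_vs_zero(source, sstride) >> 2);
    int reconEnergy = satd4x4_vs_zero(recon, rstride) - (sad4x4_vs_zero(recon, rstride) >> 2);
    return std::abs(sourceEnergy - reconEnergy);
}

// One pass per block.
int psyCost4x4_fused(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride)
{
    return std::abs(psyEnergy4x4_fused(source, sstride) - psyEnergy4x4_fused(recon, rstride));
}
```

An assembly version would presumably keep the rows in SIMD registers and drop the zeroBuf loads altogether; the sketch only shows that the two results can come from a single read of the block.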
 
>>  
>>                  totEnergy += abs(sourceEnergy - reconEnergy);
>>              }
>> @@ -828,8 +829,11 @@
>>      else
>>      {
>>          /* 4x4 is too small for sa8d */
>> -        int sourceEnergy = satd_4x4(source, sstride, zeroBuf, 0) - (sad<4, 4>(source, sstride, zeroBuf, 0) >> 2);
>> -        int reconEnergy = satd_4x4(recon, rstride, zeroBuf, 0) - (sad<4, 4>(recon, rstride, zeroBuf, 0) >> 2);
>> +        // partitionFromSizes(4, 4) = 0
>> +        int sourceEnergy = primitives.satd[0](source, sstride, zeroBuf, 0) -
>> +                           (primitives.sad[0](source, sstride, zeroBuf, 0) >> 2);
>> +        int reconEnergy = primitives.satd[0](recon, rstride, zeroBuf, 0) -
>> +                          (primitives.sad[0](recon, rstride, zeroBuf, 0) >> 2);
>>          return abs(sourceEnergy - reconEnergy);
>>      }
>>  }
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel
>
>-- 
>Steve Borho