[x265] [PATCH] threadpool.cpp: use WIN system call for popcount

Andrey Semashev andrey.semashev at gmail.com
Thu May 3 19:54:46 CEST 2018


On Thu, May 3, 2018 at 7:37 PM, Pradeep Ramachandran
<pradeep at multicorewareinc.com> wrote:
>
> On Thu, May 3, 2018 at 2:23 PM, <praveen at multicorewareinc.com> wrote:
>>
>> # HG changeset patch
>> # User Praveen Tiwari <praveen at multicorewareinc.com>
>> # Date 1525328839 -19800
>> #      Thu May 03 11:57:19 2018 +0530
>> # Branch stable
>> # Node ID 9cbb2aadcca3a2f7a308ea1dc792fb817bcc5b51
>> # Parent  69aafa6d70ad4e151f4590766c6b125621c5d007
>> threadpool.cpp: use WIN system call for popcount
>
>
> Unless this fixes a known bug, I don't want to push this directly into
> stable. Syscalls are notorious especially when working with older versions
> of the OS.
> I would rather push this into default and allow users to test that this
> works with all kinds of systems and then merge with stable once the answer
> is known.
> Does this fix a specific issue on some platform, or improve performance?

The comment is not quite right, __popcnt is not a syscall but an
MSVC-specific intrinsic.

https://msdn.microsoft.com/en-us/library/bb385231.aspx

The equivalent gcc intrinsic is __builtin_popcount and friends.

I think, the patch is buggy because the relevant field is a 64-bit
integer on 64-bit Windows and __popcnt is 32-bit.

Note also that the popcount instruction only available in ABM ISA
extension. In Intel CPUs it is available since Nehalem.

>> diff -r 69aafa6d70ad -r 9cbb2aadcca3 source/common/threadpool.cpp
>> --- a/source/common/threadpool.cpp      Wed May 02 15:15:05 2018 +0530
>> +++ b/source/common/threadpool.cpp      Thu May 03 11:57:19 2018 +0530
>> @@ -71,21 +71,6 @@
>>  # define strcasecmp _stricmp
>>  #endif
>>
>> -#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
>> -const uint64_t m1 = 0x5555555555555555; //binary: 0101...
>> -const uint64_t m2 = 0x3333333333333333; //binary: 00110011..
>> -const uint64_t m3 = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
>> -const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of
>> 0,1,2,3...
>> -
>> -static int popCount(uint64_t x)
>> -{
>> -    x -= (x >> 1) & m1;
>> -    x = (x & m2) + ((x >> 2) & m2);
>> -    x = (x + (x >> 4)) & m3;
>> -    return (x * h01) >> 56;
>> -}
>> -#endif
>> -
>>  namespace X265_NS {
>>  // x265 private namespace
>>
>> @@ -274,7 +259,7 @@
>>      for (int i = 0; i < numNumaNodes; i++)
>>      {
>>          GetNumaNodeProcessorMaskEx((UCHAR)i, groupAffinityPointer);
>> -        cpusPerNode[i] = popCount(groupAffinityPointer->Mask);
>> +        cpusPerNode[i] = __popcnt(static_cast<unsigned
>> int>(groupAffinityPointer->Mask));
>>      }
>>      delete groupAffinityPointer;
>>  #elif HAVE_LIBNUMA
>> @@ -623,7 +608,7 @@
>>      for (int i = 0; i < numNumaNodes; i++)
>>      {
>>          GetNumaNodeProcessorMaskEx((UCHAR)i, &groupAffinity);
>> -        cpus += popCount(groupAffinity.Mask);
>> +        cpus += __popcnt(static_cast<unsigned int>(groupAffinity.Mask));
>>      }
>>      return cpus;
>>  #elif _WIN32
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel
>
>
>
> _______________________________________________
> x265-devel mailing list
> x265-devel at videolan.org
> https://mailman.videolan.org/listinfo/x265-devel
>


More information about the x265-devel mailing list