[x265] [PATCH] threadpool.cpp: use WIN system call for popcount

Praveen Tiwari praveen at multicorewareinc.com
Fri May 4 07:15:21 CEST 2018


It is just counting cpusPerNode, so the 64-bit number is not required, yes
but I missed the fact of support on few CPUs.  Lookup table
based implementation could have been fastest due to better caching, but it
is not used frequently so we can keep as it is. Thanks.

On Thu, May 3, 2018 at 11:24 PM, Andrey Semashev <andrey.semashev at gmail.com>
wrote:

> On Thu, May 3, 2018 at 7:37 PM, Pradeep Ramachandran
> <pradeep at multicorewareinc.com> wrote:
> >
> > On Thu, May 3, 2018 at 2:23 PM, <praveen at multicorewareinc.com> wrote:
> >>
> >> # HG changeset patch
> >> # User Praveen Tiwari <praveen at multicorewareinc.com>
> >> # Date 1525328839 -19800
> >> #      Thu May 03 11:57:19 2018 +0530
> >> # Branch stable
> >> # Node ID 9cbb2aadcca3a2f7a308ea1dc792fb817bcc5b51
> >> # Parent  69aafa6d70ad4e151f4590766c6b125621c5d007
> >> threadpool.cpp: use WIN system call for popcount
> >
> >
> > Unless this fixes a known bug, I don't want to push this directly into
> > stable. Syscalls are notorious especially when working with older
> versions
> > of the OS.
> > I would rather push this into default and allow users to test that this
> > works with all kinds of systems and then merge with stable once the
> answer
> > is known.
> > Does this fix a specific issue on some platform, or improve performance?
>
> The comment is not quite right, __popcnt is not a syscall but an
> MSVC-specific intrinsic.
>
> https://msdn.microsoft.com/en-us/library/bb385231.aspx
>
> The equivalent gcc intrinsic is __builtin_popcount and friends.
>
> I think, the patch is buggy because the relevant field is a 64-bit
> integer on 64-bit Windows and __popcnt is 32-bit.
>
> Note also that the popcount instruction only available in ABM ISA
> extension. In Intel CPUs it is available since Nehalem.
>
> >> diff -r 69aafa6d70ad -r 9cbb2aadcca3 source/common/threadpool.cpp
> >> --- a/source/common/threadpool.cpp      Wed May 02 15:15:05 2018 +0530
> >> +++ b/source/common/threadpool.cpp      Thu May 03 11:57:19 2018 +0530
> >> @@ -71,21 +71,6 @@
> >>  # define strcasecmp _stricmp
> >>  #endif
> >>
> >> -#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
> >> -const uint64_t m1 = 0x5555555555555555; //binary: 0101...
> >> -const uint64_t m2 = 0x3333333333333333; //binary: 00110011..
> >> -const uint64_t m3 = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
> >> -const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power
> of
> >> 0,1,2,3...
> >> -
> >> -static int popCount(uint64_t x)
> >> -{
> >> -    x -= (x >> 1) & m1;
> >> -    x = (x & m2) + ((x >> 2) & m2);
> >> -    x = (x + (x >> 4)) & m3;
> >> -    return (x * h01) >> 56;
> >> -}
> >> -#endif
> >> -
> >>  namespace X265_NS {
> >>  // x265 private namespace
> >>
> >> @@ -274,7 +259,7 @@
> >>      for (int i = 0; i < numNumaNodes; i++)
> >>      {
> >>          GetNumaNodeProcessorMaskEx((UCHAR)i, groupAffinityPointer);
> >> -        cpusPerNode[i] = popCount(groupAffinityPointer->Mask);
> >> +        cpusPerNode[i] = __popcnt(static_cast<unsigned
> >> int>(groupAffinityPointer->Mask));
> >>      }
> >>      delete groupAffinityPointer;
> >>  #elif HAVE_LIBNUMA
> >> @@ -623,7 +608,7 @@
> >>      for (int i = 0; i < numNumaNodes; i++)
> >>      {
> >>          GetNumaNodeProcessorMaskEx((UCHAR)i, &groupAffinity);
> >> -        cpus += popCount(groupAffinity.Mask);
> >> +        cpus += __popcnt(static_cast<unsigned
> int>(groupAffinity.Mask));
> >>      }
> >>      return cpus;
> >>  #elif _WIN32
> >> _______________________________________________
> >> x265-devel mailing list
> >> x265-devel at videolan.org
> >> https://mailman.videolan.org/listinfo/x265-devel
> >
> >
> >
> > _______________________________________________
> > x265-devel mailing list
> > x265-devel at videolan.org
> > https://mailman.videolan.org/listinfo/x265-devel
> >
> _______________________________________________
> x265-devel mailing list
> x265-devel at videolan.org
> https://mailman.videolan.org/listinfo/x265-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20180504/840ec32c/attachment-0001.html>


More information about the x265-devel mailing list