[x265] [PATCH] Performance: Balance # threads per pool for non-NUMA machines with > 64 vCPUs

Pradeep Ramachandran pradeep at multicorewareinc.com
Tue Aug 25 05:37:49 CEST 2015


This may not be the best thing to do for performance, though: balancing may
leave each pool with too few threads and limit performance. Please don't push
this in.

I think it may be better to spawn a single pool of 64 threads when the delta
between the number of cores and 64 is less than x% of 64. Basically, if the
second pool won't have "enough" threads to work on, don't spawn a second pool
at all! I will send a patch for this, assuming x to be 50%.
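
To make the idea concrete, here is a rough sketch of the heuristic. It is not
the actual follow-up patch: planPoolSizes, maxPerPool and minFillPercent are
hypothetical names that do not exist in x265, and extending the rule to more
than two pools is my own assumption. Unlike the patch below, the sketch spreads
the remainder one thread at a time so that no pool goes over the cap.

```cpp
#include <vector>

// Hypothetical helper, not part of x265: returns the number of threads to
// give each pool on a non-NUMA machine with totalCpus logical CPUs.
std::vector<int> planPoolSizes(int totalCpus, int maxPerPool = 64, int minFillPercent = 50)
{
    // Everything fits in one pool.
    if (totalCpus <= maxPerPool)
        return { totalCpus };

    int fullPools = totalCpus / maxPerPool; // pools that can be filled to the cap
    int leftover  = totalCpus % maxPerPool; // "stray" threads beyond the full pools

    // The extra pool would be less than minFillPercent full: don't spawn it,
    // and keep only the full pools (the stray CPUs get no worker threads).
    if (leftover * 100 < minFillPercent * maxPerPool)
        return std::vector<int>(fullPools, maxPerPool);

    // Otherwise spawn the extra pool and balance all threads across the pools.
    int numPools = fullPools + 1;
    std::vector<int> sizes(numPools, totalCpus / numPools);
    for (int i = 0; i < totalCpus % numPools; i++)
        sizes[i]++; // hand out the remainder one thread per pool
    return sizes;
}
```

With these defaults, 72 vCPUs would give a single pool of 64 threads, while 96
vCPUs would give two pools of 48 threads each.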

Pradeep Ramachandran, PhD
Solution Architect,
Multicoreware Inc.
Ph:   +91 99627 82018

On Mon, Aug 24, 2015 at 6:55 PM, Steve Borho <steve at borho.org> wrote:

> On 08/24, pradeep at multicorewareinc.com wrote:
> > # HG changeset patch
> > # User pradeep
> > # Date 1440406873 -19800
> > #      Mon Aug 24 14:31:13 2015 +0530
> > # Node ID cf6210f6f5cbbeec441f7eee3d8abf82208942fd
> > # Parent  f63273fa3137fef2f6898c686b68ee12608acd31
> > Performance: Balance # threads per pool for non-NUMA machines with > 64 vCPUs
> >
> > By default, each thread pool may have up to 64 threads. In the case of a CPU
> > that has > 64 threads without NUMA support, we end up creating the first pool
> > with 64 threads and the other pool with a few stray threads significantly
> > affecting performance. This fix balances the threads out in that case.
> >
> > diff -r f63273fa3137 -r cf6210f6f5cb source/common/threadpool.cpp
> > --- a/source/common/threadpool.cpp    Thu Aug 20 11:13:25 2015 +0530
> > +++ b/source/common/threadpool.cpp    Mon Aug 24 14:31:13 2015 +0530
> > @@ -307,6 +307,16 @@
> >          numPools = X265_MAX(p->frameNumThreads / 2, 1);
> >      }
> >
> > +    // In the case that numa is disabled and we have more CPUs than 64,
> > +    // balance the # threads created across thread pools
> > +    if ((numNumaNodes==1) && (numPools > 1))
> > +    {
> > +        int threads = cpusPerNode[0];
> > +        for (int i = 0; i < numPools; i++)
> > +            cpusPerNode[i] = threads / numPools;
> > +        cpusPerNode[0] += (threads % numPools) ;
> > +    }
>
> Other than whitespace nits, this seems fine.
>
> >      ThreadPool *pools = new ThreadPool[numPools];
> >      if (pools)
> >      {
> > _______________________________________________
> > x265-devel mailing list
> > x265-devel at videolan.org
> > https://mailman.videolan.org/listinfo/x265-devel
>
> --
> Steve Borho