[x265] Question about NUMA and core/thread use

Michael Lackner michael.lackner at unileoben.ac.at
Wed May 10 09:02:20 CEST 2017


On 05/10/2017 08:24 AM, Pradeep Ramachandran wrote:
> On Wed, May 10, 2017 at 11:12 AM, Michael Lackner <
> michael.lackner at unileoben.ac.at> wrote:
> 
>> Thank you very much for your input!
>>
>> Since --pmode and --pme seem to break NUMA support, I disabled them. I
>> simply cannot tell
>> users that they have to switch off NUMA in their UEFI firmware just for
>> this one
>> application. There may be a lot of situations where this is just not
>> doable.
>>
>> If there is a way to make --pmode --pme work together with x265's NUMA
>> support, I'd use it, but I don't know how.
> 
> Could you please elaborate more here? It seems to work ok for us here. I've
> tried on CentOS and Win Server 2017 dual socket systems and I see all
> sockets being used.

It's like this: x265 does *say* it's using 32 threads in two NUMA pools. That's just how
it should be. But it behaves very weirdly, almost never loading more than two logical
cores. FPS are extremely low, so it's really slow.

CPU load stays at 190-200%, sometimes briefly dropping to 140-150%, where it should be in
the range of 2800-3200%. As soon as I remove --pmode --pme, the system is being loaded
very well! It almost never drops below the 3000% (30 cores) mark then.
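
For anyone comparing numbers: the percentages above are top-style figures, where 100% means one fully busy logical core. Below is a minimal Python sketch of that conversion. The function names are my own and purely illustrative, and the 32 logical CPUs are taken from the topology of this machine.

```python
def busy_cores(cpu_percent: float) -> float:
    """Convert a top-style CPU percentage (100% = one logical core) to cores."""
    return cpu_percent / 100.0


def utilization(cpu_percent: float, logical_cpus: int) -> float:
    """Fraction of the whole machine in use, between 0.0 and 1.0."""
    return busy_cores(cpu_percent) / logical_cpus


if __name__ == "__main__":
    # Loads reported in this mail: ~200% with --pmode --pme, ~3000% without.
    for label, load in [("with --pmode --pme", 200.0), ("without", 3000.0)]:
        print(f"{label}: {busy_cores(load):.0f} cores busy, "
              f"{utilization(load, 32):.1%} of 32 logical CPUs")
```

So roughly two cores are busy with --pmode --pme on the NUMA topology, versus about 30 of 32 without them.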

It also works *with* --pmode --pme, but only if NUMA is disabled at the firmware level,
so that the OS sees only a classic, flat topology.

This behavior can be seen on CentOS 7.3 Linux, with x265 2.4+2 compiled using GCC 4.8.5
and yasm 1.3.0. The machine is an HP ProLiant DL360 Gen9 with two Intel Xeon
E5-2620 CPUs.

Removing --pmode --pme was suggested by Mario *LigH* Rohkrämer earlier in this thread.

Here is my topology when NUMA is enabled (pretty simple):

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 32638 MB
node 0 free: 266 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 32768 MB
node 1 free: 82 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
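
In case it helps with scripting the benchmark, here is a rough Python sketch that reads the node-to-CPU mapping out of `numactl -H` text and builds a per-node thread-count string in the comma-separated form that x265's --pools option takes. The helper names are hypothetical, the sample text is copied from the output above, and I have not checked whether passing the counts explicitly changes the --pmode --pme behavior described here.

```python
import re
from typing import Dict, List

# Sample text copied from the `numactl -H` output of this machine.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 32638 MB
node 0 free: 266 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 32768 MB
node 1 free: 82 MB
"""


def parse_numa_cpus(text: str) -> Dict[int, List[int]]:
    """Map each NUMA node id to its logical CPU ids from `numactl -H` text."""
    nodes: Dict[int, List[int]] = {}
    for match in re.finditer(r"^node (\d+) cpus:(.*)$", text, re.MULTILINE):
        nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


def pools_argument(nodes: Dict[int, List[int]]) -> str:
    """Build a per-node thread-count string (one count per node, in node order)."""
    return ",".join(str(len(nodes[n])) for n in sorted(nodes))


if __name__ == "__main__":
    topology = parse_numa_cpus(SAMPLE)
    print("--pools", pools_argument(topology))
```

On a live system the text could come from running `numactl -H` via `subprocess` instead of the embedded sample.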

Thanks!

>> Ah yes, I've also found that 8K does indeed help a ton. With 4K and
>> similar settings, I'm
>> able to load 16-25 CPUs currently, sometimes briefly 30. With 8K, load is
>> much higher.
>>
>> Maybe you can advise how to maximize parallelization / loading as many
>> CPUs as possible
>> without breaking NUMA support on both Windows and Linux.
>>
>> I'm asking because my benchmarking project targets multiple operating
>> systems. It currently works on:
>>   * Windows NT 5.2 & 6.0 (wo. NUMA)
>>   * Windows NT 6.1 - 10.0 (w. NUMA)
>>   * MacOS X (wo. NUMA)
>>   * Linux (w. and wo. NUMA)
>>   * FreeBSD, OpenBSD, NetBSD and DragonFly BSD UNIX (wo. NUMA)
>>   * Solaris (wo. NUMA)
>>   * Haiku OS (wo. NUMA)
>>
>> Thank you very much!
>>
>> Best,
>> Michael
>>
>> On 05/10/2017 07:21 AM, Pradeep Ramachandran wrote:
>>> Michael,
>>> Adding --lookahead-threads 2 statically allocates two threads for
>>> lookahead. Therefore, the worker threads launched to work on WPP will be
>>> 32-2 = 30 in count. We've found some situations in which statically
>>> allocating threads for lookahead was useful and therefore decided to
>>> expose it to the user. Please see if this helps your use case and enable
>>> it as appropriate.
>>>
>>> Now as far as scaling up for 8K goes, a single instance of x265 scales up
>>> well to 25-30 threads depending on the preset you're running in. We've
>>> found pmode and pme help performance considerably on some Broadwell
>>> server systems but again, that is also dependent on content. I would
>>> encourage you to play with those settings and see if they help your use
>>> case. Beyond these thread counts, one instance of x265 may not be
>>> beneficial for you.
>>>
>>> Pradeep.
>>>
>>> On Fri, May 5, 2017 at 3:26 PM, Michael Lackner <
>>> michael.lackner at unileoben.ac.at> wrote:
>>>
>>>> I found the reason for "why did x265 use 30 threads and not 32, when I
>>>> have 32 CPUs".
>>>>
>>>> Actually, it was (once again) my own fault. Thinking I know better than
>>>> x265, I spawned
>>>> two lookahead threads starting with 32 logical CPUs
>>>> ('--lookahead-threads 2').
>>>>
>>>> It seems what x265 does is to reserve two dedicated CPUs for this, but
>>>> then it couldn't
>>>> permanently saturate them.
>>>>
>>>> I still don't know when I should be starting with that stuff for 8K
>>>> content. 64 CPUs? 256
>>>> CPUs? Or should I leave everything to x265? My goal was to be able to
>>>> fully load as many
>>>> CPUs as possible in the future.
>>>>
>>>> In any case, the culprit was myself.
>>>>
>>>> On 05/04/2017 11:18 AM, Mario *LigH* Rohkrämer wrote:
>>>>> On 04.05.2017 at 10:58, Michael Lackner <michael.lackner at unileoben.ac.at> wrote:
>>>>>
>>>>>> Still wondering why not 32, but ok.
>>>>>
>>>>> x265 will calculate how many threads it will really need to utilize the
>>>>> WPP and other parallelizable steps, in relation to the frame dimensions
>>>>> and the complexity. It may not *need* more than 30 threads, would not
>>>>> have any task to give to two more. Possibly. Developers know better...
>>>>
>>>> --
>>>> Michael Lackner
>>>> Lehrstuhl für Informationstechnologie (CiT)
>>>> Montanuniversität Leoben
>>>> Tel.: +43 (0)3842/402-1505 | Mail: michael.lackner at unileoben.ac.at
>>>> Fax.: +43 (0)3842/402-1502 | Web: http://institute.unileoben.ac.at/infotech
>>>> _______________________________________________
>>>> x265-devel mailing list
>>>> x265-devel at videolan.org
>>>> https://mailman.videolan.org/listinfo/x265-devel
>>
>> --
>> Michael Lackner
>> Lehrstuhl für Informationstechnologie (CiT)
>> Montanuniversität Leoben
>> Tel.: +43 (0)3842/402-1505 | Mail: michael.lackner at unileoben.ac.at
>> Fax.: +43 (0)3842/402-1502 | Web: http://institute.unileoben.ac.at/infotech
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel
>>

-- 
Michael Lackner
Lehrstuhl für Informationstechnologie (CiT)
Montanuniversität Leoben
Tel.: +43 (0)3842/402-1505 | Mail: michael.lackner at unileoben.ac.at
Fax.: +43 (0)3842/402-1502 | Web: http://institute.unileoben.ac.at/infotech

