<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 10, 2017 at 11:12 AM, Michael Lackner <span dir="ltr"><<a href="mailto:michael.lackner@unileoben.ac.at" target="_blank">michael.lackner@unileoben.ac.at</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thank you very much for your input!<br>
<br>
Since --pmode and --pme seem to break NUMA support, I disabled them. I simply cannot tell<br>
users that they have to switch off NUMA in their UEFI firmware just for this one<br>
application. There may be a lot of situations where this is just not doable.<br>
<br>
If there is a way to make --pmode and --pme work together with x265's NUMA support, I'd use<br>
it, but I don't know how.<br></blockquote><div><br></div><div>Could you please elaborate? It seems to work fine for us; I've tried CentOS and Win Server 2017 dual-socket systems and I see all sockets being used.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
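For reference, one way to combine these flags without touching firmware settings is to pass x265's --pools option, which builds one thread pool per NUMA node ("+" enables all cores on a node). This is only a sketch: the file names are placeholders, and the right pool string depends on your topology.

```shell
# Sketch only: assumes a dual-socket system; input/output names are placeholders.
# --pools "+,+" asks x265 for a thread pool on each NUMA node, so --pmode/--pme
# can run with NUMA left enabled in the firmware.
x265 --input input_8k.y4m --preset slow \
     --pmode --pme \
     --pools "+,+" \
     --output out.hevc
```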
<br>
Ah yes, I've also found that 8K does indeed help a ton. With 4K and similar settings, I'm<br>
able to load 16-25 CPUs currently, sometimes briefly 30. With 8K, load is much higher.<br>
<br>
Maybe you can advise how to maximize parallelization / loading as many CPUs as possible<br>
without breaking NUMA support on both Windows and Linux.<br>
<br>
I'm saying this because my benchmarking project targets multiple operating systems;<br>
it currently works on:<br>
* Windows NT 5.2 & 6.0 (wo. NUMA)<br>
* Windows NT 6.1 - 10.0 (w. NUMA)<br>
* MacOS X (wo. NUMA)<br>
* Linux (w. and wo. NUMA)<br>
* FreeBSD, OpenBSD, NetBSD and DragonFly BSD UNIX (wo. NUMA)<br>
* Solaris (wo. NUMA)<br>
* Haiku OS (wo. NUMA)<br>
<br>
Thank you very much!<br>
<br>
Best,<br>
Michael<br>
<div class="HOEnZb"><div class="h5"><br>
On 05/10/2017 07:21 AM, Pradeep Ramachandran wrote:<br>
> Michael,<br>
> Adding --lookahead-threads 2 statically allocates two threads for<br>
> lookahead. Therefore, the worker threads launched to work on WPP will number 32-2<br>
> = 30. We've found some situations in which statically allocating<br>
> threads for lookahead was useful and therefore decided to expose it to the<br>
> user. Please see if this helps your use-case, and enable it appropriately.<br>
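The static split described above can be sketched as an invocation (the input file is a placeholder; --pools 32 fixes the pool size to match the arithmetic in the example):

```shell
# Sketch: of a 32-thread pool, 2 threads are statically reserved for
# lookahead, leaving 32 - 2 = 30 workers for WPP.
x265 --input clip.y4m --pools 32 --lookahead-threads 2 --output out.hevc
```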
><br>
> Now as far as scaling up for 8K goes, a single instance of x265 scales up<br>
> well to 25-30 threads depending on the preset you're running in. We've<br>
> found pmode and pme help performance considerably on some Broadwell server<br>
> systems, but again, that is also dependent on content. I would encourage you to<br>
> play with those settings and see if they help your use case. Beyond these<br>
> thread counts, one instance of x265 may not be beneficial for you.<br>
><br>
> Pradeep.<br>
><br>
> On Fri, May 5, 2017 at 3:26 PM, Michael Lackner <<br>
> <a href="mailto:michael.lackner@unileoben.ac.at">michael.lackner@unileoben.ac.<wbr>at</a>> wrote:<br>
><br>
>> I found the reason for "why did x265 use 30 threads and not 32, when I<br>
>> have 32 CPUs".<br>
>><br>
>> Actually, it was (once again) my own fault. Thinking I knew better than<br>
>> x265, I spawned<br>
>> two lookahead threads starting with 32 logical CPUs ('--lookahead-threads<br>
>> 2').<br>
>><br>
>> It seems x265 reserves two dedicated CPUs for this, but then it couldn't<br>
>> keep them permanently saturated.<br>
>><br>
>> I still don't know when I should be starting with that stuff for 8K<br>
>> content. 64 CPUs? 256<br>
>> CPUs? Or should I leave everything to x265? My goal was to be able to<br>
>> fully load as many<br>
>> CPUs as possible in the future.<br>
>><br>
>> In any case, the culprit was myself.<br>
>><br>
>> On 05/04/2017 11:18 AM, Mario *LigH* Rohkrämer wrote:<br>
>>> Am 04.05.2017, 10:58 Uhr, schrieb Michael Lackner <<br>
>> <a href="mailto:michael.lackner@unileoben.ac.at">michael.lackner@unileoben.ac.<wbr>at</a>>:<br>
>>><br>
>>>> Still wondering why not 32, but ok.<br>
>>><br>
>>> x265 will calculate how many threads it will really need to utilize the<br>
>> WPP and other<br>
>>> parallelizable steps, in relation to the frame dimensions and the<br>
>> complexity. It may not<br>
>> *need* more than 30 threads, and would not have any task to give to two<br>
>> more. Possibly.<br>
>>> Developers know better...<br>
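As a rough sanity check of that thread count: WPP can hand out at most one task per CTU row, so the frame height bounds the useful parallelism. A back-of-the-envelope sketch, assuming x265's default 64×64 CTU size:

```shell
# One WPP worker per CTU row at most; 64x64 CTUs are x265's default.
ctu=64
rows_4k=$(( (2160 + ctu - 1) / ctu ))   # ceiling division
rows_8k=$(( (4320 + ctu - 1) / ctu ))
echo "4K: $rows_4k rows, 8K: $rows_8k rows"
```

In practice the wavefront dependency (each row trails the one above it) keeps fewer rows active at once, which may explain why only ~25-30 threads stay busy at 4K while 8K loads many more.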
>><br>
>> --<br>
>> Michael Lackner<br>
>> Lehrstuhl für Informationstechnologie (CiT)<br>
>> Montanuniversität Leoben<br>
>> Tel.: +43 (0)3842/402-1505 | Mail: <a href="mailto:michael.lackner@unileoben.ac.at">michael.lackner@unileoben.ac.<wbr>at</a><br>
>> Fax.: +43 (0)3842/402-1502 | Web: <a href="http://institute.unileoben.ac.at/infotech" rel="noreferrer" target="_blank">http://institute.unileoben.ac.at/infotech</a><br>
>> ______________________________<wbr>_________________<br>
>> x265-devel mailing list<br>
>> <a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>
>> <a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/<wbr>listinfo/x265-devel</a><br>
<br>
--<br>
Michael Lackner<br>
Lehrstuhl für Informationstechnologie (CiT)<br>
Montanuniversität Leoben<br>
Tel.: +43 (0)3842/402-1505 | Mail: <a href="mailto:michael.lackner@unileoben.ac.at">michael.lackner@unileoben.ac.<wbr>at</a><br>
Fax.: +43 (0)3842/402-1502 | Web: <a href="http://institute.unileoben.ac.at/infotech" rel="noreferrer" target="_blank">http://institute.unileoben.ac.<wbr>at/infotech</a><br>
</div></div></blockquote></div><br></div></div>