<div dir="ltr"><div><div><div><div>Hello,<br><br></div>Scheduling the lookahead across thread pools with round-robin allocation has been on the task list for some time, but it's doubtful how much extra performance it would bring (there is a case to be made for an odd number of frame encoders, though). Are you using -F16? That sounds far too high for any real performance benefit. <br><br></div>We're working on some more NUMA optimizations, specifically forcing recon and reference frame allocation onto specific nodes. <br><br></div>Thanks,<br></div>Deepthi<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 3, 2015 at 7:42 AM, Ximing Cheng <span dir="ltr"><<a href="mailto:chengximing1989@gmail.com" target="_blank">chengximing1989@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I found that the lookahead JobProvider only processes its tasks on thread pool zero (the first thread pool). This breaks the load balance of a multi-thread-pool system, because the frame encoders are distributed across the thread pools by round robin. The encoder cannot fully use the CPU because of the HEVC <span style="font-size:12.8000001907349px">algorithm's strong internal dependencies. </span><div><span style="font-size:12.8000001907349px">Some worker threads sit idle waiting for a job to wake them, but the lookahead cannot wake threads on the second thread pool, and the main x265 encoder must wait for the lookahead's output queue. If the lookahead used the same </span>round robin <span style="color:rgb(51,51,51);font-family:arial;font-size:13px;line-height:20.0200004577637px">strategy to </span>distribute different frames across the thread pools as the frame encoders do, would that be better for a multi-NUMA system? 
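The load-balance concern above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not x265's scheduler, and the function names, job names, and job counts (16 frame encoders to match -F16, 6 lookahead jobs) are hypothetical. It compares pinning all lookahead work to pool 0 with spreading it round robin over two pools.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(job_names, pool_count):
    """Map each job to a pool index, cycling 0, 1, ..., pool_count - 1."""
    pools = cycle(range(pool_count))
    return {name: next(pools) for name in job_names}


def jobs_per_pool(assignment, pool_count):
    """Count how many jobs landed on each pool."""
    counts = Counter(assignment.values())
    return [counts.get(i, 0) for i in range(pool_count)]


POOLS = 2  # two NUMA nodes, one thread pool each (as in the log)

# Hypothetical job lists, for illustration only.
frame_encoders = [f"frame_encoder_{i}" for i in range(16)]
lookahead_jobs = [f"lookahead_frame_{i}" for i in range(6)]

# Frame encoders are spread round robin across the pools.
encoder_load = jobs_per_pool(assign_round_robin(frame_encoders, POOLS), POOLS)

# Behaviour described in the report: every lookahead job goes to pool 0.
pinned_load = jobs_per_pool({name: 0 for name in lookahead_jobs}, POOLS)

# Proposed behaviour: lookahead jobs also go round robin.
spread_load = jobs_per_pool(assign_round_robin(lookahead_jobs, POOLS), POOLS)

print("frame encoders per pool:", encoder_load)
print("lookahead pinned to pool 0:", pinned_load)
print("lookahead round robin:", spread_load)
```

The model only counts jobs per pool; it says nothing about how much real throughput the change would recover, which is the open question raised in the reply.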
Thanks!</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 28, 2015 at 1:57 PM, Mario *LigH* Rohkrämer <span dir="ltr"><<a href="mailto:contact@ligh.de" target="_blank">contact@ligh.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Cheng.<br>
<br>
This issue has been discussed before in the VideoHelp forum. Parallelization is somewhat more limited because dependencies between tasks in HEVC are likely more restrictive than in AVC (many parts of the HEVC algorithm must wait for others to finish intermediate results, and splitting the video frame into too many slices would hurt encoding efficiency).<br>
<br>
But it is easy to run several instances of the application in parallel so that each gets a share of the available cores.<span><br>
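One way to run several encodes side by side is to give each instance its own NUMA node. The sketch below only builds the command lines; it assumes the x265 `--pools` option accepts a comma-separated per-node list where "+" means all cores of a node and "-" means none, which should be checked against the documentation of the build in use. The helper name and the input and output file names are hypothetical.

```python
import shlex


def per_node_commands(jobs, node_count, base_args):
    """Build one x265 command line per (input, output) pair.

    Each job is assigned to one NUMA node in round-robin order.
    Assumption: --pools takes "+" (use the node) or "-" (skip the node)
    for each node, separated by commas.
    """
    commands = []
    for index, (src, dst) in enumerate(jobs):
        node = index % node_count
        pools = ",".join("+" if n == node else "-" for n in range(node_count))
        commands.append(
            ["./x265", "--input", src, *base_args, "--pools", pools, "-o", dst]
        )
    return commands


# Settings taken from the command line in the original report.
base_args = shlex.split(
    "--input-res 1920x1080 --bitrate 1200 --fps 20 --early-skip --preset fast"
)

# Hypothetical file names, for illustration only.
jobs = [("clip_a.yuv", "clip_a.hevc"), ("clip_b.yuv", "clip_b.hevc")]

commands = per_node_commands(jobs, node_count=2, base_args=base_args)
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each printed line could then be started as a separate process, for example with `subprocess.Popen`, so that the two encodes do not compete for the same node.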
<br>
<br>
On 28.07.2015 at 03:58, Ximing Cheng <<a href="mailto:chengximing1989@gmail.com" target="_blank">chengximing1989@gmail.com</a>> wrote:<br>
<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>
Hi, I am testing x265 on a server with two NUMA nodes, each with 36 cores.<br>
The x265 version is the 1.7 release, run with this command line:<br>
<br>
./x265 --input-res 1920x1080 --input input.yuv --bitrate 1200 --vbv-maxrate 1380 --fps 20 --early-skip --preset fast -o test1.hevc<br>
<br>
But when running on the server, CPU utilization ranges from 27% to 35% (always below 40%), which means most of the CPU cores are not busy.<br>
<br></span>
x265 [info]: HEVC encoder version 1.7<br>
x265 [info]: build info [Linux][GCC 4.4.6][64 bit] 8bpp<br>
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2<br>
x265 [warning]: --psnr used with AQ on: results will be invalid!<br>
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!<br>
x265 [info]: Main profile, Level-4 (Main tier)<br>
x265 [info]: Thread pool 0 using 36 threads on NUMA node 0<br>
x265 [info]: Thread pool 1 using 36 threads on NUMA node 1<br>
x265 [info]: frame threads / pool features : 16 / wpp(34 rows)+pmode<br>
x265 [warning]: VBV maxrate specified, but no bufsize, ignored<br>
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8<br>
x265 [info]: Residual QT: max TU size, max depth : 32 / 2 inter / 2 intra<br>
x265 [info]: ME / range / subpel / merge : star / 57 / 1 / 2<br>
x265 [info]: Keyframe min / max / scenecut : 20 / 250 / 40<br>
x265 [info]: Lookahead / bframes / badapt : 60 / 4 / 2<span><br>
x265 [info]: b-pyramid / weightp / weightb / refs: 1 / 1 / 1 / 1<br>
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 0.3 / 32 / 1<br>
x265 [info]: Rate Control / qCompress : ABR-1200 kbps / 0.60<br>
x265 [info]: tools: rect amp rd=4 rdoq=2 early-skip signhide tmvp b-intra<br>
</span></blockquote><span><font color="#888888">
<br>
<br>
-- <br>
<br>
Fun and success!<br>
Mario *LigH* Rohkrämer<br>
mailto:<a href="mailto:contact@ligh.de" target="_blank">contact@ligh.de</a></font></span><div><div><br>
<br>
_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</div></div></blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><br></div>