<div dir="ltr"><span style="font-size:12.8000001907349px">--csv-log-level 1 does not work in the 1.7 release. It fails with: x265.exe: unrecognized option '</span><span style="font-size:12.8000001907349px">--csv-log-level'. Any help?</span></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 3, 2015 at 11:53 PM, Steve Borho <span dir="ltr"><<a href="mailto:steve@borho.org" target="_blank">steve@borho.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 08/03, Ximing Cheng wrote:<br>

> I found that the lookahead JobProvider only processes its tasks on thread pool<br>

> zero (the first thread pool). This upsets the load balance of a<br>

> multi-threadpool system, because the frame encoders are distributed across<br>

> the thread pools by round robin. The encoder cannot fully use the CPU<br>

> resources because of the high correlation within the HEVC algorithm.<br>

> Some worker threads sometimes wait for a job to awaken them, but the<br>

> lookahead cannot awaken threads on the second thread pool. And the main<br>

> x265 encoder must wait for the output queue of the lookahead. If the<br>

> lookahead used the same round-robin strategy as the frame encoders to<br>

> distribute frames across the thread pools, would that be better for a<br>

> multi-NUMA system? Thanks!<br>

<br>

</span>Lookahead is generally not a bottleneck once it fills its output queue<br>

and the frame encoders start working. I added timers within the frame<br>

encoders which measure how much time the frame encoder sits idle,<br>

waiting for a slice decision from the lookahead. You can see this with<br>

--csv frames.csv --csv-log-level 1. After an encode, open frames.csv and<br>

look at the DecideWait (ms) column.<br>
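<br>

A minimal sketch of summarizing that column after an encode, assuming frames.csv is a comma-separated file with one header row that contains a column named DecideWait (ms) as described above. The helper name decide_wait_stats and the sample rows are hypothetical, not part of x265:<br>

<pre>
```python
import csv
import io
import statistics


def decide_wait_stats(csv_file, column="DecideWait (ms)"):
    """Return (count, mean, maximum) for one numeric column of a CSV file."""
    reader = csv.reader(csv_file, skipinitialspace=True)
    # Header names may carry stray spaces around the commas, so strip them.
    header = [name.strip() for name in next(reader)]
    index = header.index(column)
    waits = []
    for row in reader:
        try:
            waits.append(float(row[index]))
        except (IndexError, ValueError):
            # Skip blank, short, or non-numeric rows.
            continue
    if not waits:
        return (0, 0.0, 0.0)
    return (len(waits), statistics.mean(waits), max(waits))


# Made-up sample data standing in for a real frames.csv.
sample = io.StringIO(
    "Encode Order, Type, DecideWait (ms)\n"
    "0, I-SLICE, 12.5\n"
    "1, P-SLICE, 0.5\n"
)
print(decide_wait_stats(sample))
```
</pre>

For a real run, pass an open file instead, for example open("frames.csv", newline=""). A mean close to zero would suggest the frame encoders rarely sit idle waiting on the lookahead.<br>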

<br>

You can verify this by encoding with --b-adapt 1, which vastly reduces<br>

the amount of work performed by the lookahead. The DecideWait times<br>

should reduce a bit, but I don't expect you'll see much improvement in<br>

total utilization.<br>

<br>

There are basically two reasons why HEVC has less parallelism than AVC<br>

(leading to less utilization). First is the large CTU size (64x64 vs<br>

16x16 macroblocks), reducing row granularity to one fourth. The second<br>

is the new SAO loop filter, which adds an extra row of reference lag. On<br>

the plus side we have WPP, which increases parallelism but often not<br>

enough to make up for the CTU size and SAO.<br>
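<br>

To put numbers on the row granularity point, here is a small worked sketch for a 1080-line frame. The helper name ctu_rows is hypothetical, and it only counts block rows, assuming the last partial row still counts as a full row:<br>

<pre>
```python
import math


def ctu_rows(frame_height, block_size):
    """Number of block rows needed to cover the frame height."""
    return math.ceil(frame_height / block_size)


# 16x16 macroblocks (AVC) versus 32x32 and 64x64 CTUs (HEVC) at 1080p.
for size in (16, 32, 64):
    print(size, ctu_rows(1080, size))
```
</pre>

This gives 68 rows for 16x16 blocks, 34 rows for 32x32 CTUs (matching the wpp(34 rows) line in the log quoted further down), and 17 rows for 64x64 CTUs, which is the one-fourth reduction mentioned above.<br>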

<br>

> On Tue, Jul 28, 2015 at 1:57 PM, Mario *LigH* Rohkrämer <<a href="mailto:contact@ligh.de">contact@ligh.de</a>><br>

<div class="HOEnZb"><div class="h5">> wrote:<br>

><br>

> > Hi Cheng.<br>

> ><br>

> > This issue has been discussed before in the VideoHelp forum.<br>

> > Parallelization is a bit more limited because dependencies between tasks in<br>

> > HEVC algorithms are possibly more restrictive than in AVC algorithms (many<br>

> > parts of the HEVC algorithm need to wait for others to finish intermediate<br>

> > results, and splitting the video frame across too many slices would hurt<br>

> > the encoding efficiency).<br>

> ><br>

> > But it is easily possible to run several applications in parallel so that<br>

> > they each get a share of available cores.<br>

> ><br>

> ><br>

> > On 28.07.2015 at 03:58, Ximing Cheng <<a href="mailto:chengximing1989@gmail.com">chengximing1989@gmail.com</a>> wrote:<br>

> ><br>

> >> Hi, I am testing x265 on a server with two NUMA nodes, each with 36 cores.<br>

> >> The x265 version is the 1.7 release, run with the command line<br>

> >><br>

> >> ./x265 --input-res 1920x1080 --input input.yuv --bitrate 1200<br>

> >> --vbv-maxrate 1380 --fps 20 --early-skip --preset fast -o test1.hevc<br>

> >><br>

> >> but when running on the server, CPU utilization ranges from 27% to 35%<br>

> >> (below 40%), which means most of the CPU cores are not busy.<br>

> >><br>

> >> x265 [info]: HEVC encoder version 1.7<br>

> >> x265 [info]: build info [Linux][GCC 4.4.6][64 bit] 8bpp<br>

> >> x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2<br>

> >> x265 [warning]: --psnr used with AQ on: results will be invalid!<br>

> >> x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!<br>

> >> x265 [info]: Main profile, Level-4 (Main tier)<br>

> >> x265 [info]: Thread pool 0 using 36 threads on NUMA node 0<br>

> >> x265 [info]: Thread pool 1 using 36 threads on NUMA node 1<br>

> >> x265 [info]: frame threads / pool features       : 16 / wpp(34 rows)+pmode<br>

> >> x265 [warning]: VBV maxrate specified, but no bufsize, ignored<br>

> >> x265 [info]: Coding QT: max CU size, min CU size : 32 / 8<br>

> >> x265 [info]: Residual QT: max TU size, max depth : 32 / 2 inter / 2 intra<br>

> >> x265 [info]: ME / range / subpel / merge         : star / 57 / 1 / 2<br>

> >> x265 [info]: Keyframe min / max / scenecut       : 20 / 250 / 40<br>

> >> x265 [info]: Lookahead / bframes / badapt        : 60 / 4 / 2<br>

> >> x265 [info]: b-pyramid / weightp / weightb / refs: 1 / 1 / 1 / 1<br>

> >> x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 0.3 / 32 / 1<br>

> >> x265 [info]: Rate Control / qCompress            : ABR-1200 kbps / 0.60<br>

> >> x265 [info]: tools: rect amp rd=4 rdoq=2 early-skip signhide tmvp b-intra<br>

<br>

--<br>

</div></div><span class="HOEnZb"><font color="#888888">Steve Borho<br>

</font></span><div class="HOEnZb"><div class="h5">_______________________________________________<br>

x265-devel mailing list<br>

<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>

<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>

</div></div></blockquote></div><br></div>