[x265] x265 CPU utilization very low on a multi-numa sockets server

Ximing Cheng chengximing1989 at gmail.com
Fri Aug 7 05:43:50 CEST 2015


--csv-log-level 1 does not work in the 1.7 release; x265.exe reports
"unrecognized option '--csv-log-level'". Help!

On Mon, Aug 3, 2015 at 11:53 PM, Steve Borho <steve at borho.org> wrote:

> On 08/03, Ximing Cheng wrote:
> > I found that the lookahead JobProvider only processes its tasks on
> > thread pool zero (the first thread pool). This destroys the load
> > balance of a multi-thread-pool system, because the frame encoders are
> > distributed across the thread pools by round robin. The encoder cannot
> > fully use the CPU resources because of the high degree of dependency in
> > the HEVC algorithm.
> > Some of the worker threads sit waiting for a job to awaken them, but the
> > lookahead cannot awaken threads on the second thread pool, and the main
> > x265 encoder must wait for the output queue of the lookahead. If the
> > lookahead used the same round-robin strategy as the frame encoders to
> > distribute frames across the thread pools, would that be better for a
> > multi-NUMA system? Thanks!
>
> Lookahead is generally not a bottleneck once it fills its output queue
> and the frame encoders start working. I added timers within the frame
> encoders which measure how much time the frame encoder sits idle,
> waiting for a slice decision from the lookahead. You can see this with
> --csv frames.csv --csv-log-level 1. After an encode, open frames.csv and
> look at the DecideWait (ms) column.
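
The check described above can be scripted. What follows is only a rough sketch in Python, assuming a frames.csv that actually contains a "DecideWait (ms)" column as named in the message above; the function name summarize_decide_wait and the sample rows are made up for illustration and are not part of x265. Rows whose cell is not numeric (for example a trailing summary line) are skipped.

```python
import csv
import io
import statistics


def summarize_decide_wait(csv_file, column="DecideWait (ms)"):
    """Summarize the per-frame wait column of an open CSV file.

    Returns a dict with the frame count, mean and maximum wait in ms,
    or None if the column is missing or holds no numeric values.
    """
    reader = csv.DictReader(csv_file)
    waits = []
    for row in reader:
        for key, value in row.items():
            # Header names may carry padding spaces, so compare stripped.
            if key is not None and key.strip() == column:
                try:
                    waits.append(float(value))
                except (TypeError, ValueError):
                    pass  # non-numeric cell, e.g. a summary row
    if not waits:
        return None
    return {
        "frames": len(waits),
        "mean_ms": statistics.mean(waits),
        "max_ms": max(waits),
    }


if __name__ == "__main__":
    # Made-up sample data; a real run would use open("frames.csv", newline="").
    sample = io.StringIO(
        "Encode Order, Type, DecideWait (ms)\n"
        "0, I-SLICE, 12.0\n"
        "1, P-SLICE, 4.0\n"
    )
    print(summarize_decide_wait(sample))
```

If the mean wait is small relative to the per-frame encode time, the lookahead is probably not what keeps the cores idle.
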
>
> You can verify this by encoding with --b-adapt 1, which vastly reduces
> the amount of work performed by the lookahead. The DecideWait times
> should reduce a bit, but I don't expect you'll see much improvement in
> total utilization.
>
> There are basically two reasons why HEVC has less parallelism than AVC
> (leading to less utilization). First is the large CTU size (64x64 vs
> 16x16 macroblocks), reducing row granularity to one fourth. The second
> is the new SAO loop filter, which adds an extra row of reference lag. On
> the plus side we have WPP, which increases parallelism but often not
> enough to make up for the CTU size and SAO.
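
The row-granularity point can be checked with a little arithmetic. This is a small Python sketch, assuming the 1080-line input used in this thread; the helper name block_rows is made up for illustration. It counts how many rows of blocks a frame splits into for a given block height.

```python
import math


def block_rows(frame_height, block_size):
    """Number of block rows needed to cover the frame height."""
    return math.ceil(frame_height / block_size)


# 1080-line frame with AVC macroblocks and two HEVC CTU sizes
for label, size in [("16x16 macroblocks", 16),
                    ("32x32 CTUs", 32),
                    ("64x64 CTUs", 64)]:
    print(f"{label}: {block_rows(1080, size)} rows")
```

For 1080 lines this gives 68 macroblock rows against 17 rows of 64x64 CTUs, which is the one-fourth figure. The encode quoted further down uses a max CU size of 32, and its log reports wpp(34 rows), matching the 32x32 case, so in that particular run the row count is halved rather than quartered.
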
>
> > On Tue, Jul 28, 2015 at 1:57 PM, Mario *LigH* Rohkrämer
> > <contact at ligh.de> wrote:
> >
> > > Hi Cheng.
> > >
> > > This issue has been discussed before in the VideoHelp forum.
> > > Parallelization is somewhat more limited because dependencies between
> > > tasks in HEVC algorithms are possibly more restrictive than in AVC
> > > algorithms (many parts of the HEVC algorithm need to wait for others
> > > to finish intermediate results, and splitting the video frame across
> > > too many slices would hurt the encoding efficiency).
> > >
> > > But it is easily possible to run several applications in parallel so
> > > that they each get a share of the available cores.
> > >
> > >
> > > On 28.07.2015 at 03:58, Ximing Cheng
> > > <chengximing1989 at gmail.com> wrote:
> > >
> > >> Hi, I am testing x265 on a server with two NUMA nodes of 36 cores
> > >> each. The x265 version is the 1.7 release, with the command line
> > >>
> > >> ./x265 --input-res 1920x1080 --input input.yuv --bitrate 1200
> > >> --vbv-maxrate 1380 --fps 20 --early-skip --preset fast -o test1.hevc
> > >>
> > >> but when running on the server, CPU utilization ranges from 27% to 35%
> > >> (< 40%), which means most of the CPU cores are not busy.
> > >>
> > >> x265 [info]: HEVC encoder version 1.7
> > >> x265 [info]: build info [Linux][GCC 4.4.6][64 bit] 8bpp
> > >> x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
> > >> x265 [warning]: --psnr used with AQ on: results will be invalid!
> > >> x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
> > >> x265 [info]: Main profile, Level-4 (Main tier)
> > >> x265 [info]: Thread pool 0 using 36 threads on NUMA node 0
> > >> x265 [info]: Thread pool 1 using 36 threads on NUMA node 1
> > >> x265 [info]: frame threads / pool features       : 16 / wpp(34 rows)+pmode
> > >> x265 [warning]: VBV maxrate specified, but no bufsize, ignored
> > >> x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
> > >> x265 [info]: Residual QT: max TU size, max depth : 32 / 2 inter / 2 intra
> > >> x265 [info]: ME / range / subpel / merge         : star / 57 / 1 / 2
> > >> x265 [info]: Keyframe min / max / scenecut       : 20 / 250 / 40
> > >> x265 [info]: Lookahead / bframes / badapt        : 60 / 4 / 2
> > >> x265 [info]: b-pyramid / weightp / weightb / refs: 1 / 1 / 1 / 1
> > >> x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 0.3 / 32 / 1
> > >> x265 [info]: Rate Control / qCompress            : ABR-1200 kbps / 0.60
> > >> x265 [info]: tools: rect amp rd=4 rdoq=2 early-skip signhide tmvp b-intra
>
> --
> Steve Borho
> _______________________________________________
> x265-devel mailing list
> x265-devel at videolan.org
> https://mailman.videolan.org/listinfo/x265-devel
>