[x265] x265 CPU utilization very low on a multi-numa sockets server

Steve Borho steve at borho.org
Mon Aug 3 17:53:24 CEST 2015


On 08/03, Ximing Cheng wrote:
> I found that the lookahead JobProvider only processes its tasks on thread
> pool zero (the first thread pool). This breaks the load balance of a
> multi-threadpool system, because the frame encoders are distributed across
> the thread pools by round robin. The encoder cannot fully use the CPU
> resources because of the strong dependencies within the HEVC algorithm.
> Some of the worker threads sometimes wait for a job to awaken them, but the
> lookahead cannot awaken threads on the second thread pool. And the main
> x265 encoder must wait for the output queue of the lookahead. If the
> lookahead used the same round-robin strategy as the frame encoders to
> distribute different frames across the thread pools, would that be better
> for a multi-NUMA system? Thanks!

Lookahead is generally not a bottleneck once it fills its output queue
and the frame encoders start working. I added timers within the frame
encoders which measure how much time the frame encoder sits idle,
waiting for a slice decision from the lookahead. You can see this with
--csv frames.csv --csv-log-level 1. After an encode, open frames.csv and
look at the DecideWait (ms) column.
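
For anyone who would rather not eyeball the column, here is a minimal
sketch that summarizes it. It is not part of x265; the function name
decide_wait_summary is made up for illustration, and it assumes the CSV
has a single header row containing a "DecideWait (ms)" field, possibly
padded with spaces. Rows that are too short or non-numeric are skipped.

```python
import csv
import statistics


def decide_wait_summary(csv_file, column="DecideWait (ms)"):
    """Summarize per-frame lookahead wait times from an x265 CSV log.

    csv_file is any iterable of text lines, e.g. an open file object.
    Assumption: the first row is a header that contains `column`.
    """
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader)]
    idx = header.index(column)  # raises ValueError if the column is absent

    waits = []
    for row in reader:
        if len(row) <= idx:
            continue  # blank line or short trailing summary row
        try:
            waits.append(float(row[idx]))
        except ValueError:
            continue  # non-numeric cell, not a per-frame record

    if not waits:
        return None
    return {
        "frames": len(waits),
        "mean_ms": statistics.mean(waits),
        "max_ms": max(waits),
        "total_ms": sum(waits),
    }


# Typical use after an encode:
#   with open("frames.csv", newline="") as f:
#       print(decide_wait_summary(f))
```

If the mean wait is small compared to the per-frame encode time, the
lookahead is probably not what is keeping the cores idle.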

You can verify this by encoding with --b-adapt 1, which vastly reduces
the amount of work performed by the lookahead. The DecideWait times
should reduce a bit, but I don't expect you'll see much improvement in
total utilization.

There are basically two reasons why HEVC has less parallelism than AVC
(leading to less utilization). First is the large CTU size (64x64 vs
16x16 macroblocks), reducing row granularity to one fourth. The second
is the new SAO loop filter, which adds an extra row of reference lag. On
the plus side we have WPP, which increases parallelism but often not
enough to make up for the CTU size and SAO.
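
To put numbers on the row granularity, here is a small back-of-the-envelope
sketch. The helper names are made up, and the wavefront bound is an
idealized model (every CTU takes the same time, and each row stays two
CTUs behind the row above it), not a measurement of x265.

```python
import math


def block_rows(height, block_size):
    """Number of block rows needed to cover a frame of the given height."""
    return math.ceil(height / block_size)


def wavefront_peak(width, height, ctu_size):
    """Idealized upper bound on CTU rows that can be active at once with WPP.

    Assumes uniform CTU cost and a two-CTU lag between consecutive rows,
    so at most ceil(cols / 2) rows overlap in time.
    """
    rows = block_rows(height, ctu_size)
    cols = math.ceil(width / ctu_size)
    return min(rows, math.ceil(cols / 2))


width, height = 1920, 1080
for label, size in (("AVC 16x16 macroblocks", 16),
                    ("HEVC 32x32 CTUs", 32),
                    ("HEVC 64x64 CTUs", 64)):
    print(f"{label}: {block_rows(height, size)} rows")

for size in (32, 64):
    print(f"WPP peak with {size}x{size} CTUs: "
          f"{wavefront_peak(width, height, size)} rows")
```

For 1080p this gives 68 macroblock rows for AVC against 17 rows of 64x64
CTUs, which is the one-fourth figure. With 32x32 CTUs it gives 34 rows,
matching the wpp(34 rows) line in the quoted log.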

> On Tue, Jul 28, 2015 at 1:57 PM, Mario *LigH* Rohkrämer <contact at ligh.de>
> wrote:
> 
> > Hi Cheng.
> >
> > This issue has been discussed before in the VideoHelp forum.
> > Parallelization is a bit more limited because dependencies between tasks in
> > HEVC algorithms are possibly more restrictive than in AVC algorithms (many
> > parts of the HEVC algorithm need to wait for others finishing intermediate
> > results, and splitting the video frame across too many slices would hurt
> > the encoding efficiency).
> >
> > But it is easily possible to run several applications in parallel so that
> > they each get a share of available cores.
> >
> >
> > On 28.07.2015 at 03:58, Ximing Cheng <chengximing1989 at gmail.com>
> > wrote:
> >
> >> Hi, I am testing x265 on a server with two NUMA nodes; each node has
> >> 36 cores. The x265 version is the 1.7 release, with this command line:
> >>
> >> ./x265 --input-res 1920x1080 --input input.yuv --bitrate 1200
> >> --vbv-maxrate 1380 --fps 20 --early-skip --preset fast -o test1.hevc
> >>
> >> but when running on the server, CPU utilization ranges from 27% to 35%
> >> (< 40%), which means most of the CPU cores are not busy.
> >>
> >> x265 [info]: HEVC encoder version 1.7
> >> x265 [info]: build info [Linux][GCC 4.4.6][64 bit] 8bpp
> >> x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
> >> x265 [warning]: --psnr used with AQ on: results will be invalid!
> >> x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
> >> x265 [info]: Main profile, Level-4 (Main tier)
> >> x265 [info]: Thread pool 0 using 36 threads on NUMA node 0
> >> x265 [info]: Thread pool 1 using 36 threads on NUMA node 1
> >> x265 [info]: frame threads / pool features       : 16 / wpp(34 rows)+pmode
> >> x265 [warning]: VBV maxrate specified, but no bufsize, ignored
> >> x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
> >> x265 [info]: Residual QT: max TU size, max depth : 32 / 2 inter / 2 intra
> >> x265 [info]: ME / range / subpel / merge         : star / 57 / 1 / 2
> >> x265 [info]: Keyframe min / max / scenecut       : 20 / 250 / 40
> >> x265 [info]: Lookahead / bframes / badapt        : 60 / 4 / 2
> >> x265 [info]: b-pyramid / weightp / weightb / refs: 1 / 1 / 1 / 1
> >> x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 0.3 / 32 / 1
> >> x265 [info]: Rate Control / qCompress            : ABR-1200 kbps / 0.60
> >> x265 [info]: tools: rect amp rd=4 rdoq=2 early-skip signhide tmvp b-intra

-- 
Steve Borho

