[x265] Parallelization on "manycore" systems

Michael Lackner michael.lackner at unileoben.ac.at
Wed Feb 1 12:25:52 CET 2017


Greetings,

I have a question about parallelization in x265. I'm currently preparing a benchmarking
project based on x265 (a successor of a similar project using x264).

The x264 one, created in 2010, was locked to a specific version and option set, and is now
running out of steam because it fails to fully utilize today's larger processors (16 or
more logical CPUs).

I'm currently basing this new one on 4K input content (either UHD at 3840x2160 or full
4096x2160, I haven't decided yet), and I'd like it to scale up to around 1000-2000 logical
CPUs or more if possible (fully loading them). That would also make it possible to load
entire shared-memory clusters as they exist today.

I don't care about effective output quality that much, so parallelization is paramount.

I've seen that x265 has a few knobs you can turn manually to better utilize many cores,
but for my content I'm not sure which option I should set to what value, and when. I
don't have test systems for this yet, of course...

I've begun to write a script to determine logical CPU counts on Windows, Linux and
FreeBSD; I just need to know what to do with the following options:

--slices <integer>
--lookahead-slices <0..16>
--lookahead-threads <integer>
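
A minimal sketch of the CPU-count detection mentioned above, assuming the script may be written in Python (the post does not say which language is used). The helper names `logical_cpu_count` and `usable_cpu_count` are hypothetical. How the resulting number should map onto the three options listed above is exactly the open question, so the sketch deliberately stops at counting.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical CPUs the OS reports (falls back to 1 if unknown)."""
    count = os.cpu_count()  # returns None when the count cannot be determined
    return count if count else 1


def usable_cpu_count() -> int:
    """Number of logical CPUs this process is allowed to run on.

    os.sched_getaffinity exists only on some Unix platforms (for example
    Linux); elsewhere this falls back to the plain logical CPU count.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return logical_cpu_count()


if __name__ == "__main__":
    print("logical CPUs:", logical_cpu_count())
    print("usable CPUs: ", usable_cpu_count())
```

On Windows machines with more than 64 logical CPUs, processor groups may affect what a single process sees, so the reported value would need to be verified on the target system.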

I'm already using:

--ctu 16
--wpp
--pmode
--pme

In total, my current options are like this (I also want to be hard on the CPU per clock to
make the benchmark run long enough even with a small enough input file, but only where it
doesn't hurt parallelization):

-D 10 --fps 24000/1001 -p veryslow --pmode --pme --wpp --open-gop --ref 6 --bframes 16
--b-pyramid --weightb --max-merge 5 --b-intra --bitrate 10000 --rect --amp --aq-mode 2
--no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level
1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 16 --max-tu-size 32 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full
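
Because the option string above wraps across several lines (note `--rdoq-level` and its value `1` landing on different lines), a benchmark script would probably want it as a token list. The following is only a sketch: the options are copied verbatim from the lines above, while `OPTIONS`, `option_tokens` and `option_value` are hypothetical names, and no encoder binary or input/output paths are assumed.

```python
import shlex

# Options copied verbatim from the post; line breaks are not significant.
OPTIONS = """
-D 10 --fps 24000/1001 -p veryslow --pmode --pme --wpp --open-gop --ref 6 --bframes 16
--b-pyramid --weightb --max-merge 5 --b-intra --bitrate 10000 --rect --amp --aq-mode 2
--no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level
1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 16 --max-tu-size 32 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full
"""


def option_tokens(options: str = OPTIONS) -> list:
    """Split the option string into argv-style tokens (whitespace-insensitive)."""
    return shlex.split(options)


def option_value(tokens: list, flag: str) -> str:
    """Return the token following `flag`; raises ValueError if flag is absent."""
    return tokens[tokens.index(flag) + 1]


if __name__ == "__main__":
    tokens = option_tokens()
    print(tokens)
    print("ctu =", option_value(tokens, "--ctu"))
```

Keeping the options as a list makes it straightforward to append thread-related options later, once suitable values for them are known.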

These might not be good settings for my purpose, and I guess some are redundant given the
preset, which is why I'd like to ask here. I'm just unsure when I should start using more
lookahead slices. And beyond that, should I switch from lookahead slices to lookahead
threads at some point, or can both be used together?

When should I start slicing up input frames, and what do I need to consider when choosing
proper values for --slices <integer> etc.?

Is it even possible to scale to THAT many cores with 4K/UHD content?!
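
A rough back-of-the-envelope check for this question, as a sketch only. It assumes an idealized HEVC wavefront in which each CTU row may start once the first two CTUs of the row above are finished, so at most ceil(columns / 2) rows can be active at once. It looks at a single frame and ignores everything else the encoder may run in parallel (several frames in flight, --pmode, --pme, lookahead). The function names are hypothetical, and x265's actual scheduling may behave differently.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def wavefront_bound(width: int, height: int, ctu: int) -> dict:
    """Idealized per-frame WPP bound for a given picture and CTU size."""
    cols = ceil_div(width, ctu)   # CTUs per row
    rows = ceil_div(height, ctu)  # CTU rows in the picture
    # Each row trails the row above by two CTUs, so no more than
    # ceil(cols / 2) rows can be in progress at the same time.
    max_active_rows = min(rows, ceil_div(cols, 2))
    return {"ctu_cols": cols, "ctu_rows": rows, "max_active_rows": max_active_rows}


if __name__ == "__main__":
    for width, height in ((3840, 2160), (4096, 2160)):
        for ctu in (16, 32, 64):
            print(width, height, ctu, wavefront_bound(width, height, ctu))
```

Under these assumptions a UHD frame with --ctu 16 has 135 CTU rows, of which at most 120 can be active at once (128 for 4096x2160), so wavefront processing of a single frame would not by itself fill 1000 or more logical CPUs.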

I wanna make this a bit more future-proof this time around...

Thanks a lot for your input!

Best,
Michael

-- 
Michael Lackner
Lehrstuhl für Informationstechnologie (CiT)
Montanuniversität Leoben
Tel.: +43 (0)3842/402-1505 | Mail: michael.lackner at unileoben.ac.at
Fax.: +43 (0)3842/402-1502 | Web: http://institute.unileoben.ac.at/infotech

