[x265] F.R.: Report number of frame threads in [info] block
Mario *LigH* Rohkrämer
contact at ligh.de
Mon Nov 25 08:49:00 CET 2013
Thank you for this detailed insight. My trust in your competence will
persist. ;)
It is certainly important to know that parallelizing has several "levels"
(frame/slice/references) and depends on the video attributes (e.g. number
of slices due to the height).
Seeing mainly 2/10 threads active happened only in the slowest presets.
Average presets usually had 6/10 threads active, but still with lower
overall CPU utilization; it is quite possible that videos with small
dimensions are encoded less efficiently because of thread sync overhead.
On 24.11.2013 at 23:04, Steve Borho <steve at borho.org> wrote:
>
> On Nov 23, 2013, at 1:06 PM, Mario Rohkrämer <contact at ligh.de> wrote:
>
>> On 23.11.2013 at 19:45, Tom Vaughan
>> <tom.vaughan at multicorewareinc.com> wrote:
>>
>>> Mario,
>>> The number of concurrently encoded frames is already reported in the
>>> x265[info] output.
>>>
>>> Example:
>>> x265 [info]: WPP streams / pool / frames : 17 / 32 / 1
>>
>> That makes me almost concerned...
>>
>> With a Phenom-II X6 (6 cores) I get e.g.: 5 / 6 / *2*
>>
>> So it encodes only 2 frames in parallel? Because there are other
>> intense tasks utilizing other threads?
>>
>> According to ProcessExplorer, x265 runs 10 threads. Up to 6 of them are
>> more or less busy. Sometimes only 2 of them, depending on the preset
>> used. So it is probably correct, just possibly not yet "optimal".
>>
>
> The three numbers in that line describe all the parallelism variables.
> The encoder is creating 6 worker threads, one for each CPU core. The
> worker threads encode a row of CTUs at a time (glossing over a few
> details). Your video is fairly small, only 5 rows of 64x64 blocks, so
> there is not much parallelism there to be exposed to wave-front
> analysis. The 2 frame threads, by design, are mostly idle; they have
> some setup work at the beginning of each frame and some entropy encode
> work at the end of each frame, but for the bulk of the encode time they
> are blocked waiting for reference frames to complete rows or for their
> own rows to be completed.
>
> Adding more frame threads would not necessarily help much. Since there
> is a three-row lag between reference frames (deblock+sao+me-range), 5
> rows do not give you much room for frame parallelism either.
>
> At that resolution, you would be better served with 32x32 blocks (--ctu
> 32) if you need to keep more cores occupied. You would get more
> wave-front parallelism and could probably bump to -F3 effectively. You
> will want to decrease the me-range to 28 (ctu size minus luma
> half-filter) to keep the me-range from limiting frame parallelism.
>
> --
> Steve Borho
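
The row arithmetic in the quoted explanation can be sketched in a few
lines of Python. This is only an illustration, not encoder code: the
frame height of 288 is a hypothetical value chosen because it yields 5
rows of 64x64 blocks, the height of 1080 for the "17 / 32 / 1" example
is likewise an assumption, and the half-filter constant of 4 is merely
inferred from "ctu size minus luma half-filter" giving 28 for --ctu 32.
The function names are made up for this sketch.

```python
# Assumption: inferred from the quoted rule (32 - 28 = 4), not taken from x265 source.
LUMA_HALF_FILTER = 4


def ctu_rows(height, ctu_size):
    """Number of CTU rows in a frame (ceiling division).

    This is the most rows that wave-front processing could work on
    in a single frame.
    """
    return -(-height // ctu_size)


def max_me_range(ctu_size):
    """Motion search range suggested by the quoted rule:
    CTU size minus the luma half-filter length."""
    return ctu_size - LUMA_HALF_FILTER


if __name__ == "__main__":
    hypothetical_height = 288  # assumption: any height from 257 to 320 gives 5 rows at ctu 64
    for ctu in (64, 32):
        print(ctu, ctu_rows(hypothetical_height, ctu), max_me_range(ctu))
```

With these assumed inputs, halving the CTU size from 64 to 32 raises the
row count from 5 to 9, which is the extra wave-front parallelism the
quoted advice is after. The value printed for CTU size 64 in the last
column is just the same rule extrapolated; the quoted mail only states
it for 32.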
--
__________
Fun and success!
Mario *LigH* Rohkrämer
mailto:contact at ligh.de