[x265] F.R.: Report number of frame threads in [info] block

Mario *LigH* Rohkrämer contact at ligh.de
Mon Nov 25 08:49:00 CET 2013


Thank you for this detailed insight. My trust in your competence will  
persist. ;)

It is certainly important to know that parallelization happens on several  
"levels" (frame/slice/references) and depends on the video attributes  
(e.g. the number of slices, which follows from the height).

I saw mainly 2 of 10 threads active only in the slowest presets.  
Average presets usually had 6 of 10 threads active, but still with lower  
overall CPU utilization. It is quite possible that videos with small  
dimensions are encoded less efficiently because of thread synchronization  
overhead.


On 24.11.2013 at 23:04, Steve Borho <steve at borho.org> wrote:

>
> On Nov 23, 2013, at 1:06 PM, Mario Rohkrämer <contact at ligh.de> wrote:
>
>> On 23.11.2013 at 19:45, Tom Vaughan  
>> <tom.vaughan at multicorewareinc.com> wrote:
>>
>>> Mario,
>>> The number of concurrently encoded frames is already reported in the
>>> x265[info] output.
>>>
>>> Example:
>>> x265 [info]: WPP streams / pool / frames  : 17 / 32 / 1
>>
>> That makes me almost concerned...
>>
>> With a Phenom-II X6 (6 cores) I get e.g.: 5 / 6 / *2*
>>
>> So it encodes only 2 frames in parallel? Because there are other  
>> intense tasks utilizing other threads?
>>
>> According to ProcessExplorer, x265 runs 10 threads. Up to 6 of them are  
>> more or less busy. Sometimes only 2 of them, depending on the preset  
>> used. So it is probably correct, just possibly not yet "optimal".
>>
>
> The three numbers in that line describe all the parallelism variables.   
> The encoder is creating 6 worker threads, one for each CPU core.  The  
> worker threads encode a row of CTUs at a time (glossing over a few  
> details).  Your video is fairly small, only 5 rows of 64x64 blocks, so  
> there is not much parallelism there to be exposed to wave-front  
> analysis.  The 2 frame threads, by design, are mostly idle; they have  
> some setup work at the beginning of each frame and some entropy encode  
> work at the end of each frame, but for the bulk of the encode time they  
> are blocked waiting for reference frames to complete rows or for their  
> own rows to be completed.
>
> Adding more frame threads would not necessarily help much, since there  
> is a three-row lag between reference frames (deblock+sao+me-range); 5  
> rows do not give you much room for frame parallelism either.
>
> At that resolution, you would be better served with 32x32 blocks (--ctu  
> 32) if you need to keep more cores occupied.  You would get more  
> wave-front parallelism and could probably bump to -F3 effectively.  You  
> will want to decrease the me-range to 28 (ctu size minus luma  
> half-filter) to keep the me-range from limiting frame parallelism.
>
> --
> Steve Borho
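
To make the row arithmetic in the quoted explanation concrete, here is a
minimal sketch. The height of 288 is a hypothetical value chosen only
because it yields the 5 rows of 64x64 blocks mentioned above; the actual
video height is not stated in this thread. The constant 4 for the luma
half-filter is inferred from the quoted "ctu size minus luma half-filter"
(32 - 28). The function names are made up for illustration and are not
part of x265.

```python
# Hypothetical helpers for illustration; none of these names exist in x265.

LUMA_HALF_FILTER = 4  # assumption: inferred from the quoted 32 - 28 = 4


def ctu_rows(height, ctu_size):
    """Number of CTU rows needed to cover a frame of the given height."""
    return (height + ctu_size - 1) // ctu_size  # integer ceiling division


def suggested_me_range(ctu_size):
    """me-range per the quoted rule: CTU size minus the luma half-filter."""
    return ctu_size - LUMA_HALF_FILTER


if __name__ == "__main__":
    height = 288  # hypothetical height, not taken from the thread
    for ctu in (64, 32):
        print(f"--ctu {ctu}: {ctu_rows(height, ctu)} CTU rows")
    print(f"me-range for --ctu 32: {suggested_me_range(32)}")
```

With this assumed height, halving the CTU size takes the frame from 5 rows
to 9, which is the extra room for wave-front and frame parallelism that
the quoted advice relies on.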


-- 
__________

Fun and success!
Mario *LigH* Rohkrämer
mailto:contact at ligh.de
 

