<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Nov 23, 2013, at 1:06 PM, Mario Rohkrämer <<a href="mailto:contact@ligh.de">contact@ligh.de</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On 23.11.2013 at 19:45, Tom Vaughan <<a href="mailto:tom.vaughan@multicorewareinc.com">tom.vaughan@multicorewareinc.com</a>> wrote:<br><br><blockquote type="cite">Mario,<br>The number of concurrently encoded frames is already reported in the<br>x265[info] output.<br><br>Example:<br>x265 [info]: WPP streams / pool / frames : 17 / 32 / 1<br></blockquote><br>That makes me almost concerned...<br><br>With a Phenom-II X6 (6 cores) I get e.g.: 5 / 6 / *2*<br><br>So it encodes only 2 frames in parallel? Because there are other intense tasks utilizing other threads?<br><br>According to ProcessExplorer, x265 runs 10 threads. Up to 6 of them are more or less busy. Sometimes only 2 of them, depending on the preset used. So it is probably correct, just possibly not yet "optimal".<br><br></div></blockquote><div><br></div><div>The three numbers in that line describe all the parallelism variables: in your case 5 WPP streams (one per row of CTUs), a pool of 6 worker threads, and 2 frame threads. The encoder creates 6 worker threads, one for each CPU core. Each worker thread encodes one row of CTUs at a time (glossing over a few details). Your video is fairly small, only 5 rows of 64x64 blocks, so there is not much parallelism there to be exposed to wave-front analysis.
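<br><br>As a rough illustration of where the row count comes from: the number of CTU rows is the frame height divided by the CTU size, rounded up. The sketch below is plain Python, not x265 code. The function name <code>ctu_rows</code> and the 288-line frame height are assumptions made for the example, since the actual resolution was not stated.
<pre>
```python
def ctu_rows(height, ctu_size):
    """Number of CTU rows: frame height divided by CTU size, rounded up."""
    # -(-a // b) is integer ceiling division for positive a and b.
    return -(-height // ctu_size)


# Assumed frame height, for illustration only; not taken from the thread.
height = 288

for ctu in (64, 32):
    print(f"ctu {ctu}: {ctu_rows(height, ctu)} rows")
```
</pre>
With this assumed height, halving the CTU size roughly doubles the number of rows available to the worker threads.<br><br>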
The 2 frame threads are, by design, mostly idle. They have some setup work at the beginning of each frame and some entropy encode work at the end of each frame, but for the bulk of the encode time they are blocked, waiting either for reference frames to complete rows or for their own rows to be completed.</div><div><br></div><div>Adding more frame threads would not necessarily help much. There is a three-row lag between reference frames (deblock+sao+me-range), so 5 rows does not give you much room for frame parallelism either.</div><div><br></div><div>At that resolution, you would be better served with 32x32 blocks (--ctu 32) if you need to keep more cores occupied. You would get more wave-front parallelism and could probably bump to -F3 effectively. You will also want to decrease the me-range to 28 (the CTU size of 32 minus the 4-sample luma half-filter) so that the me-range does not limit frame parallelism.</div><div><br></div><div>--</div><div>Steve Borho</div></div></body></html>