<div dir="ltr">This patch no longer applies at the tip as the doc was recently changed. Will send an update - please ignore until then.<br><div>Pradeep.</div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">On Tue, Nov 3, 2015 at 9:32 PM, Pradeep Ramachandran <span dir="ltr"><<a href="mailto:pradeep@multicorewareinc.com" target="_blank">pradeep@multicorewareinc.com</a>></span> wrote:<br></div></div></div></div></div></div></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>
# User Pradeep Ramachandran <<a href="mailto:pradeep@multicorewareinc.com">pradeep@multicorewareinc.com</a>><br>
# Date 1446566445 -19800<br>
# Tue Nov 03 21:30:45 2015 +0530<br>
# Node ID cdc2b132a66f97bab510313f614909df3089c2c7<br>
# Parent 61396ea8096a9f75667ac01ae8b4bf02169d3b64<br>
perf: Enabling lookahead-slices for all presets except veryslow & placebo<br>
<br>
Seeing a ~10% performance improvement on the faster presets on Skylake, and a ~2X<br>
speedup on Xeon systems with the ultrafast preset. The improvement comes from a<br>
considerable increase in utilization across the board. Slicing is disabled for<br>
videos with a resolution below 720p to limit the impact on quality.<br>
<br>
This commit changes outputs, but the reduction in quality (measured by PSNR or SSIM)<br>
is <0.01% across a wide variety of presets and runs.<br>
<br>
diff -r 61396ea8096a -r cdc2b132a66f doc/reST/cli.rst<br>
--- a/doc/reST/cli.rst Mon Oct 12 10:23:37 2015 +0800<br>
+++ b/doc/reST/cli.rst Tue Nov 03 21:30:45 2015 +0530<br>
@@ -1124,21 +1124,31 @@<br>
<br>
.. option:: --lookahead-slices <0..16><br>
<br>
- Use multiple worker threads to measure the estimated cost of each<br>
- frame within the lookahead. When :option:`--b-adapt` is 2, most<br>
- frame cost estimates will be performed in batch mode, many cost<br>
- estimates at the same time, and lookahead-slices is ignored for<br>
- batched estimates. The effect on performance can be quite small.<br>
- The higher this parameter, the less accurate the frame costs will be<br>
- (since context is lost across slice boundaries) which will result in<br>
- less accurate B-frame and scene-cut decisions.<br>
+ Use multiple worker threads to measure the estimated cost of each frame<br>
+ within the lookahead. The frame is divided into the specified number of<br>
+ slices, and one thread is launched per slice. When :option:`--b-adapt` is<br>
+ 2, most frame cost estimates will be performed in batch mode (many cost<br>
+ estimates at the same time) and lookahead-slices is ignored for batched<br>
+ estimates; it may still be used for single cost estimates. The higher this<br>
+ parameter, the less accurate the frame costs will be (since context is lost<br>
+ across slice boundaries), which will result in less accurate B-frame and<br>
+ scene-cut decisions. The effect on performance can be significant, especially<br>
+ on systems with many threads.<br>
<br>
- The encoder may internally lower the number of slices to ensure<br>
- each slice codes at least 10 16x16 rows of lowres blocks. If slices<br>
- are used in lookahead, they are logged in the list of tools as<br>
- *lslices*.<br>
-<br>
- **Values:** 0 - disabled (default). 1 is the same as 0. Max 16<br>
+ The encoder may internally lower the number of slices or disable<br>
+ slicing to ensure each slice codes at least 10 16x16 rows of lowres<br>
+ blocks, minimizing the impact on quality. For example, for 720p and<br>
+ 1080p videos, the number of slices is capped at 4 and 6, respectively.<br>
+ For resolutions below 720p, slicing is auto-disabled.<br>
+<br>
+ If slices are used in lookahead, they are logged in the list of tools<br>
+ as *lslices*.<br>
+<br>
+ **Values:** 0 - disabled. 1 is the same as 0. Max 16.<br>
+ Default: 8 for ultrafast, superfast, veryfast, faster, fast and medium;<br>
+ 4 for slow and slower;<br>
+ 0 (disabled) for veryslow and placebo.<br>
+<br>
<br>
.. option:: --b-adapt <integer><br>
<br>
diff -r 61396ea8096a -r cdc2b132a66f doc/reST/presets.rst<br>
--- a/doc/reST/presets.rst Mon Oct 12 10:23:37 2015 +0800<br>
+++ b/doc/reST/presets.rst Tue Nov 03 21:30:45 2015 +0530<br>
@@ -19,61 +19,63 @@<br>
<br>
The presets adjust encoder parameters to affect these trade-offs.<br>
<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-|              | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |<br>
-+==============+===========+===========+==========+========+======+========+======+========+==========+=========+<br>
-| ctu          | 32        | 32        | 32       | 64     | 64   | 64     | 64   | 64     | 64       | 64      |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| min-cu-size  | 16        | 8         | 8        | 8      | 8    | 8      | 8    | 8      | 8        | 8       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| bframes      | 3         | 3         | 4        | 4      | 4    | 4      | 4    | 8      | 8        | 8       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| b-adapt      | 0         | 0         | 0        | 0      | 0    | 2      | 2    | 2      | 2        | 2       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| rc-lookahead | 5         | 10        | 15       | 15     | 15   | 20     | 25   | 30     | 40       | 60      |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| scenecut     | 0         | 40        | 40       | 40     | 40   | 40     | 40   | 40     | 40       | 40      |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| refs         | 1         | 1         | 1        | 1      | 2    | 3      | 3    | 3      | 5        | 5       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| me           | dia       | hex       | hex      | hex    | hex  | hex    | star | star   | star     | star    |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| merange      | 57        | 57        | 57       | 57     | 57   | 57     | 57   | 57     | 57       | 92      |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| subme        | 0         | 1         | 1        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| rect         | 0         | 0         | 0        | 0      | 0    | 0      | 1    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| amp          | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| max-merge    | 2         | 2         | 2        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| early-skip   | 1         | 1         | 1        | 1      | 0    | 0      | 0    | 0      | 0        | 0       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| fast-intra   | 1         | 1         | 1        | 1      | 1    | 0      | 0    | 0      | 0        | 0       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| b-intra      | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| sao          | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| signhide     | 0         | 1         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| weightp      | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| weightb      | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| aq-mode      | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| cuTree       | 0         | 0         | 0        | 0      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| rdLevel      | 2         | 2         | 2        | 2      | 2    | 3      | 4    | 6      | 6        | 6       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| rdoq-level   | 0         | 0         | 0        | 0      | 0    | 0      | 2    | 2      | 2        | 2       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| tu-intra     | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
-| tu-inter     | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |<br>
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+|                  | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |<br>
++==================+===========+===========+==========+========+======+========+======+========+==========+=========+<br>
+| ctu              | 32        | 32        | 32       | 64     | 64   | 64     | 64   | 64     | 64       | 64      |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| min-cu-size      | 16        | 8         | 8        | 8      | 8    | 8      | 8    | 8      | 8        | 8       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| bframes          | 3         | 3         | 4        | 4      | 4    | 4      | 4    | 8      | 8        | 8       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| b-adapt          | 0         | 0         | 0        | 0      | 0    | 2      | 2    | 2      | 2        | 2       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| rc-lookahead     | 5         | 10        | 15       | 15     | 15   | 20     | 25   | 30     | 40       | 60      |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| scenecut         | 0         | 40        | 40       | 40     | 40   | 40     | 40   | 40     | 40       | 40      |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| refs             | 1         | 1         | 1        | 1      | 2    | 3      | 3    | 3      | 5        | 5       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| me               | dia       | hex       | hex      | hex    | hex  | hex    | star | star   | star     | star    |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| merange          | 57        | 57        | 57       | 57     | 57   | 57     | 57   | 57     | 57       | 92      |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| subme            | 0         | 1         | 1        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| rect             | 0         | 0         | 0        | 0      | 0    | 0      | 1    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| amp              | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| max-merge        | 2         | 2         | 2        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| early-skip       | 1         | 1         | 1        | 1      | 0    | 0      | 0    | 0      | 0        | 0       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| fast-intra       | 1         | 1         | 1        | 1      | 1    | 0      | 0    | 0      | 0        | 0       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| b-intra          | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| sao              | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| signhide         | 0         | 1         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| weightp          | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| weightb          | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| aq-mode          | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| cuTree           | 0         | 0         | 0        | 0      | 1    | 1      | 1    | 1      | 1        | 1       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| rdLevel          | 2         | 2         | 2        | 2      | 2    | 3      | 4    | 6      | 6        | 6       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| rdoq-level       | 0         | 0         | 0        | 0      | 0    | 0      | 2    | 2      | 2        | 2       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| tu-intra         | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| tu-inter         | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
+| lookahead-slices | 8         | 8         | 8        | 8      | 8    | 8      | 4    | 4      | 0        | 0       |<br>
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+<br>
<br>
Placebo mode enables transform-skip prediction evaluation.<br>
<br>
diff -r 61396ea8096a -r cdc2b132a66f source/common/param.cpp<br>
--- a/source/common/param.cpp Mon Oct 12 10:23:37 2015 +0800<br>
+++ b/source/common/param.cpp Tue Nov 03 21:30:45 2015 +0530<br>
@@ -147,7 +147,7 @@<br>
param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;<br>
param->bBPyramid = 1;<br>
param->scenecutThreshold = 40; /* Magic number pulled in from x264 */<br>
- param->lookaheadSlices = 0;<br>
+ param->lookaheadSlices = 8;<br>
<br>
/* Intra Coding Tools */<br>
param->bEnableConstrainedIntra = 0;<br>
@@ -347,6 +347,7 @@<br>
param->subpelRefine = 3;<br>
param->maxNumMergeCand = 3;<br>
param->searchMethod = X265_STAR_SEARCH;<br>
+ param->lookaheadSlices = 4; // limit parallelism; enough work already exists<br>
}<br>
else if (!strcmp(preset, "slower"))<br>
{<br>
@@ -364,6 +365,7 @@<br>
param->maxNumMergeCand = 3;<br>
param->searchMethod = X265_STAR_SEARCH;<br>
param->bIntraInBFrames = 1;<br>
+ param->lookaheadSlices = 4; // limit parallelism; enough work already exists<br>
}<br>
else if (!strcmp(preset, "veryslow"))<br>
{<br>
@@ -382,6 +384,7 @@<br>
param->searchMethod = X265_STAR_SEARCH;<br>
param->maxNumReferences = 5;<br>
param->bIntraInBFrames = 1;<br>
+ param->lookaheadSlices = 0; // disabled for best quality<br>
}<br>
else if (!strcmp(preset, "placebo"))<br>
{<br>
@@ -403,6 +406,7 @@<br>
param->maxNumReferences = 5;<br>
param->rc.bEnableSlowFirstPass = 1;<br>
param->bIntraInBFrames = 1;<br>
+ param->lookaheadSlices = 0; // disabled for best quality<br>
// TODO: optimized esa<br>
}<br>
else<br>
</blockquote></div><br></div></div>