[x265] [PATCH] perf: Enabling lookahead-slices for all presets except veryslow & placebo

Pradeep Ramachandran pradeep at multicorewareinc.com
Wed Nov 4 07:42:54 CET 2015


This patch no longer applies at the tip as the doc was recently changed.
Will send an update - please ignore until then.
Pradeep.

On Tue, Nov 3, 2015 at 9:32 PM, Pradeep Ramachandran <
pradeep at multicorewareinc.com> wrote:

> # HG changeset patch
> # User Pradeep Ramachandran <pradeep at multicorewareinc.com>
> # Date 1446566445 -19800
> #      Tue Nov 03 21:30:45 2015 +0530
> # Node ID cdc2b132a66f97bab510313f614909df3089c2c7
> # Parent  61396ea8096a9f75667ac01ae8b4bf02169d3b64
> perf: Enabling lookahead-slices for all presets except veryslow & placebo
>
> Seeing ~10% better performance on the faster presets on Skylake, and ~2X
> performance on Xeon systems in the ultrafast setting. The improvement comes
> from a considerable increase in utilization across the board. Slicing is
> disabled for videos of resolution < 720p to limit the impact on quality.
>
> The commit will change outputs, but the reduction in quality (measured by
> PSNR or SSIM) is <0.01% across a wide variety of presets and runs.
>
> diff -r 61396ea8096a -r cdc2b132a66f doc/reST/cli.rst
> --- a/doc/reST/cli.rst  Mon Oct 12 10:23:37 2015 +0800
> +++ b/doc/reST/cli.rst  Tue Nov 03 21:30:45 2015 +0530
> @@ -1124,21 +1124,31 @@
>
>  .. option:: --lookahead-slices <0..16>
>
> -       Use multiple worker threads to measure the estimated cost of each
> -       frame within the lookahead. When :option:`--b-adapt` is 2, most
> -       frame cost estimates will be performed in batch mode, many cost
> -       estimates at the same time, and lookahead-slices is ignored for
> -       batched estimates. The effect on performance can be quite small.
> -       The higher this parameter, the less accurate the frame costs will be
> -       (since context is lost across slice boundaries) which will result in
> -       less accurate B-frame and scene-cut decisions.
> +       Use multiple worker threads to measure the estimated cost of each frame
> +       within the lookahead. The frame is divided into the specified number of
> +       slices, and one thread is launched per slice. When :option:`--b-adapt` is
> +       2, most frame cost estimates will be performed in batch mode (many cost
> +       estimates at the same time) and lookahead-slices is ignored for batched
> +       estimates; it may still be used for single cost estimations. The higher this
> +       parameter, the less accurate the frame costs will be (since context is lost
> +       across slice boundaries) which will result in less accurate B-frame and
> +       scene-cut decisions. The effect on performance can be significant, especially
> +       on systems with many threads.
>
> -       The encoder may internally lower the number of slices to ensure
> -       each slice codes at least 10 16x16 rows of lowres blocks. If slices
> -       are used in lookahead, they are logged in the list of tools as
> -       *lslices*.
> -
> -       **Values:** 0 - disabled (default). 1 is the same as 0. Max 16
> +       The encoder may internally lower the number of slices or disable
> +       slicing to ensure each slice codes at least 10 16x16 rows of lowres
> +       blocks to minimize the impact on quality. For example, for 720p and
> +       1080p videos, the number of slices is capped to 4 and 6, respectively.
> +       For resolutions lower than 720p, slicing is auto-disabled.
> +
> +       If slices are used in lookahead, they are logged in the list of tools
> +       as *lslices*.
> +
> +       **Values:** 0 - disabled. 1 is the same as 0. Max 16.
> +       Default: 8 for ultrafast, superfast, veryfast, faster, fast, medium
> +                4 for slow, slower
> +                disabled for veryslow, placebo
> +
>
>  .. option:: --b-adapt <integer>
>
> diff -r 61396ea8096a -r cdc2b132a66f doc/reST/presets.rst
> --- a/doc/reST/presets.rst      Mon Oct 12 10:23:37 2015 +0800
> +++ b/doc/reST/presets.rst      Tue Nov 03 21:30:45 2015 +0530
> @@ -19,61 +19,63 @@
>
>  The presets adjust encoder parameters to affect these trade-offs.
>
>
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -|              | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
> -+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
> -| ctu          |   32      |    32     |   32     |  64    |  64  |   64   |  64  |  64    |   64     |   64    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| min-cu-size  |   16      |     8     |    8     |   8    |   8  |    8   |   8  |   8    |    8     |    8    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| bframes      |    3      |     3     |    4     |   4    |  4   |    4   |  4   |   8    |    8     |    8    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| b-adapt      |    0      |     0     |    0     |   0    |  0   |    2   |  2   |   2    |    2     |    2    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rc-lookahead |    5      |    10     |   15     |  15    |  15  |   20   |  25  |   30   |   40     |   60    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| scenecut     |    0      |    40     |   40     |  40    |  40  |   40   |  40  |   40   |   40     |   40    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| refs         |    1      |     1     |    1     |   1    |  2   |    3   |  3   |   3    |    5     |    5    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| me           |   dia     |   hex     |   hex    |  hex   | hex  |   hex  | star |  star  |   star   |   star  |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| merange      |   57      |    57     |   57     |  57    |  57  |   57   | 57   |  57    |   57     |   92    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| subme        |    0      |     1     |    1     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rect         |    0      |     0     |    0     |   0    |  0   |    0   |  1   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| amp          |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| max-merge    |    2      |     2     |    2     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| early-skip   |    1      |     1     |    1     |   1    |  0   |    0   |  0   |   0    |    0     |    0    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| fast-intra   |    1      |     1     |    1     |   1    |  1   |    0   |  0   |   0    |    0     |    0    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| b-intra      |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| sao          |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| signhide     |    0      |     1     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| weightp      |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| weightb      |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| aq-mode      |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| cuTree       |    0      |     0     |    0     |   0    |  1   |    1   |  1   |   1    |    1     |    1    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rdLevel      |    2      |     2     |    2     |   2    |  2   |    3   |  4   |   6    |    6     |    6    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rdoq-level   |    0      |     0     |    0     |   0    |  0   |    0   |  2   |   2    |    2     |    2    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| tu-intra     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| tu-inter     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +|                  | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
> ++==================+===========+===========+==========+========+======+========+======+========+==========+=========+
> +| ctu              |   32      |    32     |   32     |  64    |  64  |   64   |  64  |  64    |   64     |   64    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| min-cu-size      |   16      |     8     |    8     |   8    |   8  |    8   |   8  |   8    |    8     |    8    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| bframes          |    3      |     3     |    4     |   4    |  4   |    4   |  4   |   8    |    8     |    8    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| b-adapt          |    0      |     0     |    0     |   0    |  0   |    2   |  2   |   2    |    2     |    2    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rc-lookahead     |    5      |    10     |   15     |  15    |  15  |   20   |  25  |   30   |   40     |   60    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| scenecut         |    0      |    40     |   40     |  40    |  40  |   40   |  40  |   40   |   40     |   40    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| refs             |    1      |     1     |    1     |   1    |  2   |    3   |  3   |   3    |    5     |    5    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| me               |   dia     |   hex     |   hex    |  hex   | hex  |   hex  | star |  star  |   star   |   star  |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| merange          |   57      |    57     |   57     |  57    |  57  |   57   | 57   |  57    |   57     |   92    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| subme            |    0      |     1     |    1     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rect             |    0      |     0     |    0     |   0    |  0   |    0   |  1   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| amp              |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| max-merge        |    2      |     2     |    2     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| early-skip       |    1      |     1     |    1     |   1    |  0   |    0   |  0   |   0    |    0     |    0    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| fast-intra       |    1      |     1     |    1     |   1    |  1   |    0   |  0   |   0    |    0     |    0    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| b-intra          |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| sao              |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| signhide         |    0      |     1     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| weightp          |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| weightb          |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| aq-mode          |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| cuTree           |    0      |     0     |    0     |   0    |  1   |    1   |  1   |   1    |    1     |    1    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rdLevel          |    2      |     2     |    2     |   2    |  2   |    3   |  4   |   6    |    6     |    6    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rdoq-level       |    0      |     0     |    0     |   0    |  0   |    0   |  2   |   2    |    2     |    2    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| tu-intra         |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| tu-inter         |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| lookahead-slices |    8      |     8     |    8     |   8    |  8   |    8   |  4   |   4    |    0     |    0    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
>
>  Placebo mode enables transform-skip prediction evaluation.
>
> diff -r 61396ea8096a -r cdc2b132a66f source/common/param.cpp
> --- a/source/common/param.cpp   Mon Oct 12 10:23:37 2015 +0800
> +++ b/source/common/param.cpp   Tue Nov 03 21:30:45 2015 +0530
> @@ -147,7 +147,7 @@
>      param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
>      param->bBPyramid = 1;
>      param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
> -    param->lookaheadSlices = 0;
> +    param->lookaheadSlices = 8;
>
>      /* Intra Coding Tools */
>      param->bEnableConstrainedIntra = 0;
> @@ -347,6 +347,7 @@
>              param->subpelRefine = 3;
>              param->maxNumMergeCand = 3;
>              param->searchMethod = X265_STAR_SEARCH;
> +            param->lookaheadSlices = 4; // limit parallelism as already enough work exists
>          }
>          else if (!strcmp(preset, "slower"))
>          {
> @@ -364,6 +365,7 @@
>              param->maxNumMergeCand = 3;
>              param->searchMethod = X265_STAR_SEARCH;
>              param->bIntraInBFrames = 1;
> +            param->lookaheadSlices = 4; // limit parallelism as already enough work exists
>          }
>          else if (!strcmp(preset, "veryslow"))
>          {
> @@ -382,6 +384,7 @@
>              param->searchMethod = X265_STAR_SEARCH;
>              param->maxNumReferences = 5;
>              param->bIntraInBFrames = 1;
> +            param->lookaheadSlices = 0; // disabled for best quality
>          }
>          else if (!strcmp(preset, "placebo"))
>          {
> @@ -403,6 +406,7 @@
>              param->maxNumReferences = 5;
>              param->rc.bEnableSlowFirstPass = 1;
>              param->bIntraInBFrames = 1;
> +            param->lookaheadSlices = 0; // disabled for best quality
>              // TODO: optimized esa
>          }
>          else
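
For readers following the numbers in the patch, here is a rough sketch of how the per-preset defaults and the resolution cap described in the cli.rst text could fit together. It is not code from the x265 tree: both function names are hypothetical, and the cap assumes that a "row" is 16 luma pixels tall in the source frame, that each slice needs at least 10 such rows, and that anything under 720 lines disables slicing. Those assumptions reproduce the 720p -> 4 and 1080p -> 6 figures quoted in the doc, but the real encoder logic may differ.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not an x265 API): the default lookahead-slices value
// per preset, as listed in the presets.rst table of this patch.
int defaultLookaheadSlices(const std::string& preset)
{
    if (preset == "veryslow" || preset == "placebo")
        return 0; // disabled for best quality
    if (preset == "slow" || preset == "slower")
        return 4; // limit parallelism, enough work already exists
    return 8;     // ultrafast through medium (also any unrecognised name here)
}

// Hypothetical helper (not an x265 API): the number of slices actually used
// for a given source height and requested slice count.
// ASSUMPTIONS: rows are 16 pixels tall, each slice needs >= 10 rows,
// heights below 720 disable slicing, and 0 or 1 both mean "disabled".
int effectiveLookaheadSlices(int sourceHeight, int requested)
{
    const int rowHeight = 16;       // assumed row height in source pixels
    const int minRowsPerSlice = 10; // from the cli.rst text
    const int maxAllowed = 16;      // documented upper bound of the option

    if (requested <= 1 || sourceHeight < 720)
        return 0;

    int rows = (sourceHeight + rowHeight - 1) / rowHeight; // round up
    int maxSlices = rows / minRowsPerSlice;
    return std::min({requested, maxSlices, maxAllowed});
}
```

Under these assumptions, a medium-preset 1080p encode would start from 8 requested slices and end up with 6, while the same preset on a 480p source would run the lookahead unsliced.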
>