[x265-commits] [x265] wavefront: paranoid bitmap clears

Steve Borho steve at borho.org
Mon Mar 23 03:23:02 CET 2015


details:   http://hg.videolan.org/x265/rev/c80282c519a8
branches:  
changeset: 9848:c80282c519a8
user:      Steve Borho <steve at borho.org>
date:      Sun Mar 22 11:42:08 2015 -0500
description:
wavefront: paranoid bitmap clears

Ensure no worker thread starts processing a row before the frame encoder is
ready for it.
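
As an illustration of why both bitmaps are cleared, here is a minimal
single-threaded sketch. It assumes a row becomes runnable only when its bit is
set in both the internal and the external bitmap. RowGate and isRunnable() are
hypothetical names for this example; this is not the WaveFront class itself,
and it omits the atomics a real implementation would need.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of two per-row dependency bitmaps (not x265's WaveFront).
class RowGate
{
public:
    explicit RowGate(int numRows)
        : m_numWords((numRows + 31) >> 5)
        , m_internal(m_numWords, 0)
        , m_external(m_numWords, 0)
    {}

    // Block every row: clear both bitmaps, as the patch does.
    void clearEnabledRowMask()
    {
        std::fill(m_external.begin(), m_external.end(), 0u);
        std::fill(m_internal.begin(), m_internal.end(), 0u);
    }

    // Mark the external dependency (e.g. reference rows) as satisfied.
    void enableRow(int row)  { m_external[row >> 5] |= 1u << (row & 31); }

    // Mark the internal dependency (e.g. the row above) as satisfied.
    void enqueueRow(int row) { m_internal[row >> 5] |= 1u << (row & 31); }

    // Assumption: a row may run only when both bits are set.
    bool isRunnable(int row) const
    {
        uint32_t bit = 1u << (row & 31);
        return (m_internal[row >> 5] & m_external[row >> 5] & bit) != 0;
    }

private:
    int m_numWords;
    std::vector<uint32_t> m_internal;
    std::vector<uint32_t> m_external;
};
```

In this model, if only the external bitmap were cleared, a stale internal bit
left over from an earlier frame would let a row run as soon as enableRow() was
called for it.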
Subject: [x265] frameencoder: inline compressCTU()

details:   http://hg.videolan.org/x265/rev/13e3c7e624eb
branches:  
changeset: 9849:13e3c7e624eb
user:      Steve Borho <steve at borho.org>
date:      Sun Mar 22 11:53:38 2015 -0500
description:
frameencoder: inline compressCTU()

There was nothing gained by this being a separate function; it only obfuscated
the order in which everything happens.
Subject: [x265] regression: do not use --pools 0 to disable pool features

details:   http://hg.videolan.org/x265/rev/885df756fee6
branches:  
changeset: 9850:885df756fee6
user:      Steve Borho <steve at borho.org>
date:      Sun Mar 22 12:01:12 2015 -0500
description:
regression: do not use --pools 0 to disable pool features

The regression script adds random spot-checks to the command line: params
which are not expected to change the outputs. One of those is --pools 3, which
in this test case definitely changes the outputs. The encode goes from --no-wpp
to --wpp and shows up as a test failure when really it is a test-case problem.
Subject: [x265] smoke-test: nits

details:   http://hg.videolan.org/x265/rev/a2a556d1d2ee
branches:  
changeset: 9851:a2a556d1d2ee
user:      Steve Borho <steve at borho.org>
date:      Sun Mar 22 13:32:28 2015 -0400
description:
smoke-test: nits
Subject: [x265] encoder: add explicit synchronization in frame thread startup

details:   http://hg.videolan.org/x265/rev/cc496665280f
branches:  
changeset: 9852:cc496665280f
user:      Steve Borho <steve at borho.org>
date:      Sun Mar 22 22:16:45 2015 -0400
description:
encoder: add explicit synchronization in frame thread startup

With the new thread pool design, the first frame encoder initializes TLD
for all frame encoders in its pool (so the memory is allocated by a
thread running on the pool's socket).  The second frame encoder actually
encodes the first frame, so if the first takes too long to initialize it
can cause SIGSEGV.  The regression test caught this on an E5-2699 v3.
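
The start-then-wait handshake described above can be sketched with standard
C++ primitives. This is a minimal illustration, not the encoder's code: Event,
Worker, startAll() and stopAll() are hypothetical stand-ins, and the
`initialized` flag stands in for the real thread-local data setup.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical auto-reset event built on the standard library.
class Event
{
public:
    void trigger()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_signaled = true;
        m_cond.notify_one();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_signaled; });
        m_signaled = false; // reset so the event can be reused
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_signaled = false;
};

// Hypothetical worker: initializes, signals done, then waits to be enabled.
struct Worker
{
    Event done;
    Event enable;
    bool initialized = false;
    std::thread thread;

    void start() { thread = std::thread([this] { threadMain(); }); }

    void threadMain()
    {
        initialized = true; // stand-in for per-thread initialization
        done.trigger();     // signal that the thread is initialized
        enable.wait();      // block until the owner releases the thread
    }
};

// Start workers one at a time, waiting for each to finish initializing
// before starting the next. Returns how many were initialized.
int startAll(std::vector<Worker>& workers)
{
    int ready = 0;
    for (Worker& w : workers)
    {
        w.start();
        w.done.wait();
        if (w.initialized)
            ready++;
    }
    return ready;
}

// Release and join every worker.
void stopAll(std::vector<Worker>& workers)
{
    for (Worker& w : workers)
    {
        w.enable.trigger();
        w.thread.join();
    }
}
```

Because each wait() returns only after the matching trigger(), no later worker
can start before the earlier one has finished its setup, which is the ordering
the commit makes explicit.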

diffstat:

 source/common/wavefront.cpp      |    1 +
 source/encoder/encoder.cpp       |    4 +
 source/encoder/frameencoder.cpp  |  187 ++++++++++++++++++--------------------
 source/encoder/frameencoder.h    |    3 -
 source/test/regression-tests.txt |    2 +-
 source/test/smoke-tests.txt      |    4 +-
 6 files changed, 98 insertions(+), 103 deletions(-)

diffs (288 lines):

diff -r 887ac5e457e0 -r cc496665280f source/common/wavefront.cpp
--- a/source/common/wavefront.cpp	Sat Mar 21 01:27:07 2015 -0500
+++ b/source/common/wavefront.cpp	Sun Mar 22 22:16:45 2015 -0400
@@ -54,6 +54,7 @@ WaveFront::~WaveFront()
 void WaveFront::clearEnabledRowMask()
 {
     memset((void*)m_externalDependencyBitmap, 0, sizeof(uint32_t) * m_numWords);
+    memset((void*)m_internalDependencyBitmap, 0, sizeof(uint32_t) * m_numWords);
 }
 
 void WaveFront::enqueueRow(int row)
diff -r 887ac5e457e0 -r cc496665280f source/encoder/encoder.cpp
--- a/source/encoder/encoder.cpp	Sat Mar 21 01:27:07 2015 -0500
+++ b/source/encoder/encoder.cpp	Sun Mar 22 22:16:45 2015 -0400
@@ -254,8 +254,12 @@ void Encoder::create()
             m_aborted = true;
         }
     }
+
     for (int i = 0; i < m_param->frameNumThreads; i++)
+    {
         m_frameEncoder[i]->start();
+        m_frameEncoder[i]->m_done.wait(); /* wait for thread to initialize */
+    }
 
     if (m_param->bEmitHRDSEI)
         m_rateControl->initHRD(m_sps);
diff -r 887ac5e457e0 -r cc496665280f source/encoder/frameencoder.cpp
--- a/source/encoder/frameencoder.cpp	Sat Mar 21 01:27:07 2015 -0500
+++ b/source/encoder/frameencoder.cpp	Sun Mar 22 22:16:45 2015 -0400
@@ -273,6 +273,7 @@ void FrameEncoder::threadMain()
         m_localTldIdx = 0;
     }
 
+    m_done.trigger();     /* signal that thread is initialized */ 
     m_enable.wait();      /* Encoder::encode() triggers this event */
 
     while (m_threadActive)
@@ -302,6 +303,15 @@ void FrameEncoder::compressFrame()
     m_allRowsAvailableTime = 0;
     m_stallStartTime = 0;
 
+    m_bLastRowCompleted = false;
+    m_bAllRowsStop = false;
+    m_vbvResetTriggerRow = -1;
+
+    m_SSDY = m_SSDU = m_SSDV = 0;
+    m_ssim = 0;
+    m_ssimCnt = 0;
+    memset(&m_frameStats, 0, sizeof(m_frameStats));
+
     /* Emit access unit delimiter unless this is the first frame and the user is
      * not repeating headers (since AUD is supposed to be the first NAL in the access
      * unit) */
@@ -361,7 +371,10 @@ void FrameEncoder::compressFrame()
 
     m_frameFilter.start(m_frame, m_initSliceContext, qp);
 
-    // reset entropy coders
+    /* ensure all rows are blocked prior to initializing row CTU counters */
+    WaveFront::clearEnabledRowMask();
+
+    /* reset entropy coders */
     m_entropyCoder.load(m_initSliceContext);
     for (uint32_t i = 0; i < m_numRows; i++)
         m_rows[i].init(m_initSliceContext);
@@ -459,10 +472,82 @@ void FrameEncoder::compressFrame()
         m_nalList.serialize(NAL_UNIT_PREFIX_SEI, m_bs);
     }
 
-    // Analyze CTU rows, most of the hard work is done here
-    // frame is compressed in a wave-front pattern if WPP is enabled. Loop filter runs as a
-    // wave-front behind the CU compression and reconstruction
-    compressCTURows();
+    /* Analyze CTU rows, most of the hard work is done here.  Frame is
+     * compressed in a wave-front pattern if WPP is enabled. Row based loop
+     * filters runs behind the CTU compression and reconstruction */
+
+    m_rows[0].active = true;
+    if (m_param->bEnableWavefront)
+    {
+        for (uint32_t row = 0; row < m_numRows; row++)
+        {
+            // block until all reference frames have reconstructed the rows we need
+            for (int l = 0; l < numPredDir; l++)
+            {
+                for (int ref = 0; ref < slice->m_numRefIdx[l]; ref++)
+                {
+                    Frame *refpic = slice->m_refPicList[l][ref];
+
+                    uint32_t reconRowCount = refpic->m_reconRowCount.get();
+                    while ((reconRowCount != m_numRows) && (reconRowCount < row + m_refLagRows))
+                        reconRowCount = refpic->m_reconRowCount.waitForChange(reconRowCount);
+
+                    if ((bUseWeightP || bUseWeightB) && m_mref[l][ref].isWeighted)
+                        m_mref[l][ref].applyWeight(row + m_refLagRows, m_numRows);
+                }
+            }
+
+            enableRowEncoder(row); /* clear external dependency for this row */
+            if (!row)
+            {
+                m_row0WaitTime = x265_mdate();
+                enqueueRowEncoder(0); /* clear internal dependency, start wavefront */
+            }
+            tryWakeOne();
+        }
+
+        m_allRowsAvailableTime = x265_mdate();
+        tryWakeOne(); /* ensure one thread is active or help-wanted flag is set prior to blocking */
+        static const int block_ms = 250;
+        while (m_completionEvent.timedWait(block_ms))
+            tryWakeOne();
+    }
+    else
+    {
+        for (uint32_t i = 0; i < m_numRows + m_filterRowDelay; i++)
+        {
+            // compress
+            if (i < m_numRows)
+            {
+                // block until all reference frames have reconstructed the rows we need
+                for (int l = 0; l < numPredDir; l++)
+                {
+                    int list = l;
+                    for (int ref = 0; ref < slice->m_numRefIdx[list]; ref++)
+                    {
+                        Frame *refpic = slice->m_refPicList[list][ref];
+
+                        uint32_t reconRowCount = refpic->m_reconRowCount.get();
+                        while ((reconRowCount != m_numRows) && (reconRowCount < i + m_refLagRows))
+                            reconRowCount = refpic->m_reconRowCount.waitForChange(reconRowCount);
+
+                        if ((bUseWeightP || bUseWeightB) && m_mref[l][ref].isWeighted)
+                            m_mref[list][ref].applyWeight(i + m_refLagRows, m_numRows);
+                    }
+                }
+
+                if (!i)
+                    m_row0WaitTime = x265_mdate();
+                else if (i == m_numRows - 1)
+                    m_allRowsAvailableTime = x265_mdate();
+                processRowEncoder(i, m_tld[m_localTldIdx]);
+            }
+
+            // filter
+            if (i >= m_filterRowDelay)
+                m_frameFilter.processRow(i - m_filterRowDelay);
+        }
+    }
 
     if (m_param->rc.bStatWrite)
     {
@@ -675,98 +760,6 @@ void FrameEncoder::encodeSlice()
         m_entropyCoder.finishSlice();
 }
 
-void FrameEncoder::compressCTURows()
-{
-    Slice* slice = m_frame->m_encData->m_slice;
-
-    m_bLastRowCompleted = false;
-    m_bAllRowsStop = false;
-    m_vbvResetTriggerRow = -1;
-
-    m_SSDY = m_SSDU = m_SSDV = 0;
-    m_ssim = 0;
-    m_ssimCnt = 0;
-    memset(&m_frameStats, 0, sizeof(m_frameStats));
-
-    bool bUseWeightP = slice->m_pps->bUseWeightPred && slice->m_sliceType == P_SLICE;
-    bool bUseWeightB = slice->m_pps->bUseWeightedBiPred && slice->m_sliceType == B_SLICE;
-    int numPredDir = slice->isInterP() ? 1 : slice->isInterB() ? 2 : 0;
-
-    m_rows[0].active = true;
-    if (m_param->bEnableWavefront)
-    {
-        WaveFront::clearEnabledRowMask();
-
-        for (uint32_t row = 0; row < m_numRows; row++)
-        {
-            // block until all reference frames have reconstructed the rows we need
-            for (int l = 0; l < numPredDir; l++)
-            {
-                for (int ref = 0; ref < slice->m_numRefIdx[l]; ref++)
-                {
-                    Frame *refpic = slice->m_refPicList[l][ref];
-
-                    uint32_t reconRowCount = refpic->m_reconRowCount.get();
-                    while ((reconRowCount != m_numRows) && (reconRowCount < row + m_refLagRows))
-                        reconRowCount = refpic->m_reconRowCount.waitForChange(reconRowCount);
-
-                    if ((bUseWeightP || bUseWeightB) && m_mref[l][ref].isWeighted)
-                        m_mref[l][ref].applyWeight(row + m_refLagRows, m_numRows);
-                }
-            }
-
-            enableRowEncoder(row);
-            if (!row)
-            {
-                m_row0WaitTime = x265_mdate();
-                enqueueRowEncoder(0);
-            }
-            tryWakeOne();
-        }
-
-        m_allRowsAvailableTime = x265_mdate();
-        tryWakeOne(); /* ensure one thread is active or help-wanted flag is set prior to blocking */
-        static const int block_ms = 250;
-        while (m_completionEvent.timedWait(block_ms))
-            tryWakeOne();
-    }
-    else
-    {
-        for (uint32_t i = 0; i < m_numRows + m_filterRowDelay; i++)
-        {
-            // Encode
-            if (i < m_numRows)
-            {
-                // block until all reference frames have reconstructed the rows we need
-                for (int l = 0; l < numPredDir; l++)
-                {
-                    int list = l;
-                    for (int ref = 0; ref < slice->m_numRefIdx[list]; ref++)
-                    {
-                        Frame *refpic = slice->m_refPicList[list][ref];
-
-                        uint32_t reconRowCount = refpic->m_reconRowCount.get();
-                        while ((reconRowCount != m_numRows) && (reconRowCount < i + m_refLagRows))
-                            reconRowCount = refpic->m_reconRowCount.waitForChange(reconRowCount);
-
-                        if ((bUseWeightP || bUseWeightB) && m_mref[l][ref].isWeighted)
-                            m_mref[list][ref].applyWeight(i + m_refLagRows, m_numRows);
-                    }
-                }
-
-                if (!i)
-                    m_row0WaitTime = x265_mdate();
-                else if (i == m_numRows - 1)
-                    m_allRowsAvailableTime = x265_mdate();
-                processRowEncoder(i, m_tld[m_localTldIdx]);
-            }
-
-            if (i >= m_filterRowDelay)
-                m_frameFilter.processRow(i - m_filterRowDelay);
-        }
-    }
-}
-
 void FrameEncoder::processRow(int row, int threadId)
 {
     int64_t startTime = x265_mdate();
diff -r 887ac5e457e0 -r cc496665280f source/encoder/frameencoder.h
--- a/source/encoder/frameencoder.h	Sat Mar 21 01:27:07 2015 -0500
+++ b/source/encoder/frameencoder.h	Sun Mar 22 22:16:45 2015 -0400
@@ -222,9 +222,6 @@ protected:
     /* analyze / compress frame, can be run in parallel within reference constraints */
     void compressFrame();
 
-    /* called by compressFrame to perform wave-front compression analysis */
-    virtual void compressCTURows();
-
     /* called by compressFrame to generate final per-row bitstreams */
     void encodeSlice();
 
diff -r 887ac5e457e0 -r cc496665280f source/test/regression-tests.txt
--- a/source/test/regression-tests.txt	Sat Mar 21 01:27:07 2015 -0500
+++ b/source/test/regression-tests.txt	Sun Mar 22 22:16:45 2015 -0400
@@ -52,7 +52,7 @@ DucksAndLegs_1920x1080_60_10bit_444.yuv,
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset superfast --weightp --nr-intra 1000 -F4
 FourPeople_1280x720_60.y4m,--preset medium --qp 38
 FourPeople_1280x720_60.y4m,--preset slower
-FourPeople_1280x720_60.y4m,--preset superfast --pools 0
+FourPeople_1280x720_60.y4m,--preset superfast --no-wpp
 Keiba_832x480_30.y4m,--preset medium --pmode
 Keiba_832x480_30.y4m,--preset slower --fast-intra --nr-inter 500 -F4
 Keiba_832x480_30.y4m,--preset superfast --no-fast-intra --nr-intra 1000 -F4
diff -r 887ac5e457e0 -r cc496665280f source/test/smoke-tests.txt
--- a/source/test/smoke-tests.txt	Sat Mar 21 01:27:07 2015 -0500
+++ b/source/test/smoke-tests.txt	Sun Mar 22 22:16:45 2015 -0400
@@ -13,5 +13,5 @@ RaceHorses_416x240_30_10bit.yuv,--preset
 RaceHorses_416x240_30_10bit.yuv,--preset=slower --bitrate 500 -F4 --rdoq-level 1
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset=ultrafast --constrained-intra --min-keyint 5 --keyint 10
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset=medium --max-tu-size 16
-DucksAndLegs_1920x1080_60_10bit_422.yuv, --preset=veryfast --min-cu 16
-DucksAndLegs_1920x1080_60_10bit_422.yuv, --preset=fast --weightb --interlace bff
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset=veryfast --min-cu 16
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset=fast --weightb --interlace bff

