[x265-commits] [x265] weightp: fix lowresMvCosts[] indexing, add comment for fu...

Steve Borho steve at borho.org
Fri Jan 31 03:33:20 CET 2014


details:   http://hg.videolan.org/x265/rev/8552e8cc1a3c
branches:  
changeset: 5929:8552e8cc1a3c
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 28 08:49:01 2014 -0600
description:
weightp: fix lowresMvCosts[] indexing, add comment for future work
Subject: [x265] nit: line up WPP log info with other config items

details:   http://hg.videolan.org/x265/rev/4ec459e04f9e
branches:  stable
changeset: 5930:4ec459e04f9e
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 28 13:53:13 2014 -0600
description:
nit: line up WPP log info with other config items
Subject: [x265] asm: fix overflow due to pixel_satd asm function for 64-bit build

details:   http://hg.videolan.org/x265/rev/d6091cb46ae1
branches:  stable
changeset: 5931:d6091cb46ae1
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Wed Jan 29 12:05:06 2014 +0530
description:
asm: fix overflow due to pixel_satd asm function for 64-bit build
Subject: [x265] log: print ssim(dB) in per-frame csv logging

details:   http://hg.videolan.org/x265/rev/46aa0de4a8da
branches:  
changeset: 5932:46aa0de4a8da
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Jan 30 06:39:06 2014 +0530
description:
log: print ssim(dB) in per-frame csv logging
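For reference, a common way to express SSIM in decibels is -10 * log10(1 - SSIM). This is the convention x264 uses, and it is assumed here; this commit message does not state the formula. The following is a minimal C++ sketch. The function name `ssimToDb` is hypothetical, and so is the 100 dB cap for a perfect match.

```cpp
#include <cassert>
#include <cmath>

// Convert a mean SSIM value in [0, 1] to decibels using the assumed
// -10 * log10(1 - ssim) convention (hypothetical helper, not x265's API).
double ssimToDb(double ssim)
{
    double inv = 1.0 - ssim;
    // Avoid log10(0) when the frames are (numerically) identical;
    // the 100 dB ceiling is an arbitrary choice for this sketch.
    if (inv <= 1e-10)
        return 100.0;
    return -10.0 * std::log10(inv);
}
```

Under this convention, SSIM 0.9 maps to 10 dB and 0.99 to 20 dB. Each extra "nine" adds 10 dB, which makes high-quality frames easier to compare in a CSV column.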
Subject: [x265] log: print Summary for per-frame logging

details:   http://hg.videolan.org/x265/rev/e879873ce926
branches:  
changeset: 5933:e879873ce926
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Jan 30 06:46:23 2014 +0530
description:
log: print Summary for per-frame logging
Subject: [x265] asm: fix for 32-bit build satd overflow issue.

details:   http://hg.videolan.org/x265/rev/86743912a5b0
branches:  stable
changeset: 5934:86743912a5b0
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Wed Jan 29 18:44:49 2014 +0530
description:
asm: fix for 32-bit build satd overflow issue.
Subject: [x265] asm: fixed hash mismatch on 16bpp due to intra_pred_ang

details:   http://hg.videolan.org/x265/rev/c0ec570c0105
branches:  stable
changeset: 5935:c0ec570c0105
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Jan 30 12:55:33 2014 +0530
description:
asm: fixed hash mismatch on 16bpp due to intra_pred_ang
Subject: [x265] asm: modified pixel_sad asm function to avoid overflow

details:   http://hg.videolan.org/x265/rev/b852f74bdd8c
branches:  stable
changeset: 5936:b852f74bdd8c
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Jan 30 18:15:02 2014 +0530
description:
asm: modified pixel_sad asm function to avoid overflow
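The commit does not say which accumulator overflowed. The following illustration assumes the problem was a 16-bit running sum. The worst-case SAD of a 64x64 block of 8-bit pixels is 64 * 64 * 255 = 1,044,480, which does not fit in 16 bits (maximum 65,535). The hypothetical `sadBlock` template makes the accumulator width a parameter, so the wraparound is visible.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sum of absolute differences over a w x h block, accumulated in type Acc.
// Hypothetical helper for illustration; not the x265 primitive signature.
template <typename Acc, typename Pixel>
Acc sadBlock(const Pixel* a, const Pixel* b, int stride, int w, int h)
{
    Acc sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            int d = int(a[y * stride + x]) - int(b[y * stride + x]);
            // Converting back to Acc wraps modulo 2^N for narrow unsigned types
            sum = Acc(sum + (d < 0 ? -d : d));
        }
    return sum;
}
```

With a 32-bit accumulator, the 64x64 worst case is exact (1,044,480). With a 16-bit accumulator, it silently wraps to 61,440.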
Subject: [x265] Merge bug fixes from stable.

details:   http://hg.videolan.org/x265/rev/fffdf3dce410
branches:  
changeset: 5937:fffdf3dce410
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Thu Jan 30 20:02:41 2014 +0530
description:
Merge bug fixes from stable.
Subject: [x265] common: consolidate malloc/free funcdefs to common.h

details:   http://hg.videolan.org/x265/rev/71f6479dc354
branches:  stable
changeset: 5938:71f6479dc354
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 30 12:38:05 2014 -0600
description:
common: consolidate malloc/free funcdefs to common.h
Subject: [x265] wavefront: use x265_malloc for bitmaps, to ensure alignment

details:   http://hg.videolan.org/x265/rev/adf571b1bb94
branches:  stable
changeset: 5939:adf571b1bb94
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 30 12:38:34 2014 -0600
description:
wavefront: use x265_malloc for bitmaps, to ensure alignment
Subject: [x265] wavefront: eliminate redundant reads of m_queuedBitmap

details:   http://hg.videolan.org/x265/rev/6d5f2f61341a
branches:  stable
changeset: 5940:6d5f2f61341a
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 30 12:43:21 2014 -0600
description:
wavefront: eliminate redundant reads of m_queuedBitmap
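The `findJob()` hunk in this changeset's wavefront.cpp diff reads `m_queuedBitmap[w]` once per attempt. It re-reads the value only after a failed compare-and-swap. A rough C++11 `std::atomic` equivalent for a single 64-bit word might look like the following. The names `RowBitmaps`, `claimRow` and `lowestSetBit` are hypothetical, not x265's.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-word stand-ins for m_queuedBitmap / m_enableBitmap.
struct RowBitmaps
{
    std::atomic<uint64_t> queued{0};
    std::atomic<uint64_t> enabled{0};
};

// Portable count-trailing-zeros; v must be non-zero.
static int lowestSetBit(uint64_t v)
{
    int n = 0;
    while (!(v & 1)) { v >>= 1; n++; }
    return n;
}

// Claim one row that is both queued and enabled; return its index or -1.
int claimRow(RowBitmaps& bm)
{
    uint64_t oldval = bm.queued.load();   // one read per attempt
    uint64_t mask;
    while ((mask = oldval & bm.enabled.load()) != 0)
    {
        int id = lowestSetBit(mask);
        uint64_t newval = oldval & ~(uint64_t(1) << id);
        // On failure, compare_exchange_weak stores the current value in
        // oldval, so no separate re-read is needed before retrying.
        if (bm.queued.compare_exchange_weak(oldval, newval))
            return id;
    }
    return -1;
}
```

Here the refresh that `compare_exchange_weak` performs on failure plays the role of the explicit `oldval = m_queuedBitmap[w]` re-read in the patch.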
Subject: [x265] encoder: refactor frame encoder recon row synchronization

details:   http://hg.videolan.org/x265/rev/4a4c4cbe9c67
branches:  stable
changeset: 5941:4a4c4cbe9c67
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 28 00:17:28 2014 -0600
description:
encoder: refactor frame encoder recon row synchronization

The previous approach depended on a common event (owned by TComPic) being
triggered multiple times for each row, one trigger per referencing frame, but I
believe this was fragile as one frame encoder could steal notifications from
another.

In the new scheme, each frame encoder waits on its own sync event when it blocks
for recon pixels. When a frame encoder finishes reconstructing a CU row, it
calls a top-level encoder function which determines if any frame encoders are
blocked on that POC and wakes them up.

This should prevent deadlocks caused by frame encoder synchronization.
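The following is a minimal sketch of the scheme described in this commit. It uses C++11 threads in place of x265's `Event` class. Every name here (`PicSketch`, `FrameEncoderSketch`, `EncoderSketch`, `signalReconRowCompleted`) is hypothetical. Each frame encoder waits on its own condition variable and records the POC it is blocked on. The top-level encoder bumps the row count and wakes only the encoders blocked on that POC.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a reference picture (TComPic in the commit).
struct PicSketch
{
    int poc = 0;
    std::atomic<int> reconRowCount{0};
};

// Hypothetical frame encoder that owns its own sync event.
struct FrameEncoderSketch
{
    std::mutex mtx;
    std::condition_variable syncEvent;
    int blockedOnPoc = -1;   // guarded by mtx; -1 means "not blocked"

    // Block until 'ref' has at least 'rows' reconstructed CTU rows.
    void waitForReconRows(PicSketch& ref, int rows)
    {
        std::unique_lock<std::mutex> lk(mtx);
        blockedOnPoc = ref.poc;
        syncEvent.wait(lk, [&] { return ref.reconRowCount.load() >= rows; });
        blockedOnPoc = -1;
    }
};

// Hypothetical top-level encoder that routes row-complete notifications.
struct EncoderSketch
{
    std::vector<FrameEncoderSketch*> frameEncoders;

    // Called by the frame encoder that just finished a CTU row of 'pic'.
    void signalReconRowCompleted(PicSketch& pic)
    {
        pic.reconRowCount.fetch_add(1);
        for (FrameEncoderSketch* fe : frameEncoders)
        {
            // Taking the waiter's own mutex closes the check-then-sleep
            // window, so this notification cannot be lost.
            std::lock_guard<std::mutex> lk(fe->mtx);
            if (fe->blockedOnPoc == pic.poc)
                fe->syncEvent.notify_one();
        }
    }
};
```

The notifier takes the waiting encoder's own mutex before checking `blockedOnPoc`. As a result, a row completion cannot slip in between the waiter's predicate check and its sleep. Because each encoder has a private event, one encoder also cannot consume a wakeup meant for another.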
Subject: [x265] cturow: detect and prevent simultaneous row access

details:   http://hg.videolan.org/x265/rev/564eefbb3812
branches:  stable
changeset: 5942:564eefbb3812
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 30 17:34:31 2014 -0600
description:
cturow: detect and prevent simultaneous row access

Temporary workaround until we are certain the findJob() race hazards are indeed
resolved completely.
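The message does not describe the detection mechanism. One plausible approach is an atomic busy flag per CTU row that a worker must win before touching the row. It is sketched here with a hypothetical `RowAccessGuard` class.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-row guard: only the worker that flips the flag from
// false to true may process the row; others back off.
class RowAccessGuard
{
public:
    // Returns true if the caller now owns the row.
    bool tryEnter() { return !m_busy.exchange(true, std::memory_order_acquire); }

    // Release ownership once the row's work is done.
    void leave() { m_busy.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_busy{false};
};
```

A worker whose `tryEnter()` fails simply skips the row. Two threads therefore never process the same row at once, even if `findJob()` were to hand it out twice.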
Subject: [x265] threadpool: use a wait event per worker thread

details:   http://hg.videolan.org/x265/rev/6fe8d1d519f7
branches:  
changeset: 5943:6fe8d1d519f7
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 28 01:39:22 2014 -0600
description:
threadpool: use a wait event per worker thread

For simplicity, this patch caps the number of worker threads at 64. The bitmap
could be trivially extended if necessary.

This removes the common wake event, which complicated the startup, shutdown, and
flush events.
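The sleep-bitmap idea can be sketched with `std::atomic<uint64_t>`. Each idle worker sets its bit before waiting on its private event. A poker claims the lowest set bit with a compare-and-swap and wakes just that worker. This is a simplified model of the `s_sleepMap`/`pokeIdleThread()` logic in this changeset's threadpool.cpp diff. The `SleepMap` class and its methods are hypothetical, and triggering the chosen worker's event is left to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 64-entry "who is asleep" bitmap: one bit per worker thread.
class SleepMap
{
public:
    static const int kMaxThreads = 64;

    // Worker 'id' (0..63) announces it is about to wait on its wake event.
    void markSleeping(int id)
    {
        assert(id >= 0 && id < kMaxThreads);
        m_map.fetch_or(uint64_t(1) << id);
    }

    // Clear one sleeper's bit and return its index so the caller can poke
    // that worker's private event; returns -1 if nobody is asleep.
    int claimSleeper()
    {
        uint64_t oldval = m_map.load();
        while (oldval)
        {
            int id = lowestSetBit(oldval);
            uint64_t newval = oldval & ~(uint64_t(1) << id);
            if (m_map.compare_exchange_weak(oldval, newval))
                return id;
            // CAS failed: oldval now holds the current map, so retry
        }
        return -1;
    }

    uint64_t raw() const { return m_map.load(); }

private:
    static int lowestSetBit(uint64_t v)   // v must be non-zero
    {
        int n = 0;
        while (!(v & 1)) { v >>= 1; n++; }
        return n;
    }

    std::atomic<uint64_t> m_map{0};
};
```

The sketch departs from the patch in two small ways. First, it retries the compare-and-swap until it succeeds or the map is empty, whereas the patch makes a single attempt. Second, it shifts an unsigned `uint64_t(1)` rather than `1LL`, so setting bit 63 never involves shifting into the sign bit of a signed type.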
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/eb3713ab0641
branches:  
changeset: 5944:eb3713ab0641
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 30 18:19:02 2014 -0600
description:
Merge with stable

diffstat:

 source/Lib/TLibCommon/CommonDef.h    |     5 +-
 source/Lib/TLibCommon/TComPic.h      |     1 -
 source/common/common.h               |     2 +
 source/common/threadpool.cpp         |    79 +-
 source/common/wavefront.cpp          |    19 +-
 source/common/x86/asm-primitives.cpp |    61 +-
 source/common/x86/intrapred16.asm    |    11 +-
 source/common/x86/pixel-a.asm        |  1420 ++++++++++++++++++---------------
 source/common/x86/sad-a.asm          |   132 +--
 source/encoder/cturow.h              |    14 +-
 source/encoder/encoder.cpp           |    31 +-
 source/encoder/encoder.h             |     2 +
 source/encoder/frameencoder.cpp      |    44 +-
 source/encoder/frameencoder.h        |     4 +
 source/encoder/framefilter.cpp       |     7 +-
 source/encoder/framefilter.h         |     1 +
 source/encoder/weightPrediction.cpp  |     3 +-
 source/test/intrapredharness.cpp     |    11 +-
 18 files changed, 969 insertions(+), 878 deletions(-)

diffs (truncated from 2683 to 300 lines):

diff -r 923edbb08a59 -r eb3713ab0641 source/Lib/TLibCommon/CommonDef.h
--- a/source/Lib/TLibCommon/CommonDef.h	Tue Jan 28 08:07:08 2014 -0600
+++ b/source/Lib/TLibCommon/CommonDef.h	Thu Jan 30 18:19:02 2014 -0600
@@ -39,6 +39,7 @@
 #define X265_COMMONDEF_H
 
 #include <cstdlib>
+#include "common.h"
 #include "TypeDef.h"
 
 //! \ingroup TLibCommon
@@ -141,10 +142,6 @@
 #define X265_MALLOC(type, count)    x265_malloc(sizeof(type) * (count))
 #define X265_FREE(ptr)              x265_free(ptr)
 
-// new code can use these functions directly
-extern void  x265_free(void *);
-extern void *x265_malloc(size_t size);
-
 // ====================================================================================================================
 // Coding tool configuration
 // ====================================================================================================================
diff -r 923edbb08a59 -r eb3713ab0641 source/Lib/TLibCommon/TComPic.h
--- a/source/Lib/TLibCommon/TComPic.h	Tue Jan 28 08:07:08 2014 -0600
+++ b/source/Lib/TLibCommon/TComPic.h	Thu Jan 30 18:19:02 2014 -0600
@@ -79,7 +79,6 @@ public:
     //** Frame Parallelism - notification between FrameEncoders of available motion reference rows **
     volatile uint32_t     m_reconRowCount;      // count of CTU rows completely reconstructed and extended for motion reference
     volatile uint32_t     m_countRefEncoders;   // count of FrameEncoder threads monitoring m_reconRowCount
-    Event                 m_reconRowWait;       // event triggered m_countRefEncoders times each time a recon row is completed
     void*                 m_userData;           // user provided pointer passed in with this picture
     
     int64_t               m_pts;                // user provided presentation time stamp
diff -r 923edbb08a59 -r eb3713ab0641 source/common/common.h
--- a/source/common/common.h	Tue Jan 28 08:07:08 2014 -0600
+++ b/source/common/common.h	Thu Jan 30 18:19:02 2014 -0600
@@ -115,5 +115,7 @@ void x265_print_params(x265_param *param
 int x265_set_globals(x265_param *param);
 int x265_exp2fix8(double x);
 char *x265_param2string(x265_param *p);
+void *x265_malloc(size_t size);
+void x265_free(void *ptr);
 
 #endif // ifndef X265_COMMON_H
diff -r 923edbb08a59 -r eb3713ab0641 source/common/threadpool.cpp
--- a/source/common/threadpool.cpp	Tue Jan 28 08:07:08 2014 -0600
+++ b/source/common/threadpool.cpp	Thu Jan 30 18:19:02 2014 -0600
@@ -25,6 +25,7 @@
 
 #include "threadpool.h"
 #include "threading.h"
+#include "common.h"
 #include <assert.h>
 #include <string.h>
 #include <new>
@@ -48,35 +49,39 @@ private:
 
     PoolThread& operator =(const PoolThread&);
 
+    int            m_id;
+
     bool           m_dirty;
 
-    bool           m_idle;
-
     bool           m_exited;
 
+    Event          m_wakeEvent;
+
 public:
 
-    PoolThread(ThreadPoolImpl& pool) : m_pool(pool), m_dirty(false), m_idle(false), m_exited(false) {}
+    PoolThread(ThreadPoolImpl& pool, int id)
+        : m_pool(pool)
+        , m_id(id)
+        , m_dirty(false)
+        , m_exited(false)
+    {
+    }
 
-    //< query if thread is still potentially walking provider list
-    bool isDirty() const  { return !m_idle && m_dirty; }
-
-    //< set m_dirty if the thread might be walking provider list
-    void markDirty()      { m_dirty = !m_idle; }
+    bool isDirty() const  { return m_dirty; }
+    void markDirty()      { m_dirty = true; }
 
     bool isExited() const { return m_exited; }
 
+    void poke()           { m_wakeEvent.trigger(); }
+
     virtual ~PoolThread() {}
 
     void threadMain();
 
-    static volatile int s_sleepCount;
-    static Event s_wakeEvent;
+    static volatile uint64_t s_sleepMap;
 };
 
-volatile int PoolThread::s_sleepCount = 0;
-
-Event PoolThread::s_wakeEvent;
+volatile uint64_t PoolThread::s_sleepMap /* = 0 */;
 
 class ThreadPoolImpl : public ThreadPool
 {
@@ -155,14 +160,14 @@ void PoolThread::threadMain()
             cur = cur->m_nextProvider;
         }
 
+        // this thread has reached the end of the provider list
         m_dirty = false;
+
         if (cur == NULL)
         {
-            m_idle = true;
-            ATOMIC_INC(&s_sleepCount);
-            s_wakeEvent.wait();
-            ATOMIC_DEC(&s_sleepCount);
-            m_idle = false;
+            uint64_t bit = 1LL << m_id;
+            ATOMIC_OR(&s_sleepMap, bit);
+            m_wakeEvent.wait();
         }
     }
 
@@ -171,7 +176,17 @@ void PoolThread::threadMain()
 
 void ThreadPoolImpl::pokeIdleThread()
 {
-    PoolThread::s_wakeEvent.trigger();
+    /* Find a bit in the sleeping thread bitmap and poke it awake */
+    uint64_t oldval = PoolThread::s_sleepMap;
+    if (oldval)
+    {
+        unsigned long id;
+        CTZ64(id, oldval);
+
+        uint64_t newval = oldval & ~(1LL << id);
+        if (ATOMIC_CAS(&PoolThread::s_sleepMap, oldval, newval) == oldval)
+            m_threads[id].poke();
+    }
 }
 
 ThreadPoolImpl *ThreadPoolImpl::instance;
@@ -211,6 +226,7 @@ ThreadPoolImpl::ThreadPoolImpl(int numTh
 {
     if (numThreads == 0)
         numThreads = get_cpu_count();
+    numThreads = X265_MIN(64, numThreads); // do not overflow sleep map
 
     char *buffer = new char[sizeof(PoolThread) * numThreads];
     m_threads = reinterpret_cast<PoolThread*>(buffer);
@@ -218,16 +234,19 @@ ThreadPoolImpl::ThreadPoolImpl(int numTh
 
     if (m_threads)
     {
+        uint64_t idlemap = 0;
+
         m_ok = true;
         for (int i = 0; i < numThreads; i++)
         {
-            new (buffer)PoolThread(*this);
+            new (buffer)PoolThread(*this, i);
             buffer += sizeof(PoolThread);
             m_ok = m_ok && m_threads[i].start();
+            idlemap |= (1LL << i);
         }
 
         // Wait for threads to spin up and idle
-        while (PoolThread::s_sleepCount < m_numThreads)
+        while (PoolThread::s_sleepMap != idlemap)
         {
             GIVE_UP_TIME();
         }
@@ -238,18 +257,24 @@ void ThreadPoolImpl::Stop()
 {
     if (m_ok)
     {
+        uint64_t idlemap = 0;
+        for (int i = 0; i < m_numThreads; i++)
+            idlemap |= (1LL << i);
+
         // wait for all threads to idle
-        while (PoolThread::s_sleepCount < m_numThreads)
+        while (PoolThread::s_sleepMap != idlemap)
         {
             GIVE_UP_TIME();
         }
 
         // set invalid flag, then wake them up so they exit their main func
         m_ok = false;
-        int exited_count;
+        for (int i = 0; i < m_numThreads; i++)
+            pokeIdleThread();
+
+        int exited_count = 0;
         do
         {
-            pokeIdleThread();
             GIVE_UP_TIME();
             exited_count = 0;
             for (int i = 0; i < m_numThreads; i++)
@@ -319,14 +344,14 @@ void ThreadPoolImpl::dequeueJobProvider(
     p.m_prevProvider = NULL;
 }
 
-/* Ensure all threads are either idle, or have made a full
- * pass through the provider list, ensuring dequeued providers
- * are safe for deletion. */
+/* Ensure all threads have made a full pass through the provider list, ensuring
+ * dequeued providers are safe for deletion. */
 void ThreadPoolImpl::FlushProviderList()
 {
     for (int i = 0; i < m_numThreads; i++)
     {
         m_threads[i].markDirty();
+        m_threads[i].poke();
     }
 
     int i;
diff -r 923edbb08a59 -r eb3713ab0641 source/common/wavefront.cpp
--- a/source/common/wavefront.cpp	Tue Jan 28 08:07:08 2014 -0600
+++ b/source/common/wavefront.cpp	Thu Jan 30 18:19:02 2014 -0600
@@ -24,9 +24,9 @@
 #include "threadpool.h"
 #include "threading.h"
 #include "wavefront.h"
+#include "common.h"
 #include <assert.h>
 #include <string.h>
-#include <new>
 
 namespace x265 {
 // x265 private namespace
@@ -38,11 +38,11 @@ bool WaveFront::init(int numRows)
     if (m_pool)
     {
         m_numWords = (numRows + 63) >> 6;
-        m_queuedBitmap = new uint64_t[m_numWords];
+        m_queuedBitmap = (uint64_t*)x265_malloc(sizeof(uint64_t) * m_numWords);
         if (m_queuedBitmap)
             memset((void*)m_queuedBitmap, 0, sizeof(uint64_t) * m_numWords);
 
-        m_enableBitmap = new uint64_t[m_numWords];
+        m_enableBitmap = (uint64_t*)x265_malloc(sizeof(uint64_t) * m_numWords);
         if (m_enableBitmap)
             memset((void*)m_enableBitmap, 0, sizeof(uint64_t) * m_numWords);
 
@@ -54,8 +54,8 @@ bool WaveFront::init(int numRows)
 
 WaveFront::~WaveFront()
 {
-    delete[] m_queuedBitmap;
-    delete[] m_enableBitmap;
+    x265_free((void*)m_queuedBitmap);
+    x265_free((void*)m_enableBitmap);
 }
 
 void WaveFront::clearEnabledRowMask()
@@ -112,12 +112,10 @@ bool WaveFront::findJob()
     // thread safe
     for (int w = 0; w < m_numWords; w++)
     {
-        while (m_queuedBitmap[w])
+        uint64_t oldval = m_queuedBitmap[w];
+        while (oldval & m_enableBitmap[w])
         {
-            uint64_t oldval = m_queuedBitmap[w];
-            uint64_t mask = m_queuedBitmap[w] & m_enableBitmap[w];
-            if (mask == 0) // race condition
-                break;
+            uint64_t mask = oldval & m_enableBitmap[w];
 
             CTZ64(id, mask);
 
@@ -129,6 +127,7 @@ bool WaveFront::findJob()
                 return true;
             }
             // some other thread cleared the bit, try another bit
+            oldval = m_queuedBitmap[w];
         }
     }
 
diff -r 923edbb08a59 -r eb3713ab0641 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Tue Jan 28 08:07:08 2014 -0600
+++ b/source/common/x86/asm-primitives.cpp	Thu Jan 30 18:19:02 2014 -0600
@@ -64,14 +64,30 @@ extern "C" {
 #define INIT8(name, cpu) INIT8_NAME(name, name, cpu)
 
 #define HEVC_SATD(cpu) \
-    p.satd[LUMA_32x32] = x265_pixel_satd_32x32_ ## cpu; \
-    p.satd[LUMA_24x32] = x265_pixel_satd_24x32_ ## cpu; \
-    p.satd[LUMA_64x64] = x265_pixel_satd_64x64_ ## cpu; \
-    p.satd[LUMA_64x32] = x265_pixel_satd_64x32_ ## cpu; \
-    p.satd[LUMA_32x64] = x265_pixel_satd_32x64_ ## cpu; \
-    p.satd[LUMA_64x48] = x265_pixel_satd_64x48_ ## cpu; \
-    p.satd[LUMA_48x64] = x265_pixel_satd_48x64_ ## cpu; \

