[x265] [PATCH] Changed FrameEncoder::m_tld to a pointer and set it to one of Encoder's ThreadLocalData instances

dave dtyx265 at gmail.com
Fri Sep 26 01:41:16 CEST 2014


On 09/24/2014 06:10 PM, Steve Borho wrote:
> On 09/24, dave wrote:
>> On 09/24/2014 02:07 PM, Steve Borho wrote:
>>> On 09/24, dtyx265 at gmail.com wrote:
>>>> # HG changeset patch
>>>> # User David T Yuen <dtyx265 at gmail.com>
>>>> # Date 1411589843 25200
>>>> # Node ID fc82666c3f8fe258e99ffbb8398ae04fd90a4bee
>>>> # Parent  b2b7072ddbf73085d457bd6a71bca946e505dea8
>>>> Changed FrameEncoder::m_tld to a pointer and set it to one of Encoder's ThreadLocalData instances.
>>>>
>>>> This uses less memory since m_tld isn't used in --wpp and Encoder's ThreadLocalData instances are not used in --no-wpp
>>>> Also there was a small performance increase on my system
>>>>
>>>> diff -r b2b7072ddbf7 -r fc82666c3f8f source/encoder/encoder.cpp
>>>> --- a/source/encoder/encoder.cpp	Wed Sep 24 11:48:15 2014 +0530
>>>> +++ b/source/encoder/encoder.cpp	Wed Sep 24 13:17:23 2014 -0700
>>>> @@ -102,15 +102,20 @@
>>>>       m_scalingList.setupQuantMatrices();
>>>>       /* Allocate thread local data shared by all frame encoders */
>>>> -    ThreadPool *pool = ThreadPool::getThreadPool();
>>>> -    const int poolThreadCount = pool ? pool->getThreadCount() : 1;
>>>> -    m_threadLocalData = new ThreadLocalData[poolThreadCount];
>>>> -    for (int i = 0; i < poolThreadCount; i++)
>>>> +    const int poolThreadCount = ThreadPool::getThreadPool()->getThreadCount();
>>>> +    int numLocalData = m_param->frameNumThreads;
>>>> +    if (m_param->bEnableWavefront)
>>>> +        numLocalData = poolThreadCount;
>>> The general concept is sound, I think, but there could be edge cases
>>> where the number of frame threads is greater than the number of
>>> allocated pool threads.  Perhaps:
>>>
>>>         const int numLocalData = X265_MAX(poolThreadCount, m_param->frameNumThreads);
>> I tested this for all presets, --wpp and --no-wpp with no problems.
>>
>> At first I tried something like your suggestion.
>>
>> if (numLocalData < poolThreadCount) then numLocalData = poolThreadCount;
>>
>> But then you may have idle ThreadLocalData laying around.
> Probably not.  In Encoder::configure() when WPP is disabled we are
> reducing the number of worker threads to 1 regardless of --threads.
>
>      // Trim the thread pool if WPP is disabled
>      if (!p->bEnableWavefront)
>          p->poolNumThreads = 1;
>
> So the problem would be impossible to reproduce via the command line
> without changing that function.
>
> As-is, I guess you would want:
>   numLocalData = m_param->bEnableWavefront ? pool->getThreadCount() : m_param->frameNumThreads;
>
> which is essentially what it does now, so I think I'll queue the
> original patch.
>
>> Either way, I thought I could force an edge case by playing around with the
>> number of pool threads and framethreads but I couldn't force a segfault due
>> to a FrameEncoder not having a ThreadLocalData object.  Playing with the
>> number of threads and framethreads did alter hevc output.
>>
>> By the way, setting an excessive number of framethreads hurts performance as
>> stated in the cli documentation.. so why allow framethreads > than
>> cpucount()
> We generally don't prevent options which might be slow; the user may
> have some exotic hardware that works well with it.  We generally only
> prevent broken configurations that would cause crashes or invalid
> streams.
>
>>>> +    m_threadLocalData = new ThreadLocalData[numLocalData];
>>>> +    for (int i = 0; i < numLocalData; i++)
>>>>       {
>>>>           m_threadLocalData[i].cuCoder.initSearch(m_param, m_scalingList);
>>>>           m_threadLocalData[i].cuCoder.create(g_maxCUDepth + 1, g_maxCUSize);
>>>>       }
>>>> +    for (int i = 0; i < m_param->frameNumThreads; i++)
>>>> +        m_frameEncoder[i].m_tld = &m_threadLocalData[i];
>>>> +
>>>>       m_lookahead = new Lookahead(m_param, m_threadPool, this);
>>>>       m_dpb = new DPB(m_param);
>>>>       m_rateControl = new RateControl(m_param);
>>>> diff -r b2b7072ddbf7 -r fc82666c3f8f source/encoder/frameencoder.cpp
>>>> --- a/source/encoder/frameencoder.cpp	Wed Sep 24 11:48:15 2014 +0530
>>>> +++ b/source/encoder/frameencoder.cpp	Wed Sep 24 13:17:23 2014 -0700
>>>> @@ -105,9 +105,6 @@
>>>>           m_pool = NULL;
>>>>       }
>>>> -    m_tld.cuCoder.initSearch(m_param, top->m_scalingList);
>>>> -    m_tld.cuCoder.create(g_maxCUDepth + 1, g_maxCUSize);
>>>> -
>>>>       m_frameFilter.init(top, this, numRows);
>>>>       // initialize HRD parameters of SPS
>>>> @@ -470,7 +467,7 @@
>>>>               }
>>>>           }
>>>> -        m_tld.cuCoder.loadCTUData(cu);
>>>> +        m_tld->cuCoder.loadCTUData(cu);
>>> hmm; I wonder if this line is even necessary; but that is perhaps a
>>> separate issue from this patch.
I believe you are correct.  I removed this call and everything ran 
fine.  I will submit a patch soon...
>> This is what I just started working on
>> 1.  Create the table to replace Analysis::getDepthScanIdx()
>> 2.  Clean up loadCTUData as much as possible
>> 3.  Move loadCTUData to TComDataCU since it has nothing to do with Analysis
>> other than calling getDepthScanIdx() and even that is just an independent
>> function which just happens to be in Analysis.
> Interesting, ok.
>
>>>>           // final coding (bitstream generation) for this CU
>>>>           m_entropyCoder.encodeCTU(cu);
>>>> @@ -581,8 +578,8 @@
>>>>       const int realRow = row >> 1;
>>>>       const int typeNum = row & 1;
>>>> -    ThreadLocalData& tld = threadId >= 0 ? m_top->m_threadLocalData[threadId] : m_tld;
>>>> -
>>>> +    ThreadLocalData& tld = threadId >= 0 ? m_top->m_threadLocalData[threadId] : *m_tld;
>>>> +
>>>>       if (!typeNum)
>>>>           processRowEncoder(realRow, tld);
>>>>       else
>>>> diff -r b2b7072ddbf7 -r fc82666c3f8f source/encoder/frameencoder.h
>>>> --- a/source/encoder/frameencoder.h	Wed Sep 24 11:48:15 2014 +0530
>>>> +++ b/source/encoder/frameencoder.h	Wed Sep 24 13:17:23 2014 -0700
>>>> @@ -159,7 +159,7 @@
>>>>       uint32_t*                m_substreamSizes;
>>>>       NoiseReduction*          m_nr;
>>>>       NALList                  m_nalList;
>>>> -    ThreadLocalData          m_tld; /* for --no-wpp */
>>>> +    ThreadLocalData*         m_tld; /* for --no-wpp */
>>>>       int                      m_filterRowDelay;
>>>>       int                      m_filterRowDelayCus;
>>>> _______________________________________________
>>>> x265-devel mailing list
>>>> x265-devel at videolan.org
>>>> https://mailman.videolan.org/listinfo/x265-devel
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel



More information about the x265-devel mailing list