[x265] [PATCH] Changed FrameEncoder::m_tld to a pointer and set it to one of Encoder's ThreadLocalData instances

dave dtyx265 at gmail.com
Thu Sep 25 02:19:53 CEST 2014


On 09/24/2014 02:07 PM, Steve Borho wrote:
> On 09/24, dtyx265 at gmail.com wrote:
>> # HG changeset patch
>> # User David T Yuen <dtyx265 at gmail.com>
>> # Date 1411589843 25200
>> # Node ID fc82666c3f8fe258e99ffbb8398ae04fd90a4bee
>> # Parent  b2b7072ddbf73085d457bd6a71bca946e505dea8
>> Changed FrameEncoder::m_tld to a pointer and set it to one of Encoder's ThreadLocalData instances.
>>
>> This uses less memory since m_tld isn't used in --wpp and Encoder's ThreadLocalData instances are not used in --no-wpp
>> Also there was a small performance increase on my system
>>
>> diff -r b2b7072ddbf7 -r fc82666c3f8f source/encoder/encoder.cpp
>> --- a/source/encoder/encoder.cpp	Wed Sep 24 11:48:15 2014 +0530
>> +++ b/source/encoder/encoder.cpp	Wed Sep 24 13:17:23 2014 -0700
>> @@ -102,15 +102,20 @@
>>       m_scalingList.setupQuantMatrices();
>>   
>>       /* Allocate thread local data shared by all frame encoders */
>> -    ThreadPool *pool = ThreadPool::getThreadPool();
>> -    const int poolThreadCount = pool ? pool->getThreadCount() : 1;
>> -    m_threadLocalData = new ThreadLocalData[poolThreadCount];
>> -    for (int i = 0; i < poolThreadCount; i++)
>> +    const int poolThreadCount = ThreadPool::getThreadPool()->getThreadCount();
>> +    int numLocalData = m_param->frameNumThreads;
>> +    if (m_param->bEnableWavefront)
>> +        numLocalData = poolThreadCount;
> The general concept is sound, I think, but there could be edge cases
> where the number of frame threads is greater than the number of
> allocated pool threads.  Perhaps:
>
>         const int numLocalData = X265_MAX(poolThreadCount, m_param->frameNumThreads);
I tested this for all presets, --wpp and --no-wpp with no problems.

At first I tried something like your suggestion.

if (numLocalData < poolThreadCount) then numLocalData = poolThreadCount;

But then you may have idle ThreadLocalData laying around.

Either way, I thought I could force an edge case by playing around with 
the number of pool threads and framethreads but I couldn't force a 
segfault due to a FrameEncoder not having a ThreadLocalData object.  
Playing with the number of threads and framethreads did alter hevc output.

By the way, setting an excessive number of framethreads hurts 
performance as stated in the cli documentation.. so why allow 
framethreads > than cpucount()

>
>> +    m_threadLocalData = new ThreadLocalData[numLocalData];
>> +    for (int i = 0; i < numLocalData; i++)
>>       {
>>           m_threadLocalData[i].cuCoder.initSearch(m_param, m_scalingList);
>>           m_threadLocalData[i].cuCoder.create(g_maxCUDepth + 1, g_maxCUSize);
>>       }
>>   
>> +    for (int i = 0; i < m_param->frameNumThreads; i++)
>> +        m_frameEncoder[i].m_tld = &m_threadLocalData[i];
>> +
>>       m_lookahead = new Lookahead(m_param, m_threadPool, this);
>>       m_dpb = new DPB(m_param);
>>       m_rateControl = new RateControl(m_param);
>> diff -r b2b7072ddbf7 -r fc82666c3f8f source/encoder/frameencoder.cpp
>> --- a/source/encoder/frameencoder.cpp	Wed Sep 24 11:48:15 2014 +0530
>> +++ b/source/encoder/frameencoder.cpp	Wed Sep 24 13:17:23 2014 -0700
>> @@ -105,9 +105,6 @@
>>           m_pool = NULL;
>>       }
>>   
>> -    m_tld.cuCoder.initSearch(m_param, top->m_scalingList);
>> -    m_tld.cuCoder.create(g_maxCUDepth + 1, g_maxCUSize);
>> -
>>       m_frameFilter.init(top, this, numRows);
>>   
>>       // initialize HRD parameters of SPS
>> @@ -470,7 +467,7 @@
>>               }
>>           }
>>   
>> -        m_tld.cuCoder.loadCTUData(cu);
>> +        m_tld->cuCoder.loadCTUData(cu);
> hmm; I wonder if this line is even necessary; but that is perhaps a
> separate issue from this patch.
This is what I just started working on
1.  Create the table to replace Analysis::getDepthScanIdx()
2.  Clean up loadCTUData as much as possible
3.  Move loadCTUData to TComDataCU since it has nothing to do with 
Analysis other than calling getDepthScanIdx() and even that is just an 
independent function which just happens to be in Analysis.
>
>>           // final coding (bitstream generation) for this CU
>>           m_entropyCoder.encodeCTU(cu);
>>   
>> @@ -581,8 +578,8 @@
>>       const int realRow = row >> 1;
>>       const int typeNum = row & 1;
>>   
>> -    ThreadLocalData& tld = threadId >= 0 ? m_top->m_threadLocalData[threadId] : m_tld;
>> -
>> +    ThreadLocalData& tld = threadId >= 0 ? m_top->m_threadLocalData[threadId] : *m_tld;
>> +
>>       if (!typeNum)
>>           processRowEncoder(realRow, tld);
>>       else
>> diff -r b2b7072ddbf7 -r fc82666c3f8f source/encoder/frameencoder.h
>> --- a/source/encoder/frameencoder.h	Wed Sep 24 11:48:15 2014 +0530
>> +++ b/source/encoder/frameencoder.h	Wed Sep 24 13:17:23 2014 -0700
>> @@ -159,7 +159,7 @@
>>       uint32_t*                m_substreamSizes;
>>       NoiseReduction*          m_nr;
>>       NALList                  m_nalList;
>> -    ThreadLocalData          m_tld; /* for --no-wpp */
>> +    ThreadLocalData*         m_tld; /* for --no-wpp */
>>   
>>       int                      m_filterRowDelay;
>>       int                      m_filterRowDelayCus;
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel



More information about the x265-devel mailing list