[x265] patch for faster intra
dave
dtyx265 at gmail.com
Mon Aug 4 17:45:08 CEST 2014
On 08/03/2014 07:52 PM, Steve Borho wrote:
> On 08/02, dave wrote:
>> A few other things...
>>
>> In my testing I encoded a single frame and examined only the first few
>> CUs. There, this search method always found the lowest cost, but the cost
>> values never followed a consistent curve with a single low point. Without
>> further testing or deeper knowledge of the angular intra modes I wouldn't
>> guarantee that the lowest cost will always be found.
> It's understood there will be a trade-off between compression and
> performance; so this is ok. It would be interesting to measure the
> average "error" or difference in cost between the lowest cost found by
> the fast-scan and the lowest cost found by the exhaustive scan. If
> nothing else we can advertise this number in the documentation for the
> option.
>
>> Also, unfortunately my system is old and doesn't support SSE4, and that is
>> the only level of assembly supported for intra mode predictions, so I was
>> only able to develop and test with the C primitives. I need to upgrade...
> that's unfortunate; x265 really prefers SSE4.
>
>> Yes, there was a small performance increase that seemed larger when encoding
>> a single frame than a short video. Neither produced consistent results on
>> my system but the new search method was slightly faster most of the time
>> when encoding a single frame and always faster when encoding a short video.
> Looking at the code, I had forgotten about the details of picking
> filtered or unfiltered sources based on the size and angle. This is
> something else that the 'all-angs' function does for you implicitly.
all-angs uses a lookup table in intrapred.cpp to determine filtering.
If the table were more publicly available that would help speed things
up. Also, the table and Predict::filteringIntraReferenceSamples don't
return the same results for all possible values.
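
For reference, here is a minimal sketch of the filter decision as I understand it from the HEVC spec for luma blocks (compare the distance to pure horizontal/vertical against a per-size threshold). This is not the table from intrapred.cpp, and useFilteredRefs is a made-up name. It also ignores chroma and strong intra smoothing, so treat it as an assumption to check against both implementations.

```cpp
#include <algorithm>
#include <cstdlib>

// Sketch of the HEVC luma reference-sample filter decision.
// Mode numbering: 0 = planar, 1 = DC, 2..34 = angular
// (10 = pure horizontal, 26 = pure vertical).
// useFilteredRefs is a hypothetical name, not an x265 function.
bool useFilteredRefs(int mode, int blockSize)
{
    if (mode == 1 || blockSize == 4)
        return false;               // DC and 4x4 blocks are never filtered

    int threshold;
    switch (blockSize)
    {
    case 8:  threshold = 7; break;
    case 16: threshold = 1; break;
    case 32: threshold = 0; break;
    default: return false;          // sizes outside this sketch
    }

    int distHor = std::abs(mode - 10);
    int distVer = std::abs(mode - 26);
    return std::min(distHor, distVer) > threshold;
}
```

Running something like this over every mode and size, and comparing against both the table and Predict::filteringIntraReferenceSamples, would show exactly where they disagree.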
> I'm pretty sure the fast scan would be faster if it ran the 'all-angs'
> function and then only measured satd/sa8d in the same pattern you have
> now. The only gotcha will be to make sure you compare with transposed
> source pixels for those modes that require it.
>
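
To make that concrete, here is a rough sketch of a coarse-to-fine scan over the angular modes. The step pattern (every fourth mode, then +/-2, then +/-1) is only an assumption for illustration and is not necessarily the pattern in the patch. costOf is a hypothetical callback standing in for a satd/sa8d measurement against one of the all-angs predictions.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical coarse-to-fine scan over the HEVC angular modes 2..34.
// costOf(mode) stands in for a satd/sa8d measurement of one prediction.
// Returns the cheapest mode found; optionally reports how many modes
// were actually measured.
int fastAngularScan(const std::function<uint32_t(int)>& costOf,
                    int* evaluated = nullptr)
{
    const int firstMode = 2, lastMode = 34;
    int best = firstMode;
    uint64_t bestCost = UINT64_MAX;   // wider than any 32-bit cost
    int count = 0;

    auto tryMode = [&](int mode) {
        if (mode < firstMode || mode > lastMode)
            return;                   // skip probes outside the angular range
        uint32_t c = costOf(mode);
        ++count;
        if (c < bestCost) { bestCost = c; best = mode; }
    };

    // Coarse pass: every fourth mode (2, 6, ..., 34).
    for (int mode = firstMode; mode <= lastMode; mode += 4)
        tryMode(mode);

    // Refinement: probe +/-2 around the coarse winner, then +/-1
    // around whatever is best after that.
    for (int step = 2; step >= 1; --step)
    {
        int center = best;
        tryMode(center - step);
        tryMode(center + step);
    }

    if (evaluated)
        *evaluated = count;
    return best;
}
```

With this pattern at most 13 of the 33 angular modes are measured. It only guarantees the true minimum when the cost curve has a single low point, which is not what I observed, so the result can differ from the exhaustive scan.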
>> On 08/01/2014 09:50 PM, Steve Borho wrote:
>>> On 08/01, dave wrote:
>>>> I am submitting a patch to implement the faster intra search suggested by
>>>> Steve Borho here:
>>>>
>>>> https://mailman.videolan.org/pipermail/x265-devel/2014-July/004873.html
>>> nice!
>>>
>>>> The patch implements the faster search in slicetype.cpp. The same
>>>> approach can also be used in analysis.cpp; it's a little more than a
>>>> simple cut and paste, but it shouldn't take long.
>>> yep
>>>
>>>> TEncSearch.cpp also calls intra_pred_allangs for which this is the faster
>>>> alternative but at first look, TEncSearch.cpp is more complex. Depending on
>>>> what is desired, either this search could greatly simplify this part of
>>>> TEncSearch.cpp or it might not be applicable to what TEncSearch.cpp is
>>>> doing. I will be looking into it.
>>> it should also apply here; this function is a little more complicated
>>> because it keeps a "best N" list of modes, and then performs
>>> rate-distortion measurements (encodes each option and records the actual
>>> distortion and bit cost) to select the final intra mode. The same 'fast
>>> scan' mode could be used to build the 'best N' list. But I imagine most
>>> presets that use this RDO version of intra will not want a fast scan.
>>>
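
A 'best N' list of this kind could be sketched as below. BestModes and ModeCost are hypothetical names for illustration and are not taken from TEncSearch.cpp. The idea is just that every mode the fast scan measures is offered to the list, and the survivors go on to the RDO pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ModeCost { int mode; uint32_t cost; };

// Hypothetical helper: keeps the n cheapest candidates seen so far,
// sorted by ascending cost.
class BestModes
{
public:
    explicit BestModes(size_t n) : m_capacity(n) {}

    void offer(int mode, uint32_t cost)
    {
        // Find the first entry that is strictly more expensive, so that
        // equal-cost modes keep their arrival order.
        auto pos = std::upper_bound(m_list.begin(), m_list.end(), cost,
            [](uint32_t c, const ModeCost& e) { return c < e.cost; });
        m_list.insert(pos, ModeCost{ mode, cost });
        if (m_list.size() > m_capacity)
            m_list.pop_back();        // drop the most expensive entry
    }

    const std::vector<ModeCost>& list() const { return m_list; }

private:
    size_t m_capacity;
    std::vector<ModeCost> m_list;
};
```

Because a fast scan measures fewer modes, the list it builds may omit candidates that an exhaustive scan would have ranked in the top N.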
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel