[x265] [PATCH RFC] analysis: use macro and for-loop to simplify fast-intra

Steve Borho steve at borho.org
Fri Aug 15 06:10:36 CEST 2014


On 08/14, dave wrote:
> On 08/14/2014 05:02 PM, Steve Borho wrote:
> >On 08/14, dave wrote:
> >>On 08/14/2014 01:42 PM, Steve Borho wrote:
> >>># HG changeset patch
> >>># User Steve Borho <steve at borho.org>
> >>># Date 1408048681 18000
> >>>#      Thu Aug 14 15:38:01 2014 -0500
> >>># Node ID 07138e6ac952c96d1e31f5490c44f4cfaf6ac12a
> >>># Parent  213f17c1492c5bf96c3f382e7beffe0c871a563c
> >>>analysis: use macro and for-loop to simplify fast-intra
> >>>
> >>>this changes behavior a bit; it's trying both +/-1 offsets instead of just
> >>>one. and it has to do one extra check at the end since mode 34 isn't reached
> >>>by the other previous loops
> >>>
> >>>diff -r 213f17c1492c -r 07138e6ac952 source/encoder/analysis.cpp
> >>>--- a/source/encoder/analysis.cpp	Thu Aug 14 09:43:39 2014 -0700
> >>>+++ b/source/encoder/analysis.cpp	Thu Aug 14 15:38:01 2014 -0500
> >>>@@ -1693,68 +1693,56 @@
> >>>      bool modeHor;
> >>>      pixel *cmp;
> >>>      intptr_t srcStride;
> >>>+
> >>>+#define TRY_ANGLE(angle) \
> >>>+    modeHor = angle < 18; \
> >>>+    cmp = modeHor ? buf_trans : fenc; \
> >>>+    srcStride = modeHor ? scaleTuSize : scaleStride; \
> >>>+    sad = sa8d(cmp, srcStride, &tmp[(angle - 2) * predsize], scaleTuSize) << costShift; \
> >>>+    bits = (mpms & ((uint64_t)1 << angle)) ? xModeBitsIntra(cu, angle, partOffset, depth) : rbits; \
> >>>+    cost = m_rdCost.calcRdSADCost(sad, bits)
> >>>+
> >>>      if (m_param->bEnableFastIntra)
> >>>      {
> >>>-        int lowsad, highsad, asad = 0;
> >>>-        uint32_t lowbits, highbits, amode, lowmode, highmode, abits = 0;
> >>>-        uint64_t lowcost, highcost = MAX_INT64, acost = MAX_INT64;
> >>>+        int asad = 0;
> >>>+        uint32_t lowmode, highmode, amode, abits = 0;
> >>>+        uint64_t acost = MAX_INT64;
> >>>-        for (mode = 4;mode < 35; mode += 5)
> >>>+        /* pick the best angle, sampling at distance of 5 */
> >>>+        for (mode = 5; mode < 35; mode += 5)
> >Thanks for reviewing
> >
> >>By starting with mode = 5, won't this miss mode 2 since only +/-2 is
> >>checked?  By starting from 4 the loop should end at 34.
> >if 5 was the best angle of the initial sweep, we'll try +/- 2 (3 and
> >7). If 3 is the new best we try +/-1 which would be 2 and 4.
> >
> >On the high end of the spectrum; if 30 was the best cost, it will try
> >28 and 32, then 33 and 31.
> >
> >Starting with 4 would remove the need for the extra check at the end,
> >but at the same time we would need to range-check the low/high modes as
> >well, since it could reach mode 1 (planar) or modes above 34.
> 
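
For reference, the search order described in the quoted explanation can be
sketched as below. This is a minimal illustration, not the code from the
patch: fastAngleSearch and the costOf callback are hypothetical names, and
costOf stands in for whatever per-angle cost the TRY_ANGLE macro computes.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the per-angle cost (sa8d plus mode bits).
using CostFn = std::function<uint64_t(int)>;

// Coarse-to-fine scan over the angular modes 2..34.
int fastAngleSearch(const CostFn& costOf)
{
    int best = 5;
    uint64_t bestCost = costOf(5);

    // Keep the cheaper mode; ties keep the earlier candidate.
    auto tryMode = [&](int mode) {
        uint64_t c = costOf(mode);
        if (c < bestCost)
        {
            bestCost = c;
            best = mode;
        }
    };

    // Pass 1: sample at a distance of 5 (5, 10, ..., 30).
    for (int mode = 10; mode < 35; mode += 5)
        tryMode(mode);

    // Pass 2: +/-2 around the pass-1 winner (reaches 3..32).
    int center = best;
    tryMode(center - 2);
    tryMode(center + 2);

    // Pass 3: +/-1 around the current winner (reaches 2..33).
    center = best;
    tryMode(center - 1);
    tryMode(center + 1);

    // Mode 34 is not reachable by the passes above, so check it explicitly.
    tryMode(34);
    return best;
}
```

Because pass 1 starts at 5 and stops at 30, the refinement passes never step
below mode 2 or above mode 33, which is why no range check is needed.
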
> I understand now.  I am testing the original search method in
> TEncSearch::estIntraPredQT.  Would you prefer the new one there too?

I was just looking at it. I think the angular mode scan is the least of
its problems.

Currently it calculates the sa8d cost of all 35 modes and builds a
sorted list of the top N, where N is 3 except for block sizes 4x4 and
8x8, where N is 8.  Next it does a full encode of each of the top N and
then picks the one with the lowest RD cost.

What I would prefer is for it to calculate all the sa8d costs, keeping
track of the best cost, and then in a second pass measure the RDO cost
of every mode that is either within X% of the best cost or a
most-probable mode. That way the threshold is the sa8d cost delta
instead of an arbitrary count.
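
A minimal sketch of that selection step, under stated assumptions:
selectRdoCandidates is a hypothetical helper, not an existing x265 function,
the X% margin is passed in as `percent` because no value is fixed here, and
the most-probable modes are given as a plain list of mode indices.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Return the modes worth a full RDO measurement: every mode whose sa8d cost
// is within `percent` of the best sa8d cost, plus every most-probable mode.
// sa8dCost is indexed by mode number.
std::vector<int> selectRdoCandidates(const std::vector<uint64_t>& sa8dCost,
                                     const std::vector<int>& mpms,
                                     unsigned percent)
{
    std::vector<int> candidates;
    if (sa8dCost.empty())
        return candidates;

    // First pass result: the best (lowest) sa8d cost over all modes.
    uint64_t best = *std::min_element(sa8dCost.begin(), sa8dCost.end());
    uint64_t limit = best + best * percent / 100;

    // Second pass: keep modes under the cost limit, and always keep MPMs.
    for (int mode = 0; mode < (int)sa8dCost.size(); mode++)
    {
        bool isMpm = std::find(mpms.begin(), mpms.end(), mode) != mpms.end();
        if (sa8dCost[mode] <= limit || isMpm)
            candidates.push_back(mode);
    }
    return candidates;
}
```

With this shape the number of full encodes adapts to the block: a clear sa8d
winner yields few candidates, while several near-ties are all measured.
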

Then see my TODO comment about redundant work; the function needs some
serious attention in the RDO section.

-- 
Steve Borho

