[x265] [PATCH] inline mvcost() to reduce address operators
Steve Borho
steve at borho.org
Wed May 6 19:51:56 CEST 2015
On 05/07, chen wrote:
>
>
> At 2015-05-07 01:04:23,"Steve Borho" <steve at borho.org> wrote:
> >On 05/06, chen wrote:
> >>
> >>
> >> At 2015-05-06 08:38:24,"Steve Borho" <steve at borho.org> wrote:
> >> >On 05/05, Min Chen wrote:
> >> >> # HG changeset patch
> >> >> # User Min Chen <chenm003 at 163.com>
> >> >> # Date 1430862259 25200
> >> >> # Node ID 50ce2c0ddfbb743b45f678ee2e6b796762ad868f
> >> >> # Parent f32e6464225afa02983af1b1905f50cdccae5244
> >> >> inline mvcost() to reduce address operators
> >> >
> >> >I'm skeptical that this is a good idea. have you measured the difference
> >> >in performance with encoders built with profile-guided optimizations?
> >>
> >> I found this idea from vtune assembly report (preset ultrafast), it show bottleneck in signed address extendsion because offset is signed integer.
> >> In the ICL, when we use keyword 'restrict' can avoid part of these reduce operators.
> >>
> >> after this patch, I got 10% improve in ME module or call ~2% in total encode.
> >
> >I see, so this is bypassing the int->short->int signed int conversions.
> >In that case it's probably ok to explode the code this way.
>
> it convert short->int->long
>
> >
> >> >
> >> >> source/encoder/motion.cpp | 48 ++++++++++++++++++++++++++++++--------------
> >> >> 1 files changed, 33 insertions(+), 15 deletions(-)
> >> >>
> >> >> diff -r f32e6464225a -r 50ce2c0ddfbb source/encoder/motion.cpp
> >> >> --- a/source/encoder/motion.cpp Mon May 04 15:15:42 2015 -0500
> >> >> +++ b/source/encoder/motion.cpp Tue May 05 14:44:19 2015 -0700
> >> >> @@ -234,9 +234,14 @@
> >> >> pix_base + (m1x) + (m1y) * stride, \
> >> >> pix_base + (m2x) + (m2y) * stride, \
> >> >> stride, costs); \
> >> >> - (costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \
> >> >> - (costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \
> >> >> - (costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \
> >> >> + const uint16_t *base_mvx = &m_cost_mvx[(bmv.x + (m0x)) << 2]; \
> >> >> + const uint16_t *base_mvy = &m_cost_mvy[(bmv.y + (m0y)) << 2]; \
> >> >> + X265_CHECK(mvcost((bmv + MV(m0x, m0y)) << 2) == (base_mvx[((m0x) - (m0x)) << 2] + base_mvy[((m0y) - (m0y)) << 2]), "mvcost() check failure\n"); \
> >> >> + X265_CHECK(mvcost((bmv + MV(m1x, m1y)) << 2) == (base_mvx[((m1x) - (m0x)) << 2] + base_mvy[((m1y) - (m0y)) << 2]), "mvcost() check failure\n"); \
> >> >> + X265_CHECK(mvcost((bmv + MV(m2x, m2y)) << 2) == (base_mvx[((m2x) - (m0x)) << 2] + base_mvy[((m2y) - (m0y)) << 2]), "mvcost() check failure\n"); \
> >> >> + (costs)[0] += (base_mvx[((m0x) - (m0x)) << 2] + base_mvy[((m0y) - (m0y)) << 2]); \
> >> >> + (costs)[1] += (base_mvx[((m1x) - (m0x)) << 2] + base_mvy[((m1y) - (m0y)) << 2]); \
> >> >> + (costs)[2] += (base_mvx[((m2x) - (m0x)) << 2] + base_mvy[((m2y) - (m0y)) << 2]); \
> >> >> }
> >> >>
> >> >> #define COST_MV_PT_DIST_X4(m0x, m0y, p0, d0, m1x, m1y, p1, d1, m2x, m2y, p2, d2, m3x, m3y, p3, d3) \
> >> >> @@ -247,10 +252,10 @@
> >> >> fref + (m2x) + (m2y) * stride, \
> >> >> fref + (m3x) + (m3y) * stride, \
> >> >> stride, costs); \
> >> >> - costs[0] += mvcost(MV(m0x, m0y) << 2); \
> >> >> - costs[1] += mvcost(MV(m1x, m1y) << 2); \
> >> >> - costs[2] += mvcost(MV(m2x, m2y) << 2); \
> >> >> - costs[3] += mvcost(MV(m3x, m3y) << 2); \
> >> >> + (costs)[0] += mvcost(MV(m0x, m0y) << 2); \
> >> >> + (costs)[1] += mvcost(MV(m1x, m1y) << 2); \
> >> >> + (costs)[2] += mvcost(MV(m2x, m2y) << 2); \
> >> >> + (costs)[3] += mvcost(MV(m3x, m3y) << 2); \
> >
> >why was this one spared? you only added parents around the macro
> >argument.
>
> just keep same style to previous macro, we can ignore it
I meant why didn't you break out base_mvx, base_mvy here like you did
everywhere else?
--
Steve Borho
More information about the x265-devel
mailing list