[vlc-devel] Phosphor timing tests (reworked += 5)
juha.jeronen at jyu.fi
Mon Mar 28 22:01:04 CEST 2011
Other parts haven't changed, so I'll concentrate only on the effect of
the new, faster MMX code in DarkenField() that was suggested by Laurent.
Timing the new version ("reworked += 5"), DarkenField() takes about
500us per call in 4:2:2 mode on my Atom. It used to take 750us, so the
simpler MMX code cuts the time per call by about a third. It also makes
the code a bit shorter and easier to read.
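For anyone who wants to reproduce per-call figures like these, here is a minimal sketch of one way to average the cost of a call over many runs. It is not the harness used for the numbers above. The names mean_cpu_us_per_call() and count_call() are made up for illustration, and it uses the standard C clock(), which measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the routine under test. */
typedef void (*bench_fn)(void *ctx);

/* Stand-in workload so the sketch is self-contained: it only counts
 * how often it was called. A real test would call the filter here. */
static void count_call(void *ctx)
{
    unsigned *calls = ctx;
    (*calls)++;
}

/* Mean CPU time per call in microseconds, averaged over n_calls runs.
 * Averaging hides the limited resolution of clock() on short calls. */
static double mean_cpu_us_per_call(bench_fn fn, void *ctx, unsigned n_calls)
{
    assert(fn != NULL && n_calls > 0);

    clock_t start = clock();
    for (unsigned i = 0; i < n_calls; i++)
        fn(ctx);
    clock_t end = clock();

    double total_us = (double)(end - start) * 1e6 / (double)CLOCKS_PER_SEC;
    return total_us / (double)n_calls;
}
```

Averaging over a few hundred calls or more should keep timer granularity from dominating a routine that runs for only a few hundred microseconds.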
The performance scaled surprisingly linearly with instruction count:
going from 23 instructions per loop to 12 roughly doubled the speed of
the 4:2:2 chroma handler (500us down to 250us). It seems the loop was
CPU bound.
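To make the comparison concrete, the following sketch computes the speed increase implied by a cost ratio. The helper speed_increase_percent() is hypothetical and only restates the arithmetic; it assumes that cost is proportional to the quantity passed in (instructions per loop, or microseconds per call).

```c
#include <assert.h>

/* Percentage speed increase when the cost drops from old_cost to
 * new_cost. Both arguments must use the same unit. */
static double speed_increase_percent(double old_cost, double new_cost)
{
    assert(old_cost > 0.0 && new_cost > 0.0);
    return (old_cost / new_cost - 1.0) * 100.0;
}
```

With the instruction counts (23 vs. 12) this gives about 92%, and with the measured chroma times (500us vs. 250us) it gives 100%, so the two agree to within what I would consider measurement error.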
The old version of DarkenField() took 250us for the luma and 500us for
the chroma. The new one seems to take 250us + 250us. The new load is
completely linear in the amount of input data: in 4:2:2, each chroma
plane has half the number of pixels of the luma plane, and there are two
chroma planes in the picture, so the chroma data adds up to the same
amount as the luma data. Thus the best-case total load is 2x that of the
luma handler alone, and the new version reaches it (within measurement
error).
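The plane-size argument can be sketched as follows. The type and function names are made up for illustration and do not come from the filter, and the model assumes that cost is strictly linear in the number of samples processed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum, for illustration only. */
typedef enum { CHROMA_420, CHROMA_422 } chroma_format_t;

/* Samples in ONE chroma plane for a given luma size. Both formats
 * halve the horizontal chroma resolution; 4:2:0 also halves the
 * vertical resolution. */
static size_t chroma_plane_samples(size_t luma_w, size_t luma_h,
                                   chroma_format_t fmt)
{
    assert(luma_w % 2 == 0 && luma_h % 2 == 0);
    size_t w = luma_w / 2;
    size_t h = (fmt == CHROMA_420) ? luma_h / 2 : luma_h;
    return w * h;
}

/* Luma plane plus the two chroma planes. */
static size_t total_samples(size_t luma_w, size_t luma_h,
                            chroma_format_t fmt)
{
    return luma_w * luma_h + 2 * chroma_plane_samples(luma_w, luma_h, fmt);
}

/* Predicted total time, scaling the measured luma time by the ratio
 * of total samples to luma samples (linear-cost assumption). */
static double expected_total_us(double luma_us, size_t luma_w,
                                size_t luma_h, chroma_format_t fmt)
{
    double luma = (double)(luma_w * luma_h);
    return luma_us * (double)total_samples(luma_w, luma_h, fmt) / luma;
}
```

With the measured 250us for luma, the 4:2:2 prediction is 500us for any even frame size, which matches the new timing. The frame size only cancels out of the ratio, so it does not need to match the test clip.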
The filter is still not fast enough to run 4:2:2 in real time on the
Atom, because ComposeFrame() in CC_UPCONVERT mode forms a bottleneck.
Either of the suggested one-pass strategies might help here, but as was
agreed, that is for later. The 4:2:0 modes work fine even on the Atom,
and CC_ALTLINE already fulfills the primary purpose of this filter.
That's it for today's testing :)