[vlc-devel] Phosphor timing - forgot to mention

Juha Jeronen juha.jeronen at jyu.fi
Sat Mar 5 19:17:29 CET 2011

Hi again,

It says so in the code comments, but this was so curious that I thought
I'd mention it separately:

Even though the luma processing in DarkenField() is a trivial operation,
and both the C and MMX versions are vectorized, the MMX version is about
twice as fast.

Timing both, I got 250 us per frame with MMX and 500 us without. The
only reason I can think of is that, in the MMX version, I preload the
shift and bitmask values into registers once, before the loop, so that
only the actual picture data has to go through the memory bus (or even
the L1 cache). This assumes that the C version doesn't automatically do
a similar preload.
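
To make the "preload" idea concrete, here is a minimal C sketch of the kind
of luma darkening I mean. It is not the actual DarkenField() code; the
function name darken_luma_row() and its parameters are made up for this
illustration. It shifts eight 8-bit samples at once inside a 64-bit word and
masks off the bits that leak in from the neighbouring byte. The shift count
and the mask are computed once before the loop, which is the C counterpart of
loading them into MMX registers up front. Whether a given compiler keeps them
in registers by itself is exactly the open question above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the luma darkening step (not the VLC code):
   divide every 8-bit sample by 2^strength, eight samples per 64-bit word. */
static void darken_luma_row(uint8_t *row, size_t width, unsigned strength)
{
    assert(strength < 8);

    /* Loop invariants, computed once before the loop. The mask clears the
       top `strength` bits of each byte, i.e. the bits that the 64-bit shift
       drags in from the adjacent byte. */
    const uint8_t  mask8  = (uint8_t)(0xFFu >> strength);
    const uint64_t mask64 = mask8 * UINT64_C(0x0101010101010101);

    size_t x = 0;
    for (; x + 8 <= width; x += 8) {
        uint64_t v;
        memcpy(&v, row + x, sizeof v);   /* alignment-safe 8-byte load */
        v = (v >> strength) & mask64;    /* per-byte right shift */
        memcpy(row + x, &v, sizeof v);   /* store the darkened samples */
    }

    /* Leftover samples when width is not a multiple of 8. */
    for (; x < width; x++)
        row[x] = (uint8_t)(row[x] >> strength);
}
```

Calling darken_luma_row(row, width, 1) halves the brightness of one row, and
strength 2 quarters it. Comparing the compiler's assembly output for this loop
against the MMX version would show whether the constants really stay in
registers in the C build.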

