[vlc-devel] Phosphor timing - forgot to mention
Juha Jeronen, juha.jeronen at jyu.fi
Sat Mar  5 19:17:29 CET 2011

Hi again,
It says so in the code comments, but I found this curious enough that I
thought I'd mention it separately:
Even though the luma processing in DarkenField() is a trivial operation,
and both the C and MMX versions are vectorized, the MMX version is about
twice as fast.
Timing both, I got about 250 us per frame with MMX and 500 us without. The
only reason I can think of is that, in the MMX version, I preloaded the
shift and bitmask values into registers, so that only the actual picture
data has to be fetched from memory (or even from L1 cache). This assumes
that the compiled C version doesn't automatically do a similar preload.
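
As a rough illustration of the preloading idea in plain C, here is a minimal
sketch. It is not the actual DarkenField() code: the function name
darken_luma_u64 and its parameters are made up for this example, and it
assumes a shift strength between 0 and 7. The shift count and the bitmask
are computed once before the loop and held in const locals, so the loop
body only needs to touch the picture data. An optimizing compiler will
often hoist such invariants by itself, so whether this accounts for the
measured difference is not established here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not VLC's DarkenField()): darken 8-bit luma samples
 * in place by shifting each one right by `strength` bits (0..7 assumed),
 * processing 8 samples per 64-bit word. */
static void darken_luma_u64(uint8_t *luma, size_t n_bytes, unsigned strength)
{
    /* Loop invariants, computed once so they can live in registers.
     * The mask clears the top `strength` bits of every byte, i.e. the
     * bits that the 64-bit shift drags in from the neighbouring byte. */
    const unsigned shift = strength;
    const uint64_t mask =
        (uint64_t)(0xFFu >> strength) * UINT64_C(0x0101010101010101);

    size_t i = 0;

    /* Main loop: 8 samples at a time. memcpy avoids alignment and
     * aliasing problems; compilers typically turn it into a plain
     * 64-bit load/store. */
    for (; i + 8 <= n_bytes; i += 8) {
        uint64_t w;
        memcpy(&w, luma + i, sizeof w);
        w = (w >> shift) & mask;
        memcpy(luma + i, &w, sizeof w);
    }

    /* Tail: any remaining samples, one byte at a time. */
    for (; i < n_bytes; i++)
        luma[i] = (uint8_t)(luma[i] >> shift);
}
```

Comparing the compiler's assembly output for the loop (for example with
gcc -S) against the hand-written MMX would show whether the constants are
really reloaded on each iteration in the C build.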
 -J
    
    