[x264-devel] [patch] Microoptimize assembler code (1/n)

Josef Zlomek josef.zlomek at xeris.cz
Mon Apr 25 09:15:34 CEST 2005


Hi,

I have changed the x264_pixel_satd_* functions on AMD64 to precompute
the values 3*stride1 and 3*stride2 into registers
and to use these precomputed values in the LOAD_DIFF_INC_4x4 macro.

The result is that the base address no longer has to be increased twice
in the LOAD_DIFF_INC_4x4 macro, which removes several instructions and
dependencies between instructions.
Since the LOAD_DIFF_INC_4x4 sequence is used many times in hot functions,
the result is a (minor) speedup.
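
To illustrate the idea, here is a rough C sketch, not the actual assembler
from the patch. The function names, the int16_t output layout and the test
data are made up for illustration; only the general shape (four rows
addressed from one base pointer, with 3*stride computed once by the caller)
follows the description above. On x86 an address can scale an index register
by 1, 2, 4 or 8 but not by 3, which is why 3*stride needs a register of its
own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline shape: handle two rows, bump both base pointers by 2*stride,
 * handle two more rows. The second pair of loads depends on the bump. */
static void diff_4x4_bump(int16_t out[16],
                          const uint8_t *p1, ptrdiff_t stride1,
                          const uint8_t *p2, ptrdiff_t stride2)
{
    for (int pair = 0; pair < 2; pair++) {
        for (int x = 0; x < 4; x++) {
            out[(2 * pair) * 4 + x] =
                (int16_t)(p1[x] - p2[x]);
            out[(2 * pair + 1) * 4 + x] =
                (int16_t)(p1[stride1 + x] - p2[stride2 + x]);
        }
        p1 += 2 * stride1;   /* pointer update between row pairs */
        p2 += 2 * stride2;
    }
}

/* Variant: the caller passes 3*stride, so all four rows are addressed
 * from the unchanged base pointers and no intermediate update is needed. */
static void diff_4x4_precomputed(int16_t out[16],
                                 const uint8_t *p1, ptrdiff_t stride1,
                                 ptrdiff_t stride1x3,
                                 const uint8_t *p2, ptrdiff_t stride2,
                                 ptrdiff_t stride2x3)
{
    assert(stride1x3 == 3 * stride1 && stride2x3 == 3 * stride2);

    const ptrdiff_t off1[4] = { 0, stride1, 2 * stride1, stride1x3 };
    const ptrdiff_t off2[4] = { 0, stride2, 2 * stride2, stride2x3 };

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            out[y * 4 + x] = (int16_t)(p1[off1[y] + x] - p2[off2[y] + x]);
}

/* Hypothetical self-check: both variants must yield identical differences
 * on two buffers with different strides (16 and 24 bytes). */
static int diff_4x4_variants_agree(void)
{
    enum { STRIDE1 = 16, STRIDE2 = 24 };
    uint8_t a[4 * STRIDE1], b[4 * STRIDE2];
    int16_t r1[16], r2[16];

    for (size_t i = 0; i < sizeof a; i++) a[i] = (uint8_t)(i * 7 + 3);
    for (size_t i = 0; i < sizeof b; i++) b[i] = (uint8_t)(i * 13 + 1);

    diff_4x4_bump(r1, a, STRIDE1, b, STRIDE2);
    diff_4x4_precomputed(r2, a, STRIDE1, 3 * STRIDE1, b, STRIDE2, 3 * STRIDE2);

    return memcmp(r1, r2, sizeof r1) == 0;
}
```

In the sketch the two multiplications by 3 happen once in the caller, so
their cost is shared by every 4x4 block the function processes.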

The patch is located at http://zlomek.jikos.cz/x264/x264-stride.patch

Josef

-- 
This is the x264-devel mailing-list
To unsubscribe, go to: http://developers.videolan.org/lists.html


