[x264-devel] [patch] faster MMXEXT SATD
Christian Heine
sennindemokrit at gmx.net
Wed Sep 21 03:12:51 CEST 2005
Hi,
the attached patch contains new MMXEXT SATD functions that are slightly
faster than the original ones on Athlon XP.
The following table shows a direct comparison between the two versions.
Each function was run 128M times. The first column shows the minimum
number of clock ticks for the original SATD functions. The second column
shows the minimum number of clock ticks for the new version. The third
column shows the speed improvement in percent. Columns four to six show
the same thing, but based on the average clock ticks per function call.
Before averaging, the 128M measurements were median-filtered with a
filter length of 3.
          minimum ticks             average ticks
size    old    new  gain %       old      new  gain %
4x4      66     65    1.54     68.00    67.01    1.48
8x4     114    104    9.62    117.01   108.50    7.85
4x8     114    102   11.76    114.99   104.49   10.05
8x8     209    185   12.97    214.00   190.52   12.33
16x8    412    341   20.82    414.00   368.23   12.43
8x16    411    337   21.96    415.00   345.03   20.28
16x16   814    648   25.62    816.00   665.95   22.53
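
For readers who want to reproduce the table columns, here is a minimal
sketch of how the figures could be derived from raw tick samples. The
function names are made up for illustration, and the edge handling of the
median filter (only full windows of 3 are used) is an assumption, since
the post does not specify it. The percentage formula (old / new - 1) * 100
is likewise inferred from the numbers: it reproduces the minimum-tick
column exactly, while the average column agrees only to within rounding
of the displayed averages.

```python
from statistics import median


def summarize(ticks):
    """Return (minimum, mean of the length-3 median-filtered samples)."""
    # Sliding window of 3 samples; only full windows are used
    # (edge handling is an assumption, not taken from the post).
    filtered = [median(ticks[i:i + 3]) for i in range(len(ticks) - 2)]
    return min(ticks), sum(filtered) / len(filtered)


def improvement_percent(old, new):
    """Speed improvement in percent, as inferred from the table."""
    return (old / new - 1.0) * 100.0


# Minimum clock ticks (original, new) copied from the table.
MIN_TICKS = {
    "4x4": (66, 65),
    "8x4": (114, 104),
    "4x8": (114, 102),
    "8x8": (209, 185),
    "16x8": (412, 341),
    "8x16": (411, 337),
    "16x16": (814, 648),
}

for size, (old, new) in MIN_TICKS.items():
    print(f"{size:>5} {old:4d} {new:4d} {improvement_percent(old, new):6.2f}")
```

The median filter is what keeps a single outlier (an interrupt or a cache
miss during one call, for example) from inflating the average.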
The overall speed improvement was:
test system:     Athlon-XP 3000+, WinXP/MinGW
test parameters: --bframes 2 --ref 8 --8x8dct --analyse all
test clip:       720x576 at 25fps, 2050 frames

--me hex --subme 5    4.55%
--me umh --subme 5    3.57%
--me hex --subme 6    3.50%
--me umh --subme 6    2.82%
Changes:
Not too long ago (2005/09/08) Loren Merritt posted some thoughts about
the value range of the 4x4 Hadamard transform. Based on that, I
concluded that it is possible to accumulate the sums of 8 4x4 blocks in
four unsigned words without the possibility of overflow. If the right
shift by 1 is performed on each block rather than at the end, it is
even possible to accumulate the sums of 16 4x4 blocks without overflow.
I implemented this, and the functions that profited the most are the
ones that operate on large blocks. I also reduced the number of
registers used.
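
The overflow claim can be checked with a small brute-force sketch. This
is a reconstruction under stated assumptions, not necessarily Loren
Merritt's argument or the register layout used in the patch: the inputs
are taken to be differences of 8-bit pixels, i.e. values in [-255, 255],
and each of the four 16-bit words is assumed to accumulate the absolute
values of one column of the transformed 4x4 block. Because the per-word
sum is a convex function of the inputs, its maximum is reached with every
input at +255 or -255, so trying all 2^16 sign patterns is sufficient.

```python
from itertools import product

PIX = 255  # largest magnitude of a difference of two 8-bit pixels

# 4x4 Hadamard matrix (rows in sequency order).
H = ((1, 1, 1, 1), (1, 1, -1, -1), (1, -1, -1, 1), (1, -1, 1, -1))

# Horizontal transform of every extreme row (each entry +PIX or -PIX).
ROWS = [
    [sum(h * s * PIX for h, s in zip(hrow, signs)) for hrow in H]
    for signs in product((-1, 1), repeat=4)
]


def lane_sums(r0, r1, r2, r3):
    """Vertical transform; return the sum of |coefficients| per column."""
    return [
        abs(a + b + c + d) + abs(a + b - c - d)
        + abs(a - b - c + d) + abs(a - b + c - d)
        for a, b, c, d in zip(r0, r1, r2, r3)
    ]


# Largest value one 16-bit word can receive from a single 4x4 block
# (under the one-column-per-word assumption stated above).
worst_lane = max(max(lane_sums(*rows)) for rows in product(ROWS, repeat=4))

U16_MAX = 0xFFFF
blocks_without_shift = U16_MAX // worst_lane       # shift once at the end
blocks_with_shift = U16_MAX // (worst_lane >> 1)   # shift after each block

print("worst per-word sum for one block:", worst_lane)
print("blocks that fit, shift at the end:", blocks_without_shift)
print("blocks that fit, shift per block: ", blocks_with_shift)
```

Under these assumptions the two block counts come out as 8 and 16, which
agrees with the limits given above. A 16x16 SATD consists of exactly 16
4x4 blocks, so with the per-block shift the whole sum fits in the
accumulator.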
Regards,
Christian Heine
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264rev293-satd-mmxext.diff.gz
Type: application/gzip
Size: 1842 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20050921/e5b19437/attachment.bin