[x264-devel] commit: Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT (Holger Lubitz)
git version control
git at videolan.org
Sat Mar 7 04:08:32 CET 2009
x264 | branch: master | Holger Lubitz <holger at lubitz.org> | Fri Mar 6 18:16:30 2009 -0800| [2dca5f5413051a26cbba4e20f3c77ff69b694ba3] | committer: Jason Garrett-Glaser
Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD: +18% speed on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit)
Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
Overall performance boost is up to ~15% on 64-bit Conroe.
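For readers unfamiliar with the metric: SATD (sum of absolute transformed differences) applies a Hadamard transform to the difference between two pixel blocks and sums the absolute values of the resulting coefficients. Below is a minimal scalar C sketch of a 4x4 SATD, for illustration only. It is not the code from this commit, which is hand-written x86 assembly. The function name satd_4x4_ref and its parameters are hypothetical, and the result is left unnormalized, which may differ from the scaling x264 applies.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helper, not part of x264's API.
 * Computes an unnormalized 4x4 SATD between blocks a and b.
 * stride_a / stride_b: distance in bytes between consecutive rows. */
static int satd_4x4_ref(const uint8_t *a, int stride_a,
                        const uint8_t *b, int stride_b)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Pixel-wise difference of the two blocks. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal 4-point Hadamard transform on each row
     * (two butterfly stages). */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical 4-point Hadamard transform on each column,
     * accumulating absolute values of the coefficients. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }

    /* Assumption: no normalization (e.g. halving) is applied here. */
    return sum;
}
```

The transform is built entirely from add/subtract butterflies, which is why functions of this kind tend to map well onto SIMD instructions.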
> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=2dca5f5413051a26cbba4e20f3c77ff69b694ba3
---
 common/dct.c            |    5 +
 common/pixel.c          |   66 ++--
 common/x86/dct-32.asm   |  136 +++++--
 common/x86/dct-64.asm   |  215 +++++++---
 common/x86/dct-a.asm    |  189 ++++-----
 common/x86/dct.h        |    7 +
 common/x86/pixel-32.asm |   16 +-
 common/x86/pixel-a.asm  | 1054 ++++++++++++++++++++++++++++++----------------
 common/x86/pixel.h      |    8 +-
 common/x86/x86util.asm  |  260 +++++++++++-
 tools/checkasm.c        |    2 +-
 11 files changed, 1326 insertions(+), 632 deletions(-)
Diff: http://git.videolan.org/gitweb.cgi/x264.git/?a=commitdiff;h=2dca5f5413051a26cbba4e20f3c77ff69b694ba3