[x264-devel] commit: Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT (Holger Lubitz)

git version control git at videolan.org
Sat Mar 7 04:08:32 CET 2009


x264 | branch: master | Holger Lubitz <holger at lubitz.org> | Fri Mar  6 18:16:30 2009 -0800| [2dca5f5413051a26cbba4e20f3c77ff69b694ba3] | committer: Jason Garrett-Glaser 

Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD speed: +18% on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit)
Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
Overall performance boost is up to ~15% on 64-bit Conroe.
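
For readers unfamiliar with the metric being optimized: SATD (sum of absolute transformed differences) applies a Hadamard transform to the difference between two pixel blocks and sums the absolute values of the coefficients. The plain C sketch below shows the idea for a single 4x4 block. It is an illustration only, not the code from this commit: the function name satd_4x4_ref is hypothetical, and halving the final sum is an assumed normalization convention. The assembly in this commit computes the same kind of butterfly network with SIMD instructions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference implementation (not taken from x264):
 * 4x4 SATD = sum of |H * D * H^T| over all coefficients, where D is the
 * pixel difference block and H is the 4x4 Hadamard matrix. */
static int satd_4x4_ref(const uint8_t *a, int stride_a,
                        const uint8_t *b, int stride_b)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Difference block. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal pass: two butterfly stages per row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1];
        int d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3];
        int d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical pass: same butterflies per column, accumulating |coeff|. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x];
        int d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x];
        int d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }

    /* Assumed normalization: halve the unnormalized transform sum. */
    return sum / 2;
}
```

A 16x16 SATD, as benchmarked above, can be built by summing such a function over the sixteen 4x4 sub-blocks; SA8D uses an 8x8 transform instead.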

> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=2dca5f5413051a26cbba4e20f3c77ff69b694ba3
---

 common/dct.c            |    5 +
 common/pixel.c          |   66 ++--
 common/x86/dct-32.asm   |  136 +++++--
 common/x86/dct-64.asm   |  215 +++++++---
 common/x86/dct-a.asm    |  189 ++++-----
 common/x86/dct.h        |    7 +
 common/x86/pixel-32.asm |   16 +-
 common/x86/pixel-a.asm  | 1054 ++++++++++++++++++++++++++++++----------------
 common/x86/pixel.h      |    8 +-
 common/x86/x86util.asm  |  260 +++++++++++-
 tools/checkasm.c        |    2 +-
 11 files changed, 1326 insertions(+), 632 deletions(-)

Diff:   http://git.videolan.org/gitweb.cgi/x264.git/?a=commitdiff;h=2dca5f5413051a26cbba4e20f3c77ff69b694ba3

