[x264-devel] commit: Phenom CPU optimizations (Jason Garrett-Glaser)

git version control git at videolan.org
Mon Nov 24 19:24:44 CET 2008


x264 | branch: master | Jason Garrett-Glaser <darkshikari at gmail.com> | Fri Nov 21 03:39:11 2008 -0800| [f9dba8bb274dffb19394db20912823464efcb8e1] | committer: Jason Garrett-Glaser 

Phenom CPU optimizations
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm into cpu-a.asm
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
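The first change can be sketched with SSE2 intrinsics in C. This is not x264's actual assembly. Without SSSE3's PALIGNR, a 16-byte vector that starts at byte offset 3 can be built from two aligned loads plus a byte shift and an OR, or it can be fetched with a single unaligned load. The offset 3 and all function names are hypothetical. Both paths return the same 16 bytes; according to the commit, the single unaligned load is the faster choice in hpel_filter on Phenom.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Emulated PALIGNR: bytes 3..18 of the 32-byte concatenation lo:hi, built
 * from two aligned loads, two byte shifts and an OR (SSE2 has no PALIGNR). */
static __m128i load_off3_emulated(const uint8_t *aligned_src)
{
    __m128i lo = _mm_load_si128((const __m128i *)aligned_src);
    __m128i hi = _mm_load_si128((const __m128i *)(aligned_src + 16));
    /* lo >> 3 bytes fills result bytes 0..12; hi << 13 bytes fills 13..15. */
    return _mm_or_si128(_mm_srli_si128(lo, 3), _mm_slli_si128(hi, 13));
}

/* Same bytes with one unaligned load (MOVDQU). */
static __m128i load_off3_unaligned(const uint8_t *aligned_src)
{
    return _mm_loadu_si128((const __m128i *)(aligned_src + 3));
}

/* Returns 1 if both strategies produce identical vectors. */
static int offset3_paths_agree(void)
{
    _Alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)(i * 7 + 1);
    __m128i a = load_off3_emulated(buf);
    __m128i b = load_off3_unaligned(buf);
    /* All 16 byte lanes equal -> movemask is 0xFFFF. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

The shift amounts must be compile-time constants, which is why a real filter tends to unroll per offset; the unaligned-load path avoids both the constant-shift restriction and the second load.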
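The misaligned_mask support can be sketched as follows. This assumes the feature is AMD's misaligned SSE mode, which is reported in CPUID leaf 0x80000001, ECX bit 7, and enabled by setting MXCSR bit 17 (the MM mask bit). With the mode enabled, SSE instructions can take unaligned memory operands directly, with no separate unaligned load. The sketch relies on GCC/Clang's <cpuid.h>. The function names are hypothetical, and this is not x264's code.

```c
#include <cpuid.h>
#include <xmmintrin.h>

/* CPUID 0x80000001, ECX bit 7: misaligned SSE mode supported (AMD). */
static int cpu_has_misaligned_sse(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0; /* extended leaf not available */
    return (int)((ecx >> 7) & 1u);
}

/* Set MXCSR bit 17 (MM). This bit is reserved on CPUs without the
 * feature and writing it there faults, so call only after detection. */
static void enable_misaligned_sse(void)
{
    _mm_setcsr(_mm_getcsr() | (1u << 17));
}

/* Returns 1 if the mode was enabled for the calling thread, 0 otherwise. */
static int init_misaligned_sse(void)
{
    if (!cpu_has_misaligned_sse())
        return 0;
    enable_misaligned_sse();
    return 1;
}
```

MXCSR is per-thread state. Every thread that runs code relying on unaligned memory operands would therefore need to make this call, and the fallback kernels must remain selected whenever it returns 0.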
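Replacing a width12 kernel with a width16 one amounts to a dispatch-table substitution. Below is a plain-C sketch, with hypothetical names and memcpy standing in for the MMX/SSE kernels. The width16 routine writes 4 extra bytes per row, so the substitution is only valid if destination rows are padded to at least 16 bytes. The flag fast_unaligned_sse is a hypothetical stand-in for "running on Phenom or Nehalem".

```c
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(uint8_t *dst, int dst_stride,
                        const uint8_t *src, int src_stride, int height);

/* Narrow kernel: copies exactly 12 bytes per row. */
static void copy_w12(uint8_t *dst, int dst_stride,
                     const uint8_t *src, int src_stride, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, 12);
}

/* Wide kernel: copies 16 bytes per row (one SSE register's worth). */
static void copy_w16(uint8_t *dst, int dst_stride,
                     const uint8_t *src, int src_stride, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, 16);
}

/* Indexed by width/4; only widths 12 and 16 are filled in this sketch. */
static copy_fn copy_wtab[5];

static void init_copy_wtab(int fast_unaligned_sse)
{
    copy_wtab[3] = fast_unaligned_sse ? copy_w16 : copy_w12;
    copy_wtab[4] = copy_w16;
}

static void get_ref_copy(uint8_t *dst, int dst_stride, const uint8_t *src,
                         int src_stride, int width, int height)
{
    copy_wtab[width >> 2](dst, dst_stride, src, src_stride, height);
}

/* Copy a 12x8 block into rows padded to 16 bytes. Returns dst[12], the
 * first padding byte, or -1 if the visible 12 columns do not match. */
static int demo_first_padding_byte(int fast_unaligned_sse)
{
    uint8_t src[8 * 16], dst[8 * 16];
    for (int i = 0; i < 8 * 16; i++)
        src[i] = (uint8_t)i;
    memset(dst, 0xAA, sizeof dst);
    init_copy_wtab(fast_unaligned_sse);
    get_ref_copy(dst, 16, src, 16, 12, 8);
    for (int y = 0; y < 8; y++)
        if (memcmp(dst + y * 16, src + y * 16, 12) != 0)
            return -1;
    return dst[12];
}
```

demo_first_padding_byte(1) returns 12 because the wide kernel overwrote the padding with src[12]. demo_first_padding_byte(0) returns 0xAA because the padding is untouched. The 12 visible columns are identical in both modes.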

> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=f9dba8bb274dffb19394db20912823464efcb8e1
---

 Makefile                             |    2 +-
 common/cpu.c                         |    7 +++
 common/pixel.c                       |    5 ++
 common/x86/cpu-64.asm                |   51 -------------------
 common/x86/{cpu-32.asm => cpu-a.asm} |   45 ++++++++++++++---
 common/x86/mc-a.asm                  |   16 +++++-
 common/x86/mc-a2.asm                 |   17 ++++++-
 common/x86/mc-c.c                    |   11 ++++-
 common/x86/pixel.h                   |    1 +
 common/x86/sad-a.asm                 |   92 +++++++++++++++++++++++++++++++++-
 tools/checkasm.c                     |    9 +++-
 x264.h                               |    1 +
 12 files changed, 189 insertions(+), 68 deletions(-)

Diff:   http://git.videolan.org/gitweb.cgi/x264.git/?a=commitdiff;h=f9dba8bb274dffb19394db20912823464efcb8e1

