[x264-devel] commit: Phenom CPU optimizations (Jason Garrett-Glaser)
git version control
git at videolan.org
Mon Nov 24 19:24:44 CET 2008
x264 | branch: master | Jason Garrett-Glaser <darkshikari at gmail.com> | Fri Nov 21 03:39:11 2008 -0800| [f9dba8bb274dffb19394db20912823464efcb8e1] | committer: Jason Garrett-Glaser
Phenom CPU optimizations
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values are not worth it).
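To illustrate why an unaligned load can stand in for PALIGNR, here is a minimal portable C sketch (not code from this commit). It models PALIGNR as picking 16 bytes at a byte offset out of two adjacent aligned 16-byte blocks, and compares that with a single load from the unaligned address. The helper names palignr_emulated and load_unaligned are hypothetical and exist only for this illustration; the real routines are SSE2 assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: combine two adjacent 16-byte blocks (lo, hi) into the
 * 16 bytes that start `shift` bytes into lo. This is the result PALIGNR
 * produces; without SSSE3 it has to be emulated with shifts and an OR. */
static void palignr_emulated(uint8_t dst[16], const uint8_t lo[16],
                             const uint8_t hi[16], unsigned shift)
{
    assert(shift <= 16);
    for (unsigned i = 0; i < 16; i++) {
        unsigned j = i + shift;
        dst[i] = j < 16 ? lo[j] : hi[j - 16];
    }
}

/* Hypothetical helper: the alternative, a single 16-byte load from an
 * address that need not be 16-byte aligned (MOVDQU in SSE2 terms). */
static void load_unaligned(uint8_t dst[16], const uint8_t *src)
{
    memcpy(dst, src, 16);
}
```

Both helpers return the same bytes for every shift from 0 to 16, so which one is faster depends only on how cheap unaligned loads are on the target CPU compared with the extra shift and OR instructions.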
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
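As a rough sketch of what misaligned_mask support involves, assuming it refers to AMD's misaligned SSE mode: the CPU advertises the feature in CPUID leaf 0x80000001 (ECX bit 7, MisAlignSse), and software enables it by setting the MM bit (bit 17) of MXCSR, after which SSE instructions with memory operands no longer fault on unaligned addresses. The function and macro names below are hypothetical, and the code only models the bit manipulation in portable C; it does not execute CPUID or LDMXCSR and is not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, ECX bit 7: AMD "MisAlignSse" feature bit. */
#define CPUID_EXT_ECX_MISALIGN_SSE (1u << 7)
/* MXCSR bit 17: "MM", the misaligned exception mask. */
#define MXCSR_MISALIGN_MASK        (1u << 17)

/* Hypothetical helper: does the extended-feature ECX value (as returned by
 * CPUID leaf 0x80000001) report misaligned SSE support? */
static int has_misaligned_sse(uint32_t ext_ecx)
{
    return (ext_ecx & CPUID_EXT_ECX_MISALIGN_SSE) != 0;
}

/* Hypothetical helper: compute the MXCSR value to load. The MM bit is set
 * only when the CPU reports support, since setting a reserved MXCSR bit
 * on other CPUs would fault. */
static uint32_t mxcsr_with_misalign(uint32_t mxcsr, uint32_t ext_ecx)
{
    assert((mxcsr & MXCSR_MISALIGN_MASK) == 0 || has_misaligned_sse(ext_ecx));
    return has_misaligned_sse(ext_ecx) ? (mxcsr | MXCSR_MISALIGN_MASK) : mxcsr;
}
```

Starting from the power-on default MXCSR of 0x1F80, a CPU that reports the feature would get 0x21F80, while any other CPU keeps 0x1F80 unchanged.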
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=f9dba8bb274dffb19394db20912823464efcb8e1
---
 Makefile                             |    2 +-
 common/cpu.c                         |    7 +++
 common/pixel.c                       |    5 ++
 common/x86/cpu-64.asm                |   51 -------------------
 common/x86/{cpu-32.asm => cpu-a.asm} |   45 ++++++++++++++---
 common/x86/mc-a.asm                  |   16 +++++-
 common/x86/mc-a2.asm                 |   17 ++++++-
 common/x86/mc-c.c                    |   11 ++++-
 common/x86/pixel.h                   |    1 +
 common/x86/sad-a.asm                 |   92 +++++++++++++++++++++++++++++++++-
 tools/checkasm.c                     |    9 +++-
 x264.h                               |    1 +
 12 files changed, 189 insertions(+), 68 deletions(-)
Diff: http://git.videolan.org/gitweb.cgi/x264.git/?a=commitdiff;h=f9dba8bb274dffb19394db20912823464efcb8e1