[x264-devel] commit: Faster width4 SSD+SATD, SSE4 optimizations (Jason Garrett-Glaser)
git version control
git at videolan.org
Wed Nov 26 00:28:02 CET 2008
x264 | branch: master | Jason Garrett-Glaser <darkshikari at gmail.com> | Tue Nov 25 01:04:26 2008 -0800 | [e1013e8152254614696bbc9d92959bc9705d98b1] | committer: Jason Garrett-Glaser
Faster width4 SSD+SATD, SSE4 optimizations
Do SATD 4x8 by transposing the two blocks' positions and running SATD 8x4.
Use pinsrd (SSE4) for faster width4 SSD.
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.
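
As an illustration of the 4x8 SATD trick above, here is a scalar C sketch (not the assembly from this commit). It assumes SATD is computed per 4x4 sub-block with an unnormalized Hadamard transform; the function names are hypothetical, and any final scaling an encoder applies is omitted. A 4x8 block is two 4x4 blocks stacked vertically and an 8x4 block is two 4x4 blocks side by side, so placing the stacked pair side by side lets the 8x4 routine produce the 4x8 result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute values of the 4x4 Hadamard transform of (a - b).
 * Unnormalized: no final scaling is applied (assumption for this sketch). */
static int satd_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int d[4][4], t[4][4], sum = 0;

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * sa + x] - b[y * sb + x];

    /* Horizontal 4-point Hadamard butterflies on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical butterflies on each column, accumulating magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum;
}

/* 8 wide, 4 tall: two 4x4 blocks side by side. */
static int satd_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    return satd_4x4(a, sa, b, sb) + satd_4x4(a + 4, sa, b + 4, sb);
}

/* 4 wide, 8 tall, computed directly: two 4x4 blocks stacked vertically. */
static int satd_4x8_direct(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    return satd_4x4(a, sa, b, sb)
         + satd_4x4(a + 4 * sa, sa, b + 4 * sb, sb);
}

/* 4x8 via 8x4: move the lower 4x4 block to the right of the upper one
 * (the blocks change position; the pixels inside each block do not),
 * then run the 8x4 routine on the rearranged data. */
static int satd_4x8_via_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    uint8_t ta[8 * 4], tb[8 * 4];

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            ta[y * 8 + x]     = a[y * sa + x];
            ta[y * 8 + 4 + x] = a[(y + 4) * sa + x];
            tb[y * 8 + x]     = b[y * sb + x];
            tb[y * 8 + 4 + x] = b[(y + 4) * sb + x];
        }
    return satd_8x4(ta, 8, tb, 8);
}
```

In this sketch the rearrangement is an explicit copy; in SIMD code the same effect can come from how rows are loaded into registers, so the rearranged layout need not cost an extra pass.
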
> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=e1013e8152254614696bbc9d92959bc9705d98b1
---
 common/cpu.c             |    1 -
 common/cpu.h             |    1 +
 common/pixel.c           |   46 +++++++++++++++-------
 common/x86/dct-a.asm     |    6 +-
 common/x86/deblock-a.asm |    2 +-
 common/x86/mc-a.asm      |    2 +-
 common/x86/pixel-a.asm   |   95 ++++++++++++++++++++++++++++++++++++++++------
 common/x86/pixel.h       |    1 +
 common/x86/x86util.asm   |    2 +-
 tools/checkasm.c         |    6 +++
 10 files changed, 129 insertions(+), 33 deletions(-)
Diff: http://git.videolan.org/gitweb.cgi/x264.git/?a=commitdiff;h=e1013e8152254614696bbc9d92959bc9705d98b1