[x264-devel] ppc: Use a single store to write the scores for sad_x4_8x8

Luca Barbato git at videolan.org
Tue Mar 12 19:31:55 CET 2019


x264 | branch: master | Luca Barbato <lu_zero at gentoo.org> | Sun Aug 19 17:27:54 2018 +0200| [18262ee37fedeb4d7b30d9a228f2f38ef0e13cc1] | committer: Anton Mitrofanov

ppc: Use a single store to write the scores for sad_x4_8x8

Yet another use of xxpermdi, another 10% gain.

> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=18262ee37fedeb4d7b30d9a228f2f38ef0e13cc1
---

 common/ppc/pixel.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/common/ppc/pixel.c b/common/ppc/pixel.c
index 66d7e045..35dd934e 100644
--- a/common/ppc/pixel.c
+++ b/common/ppc/pixel.c
@@ -1108,20 +1108,11 @@ static void pixel_sad_x4_8x8_altivec( uint8_t *fenc,
     sum2v = vec_sums( sum2v, zero_s32v );
     sum3v = vec_sums( sum3v, zero_s32v );
 
-    sum0v = vec_splat( sum0v, 3 );
-    sum1v = vec_splat( sum1v, 3 );
-    sum2v = vec_splat( sum2v, 3 );
-    sum3v = vec_splat( sum3v, 3 );
+    vec_s32_t s01 = vec_mergel( sum0v, sum1v );
+    vec_s32_t s23 = vec_mergel( sum2v, sum3v );
+    vec_s32_t s = xxpermdi( s01, s23, 3 );
 
-    vec_ste( sum0v, 0, &sum0);
-    vec_ste( sum1v, 0, &sum1);
-    vec_ste( sum2v, 0, &sum2);
-    vec_ste( sum3v, 0, &sum3);
-
-    scores[0] = sum0;
-    scores[1] = sum1;
-    scores[2] = sum2;
-    scores[3] = sum3;
+    vec_vsx_st( s, 0, scores );
 }
 
 static void pixel_sad_x3_8x8_altivec( uint8_t *fenc, uint8_t *pix0,
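
For readers unfamiliar with the intrinsics, the lane movement in the new code can be traced with a plain scalar model. This is only a sketch under stated assumptions, not the x264 implementation: the `v4s32` type and the `model_*` helpers are hypothetical stand-ins, lanes are numbered in big-endian element order, the saturation done by the real `vec_sums` is ignored, and `scores` is assumed to be an array of four `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for vec_s32_t: four 32-bit lanes,
 * indexed in big-endian element order (lane 0 first). */
typedef struct { int32_t e[4]; } v4s32;

/* Models vec_sums( a, b ): lane 3 = a0 + a1 + a2 + a3 + b3, other lanes 0.
 * The real instruction saturates; that is ignored in this sketch. */
static v4s32 model_sums( v4s32 a, v4s32 b )
{
    v4s32 r = { { 0, 0, 0, a.e[0] + a.e[1] + a.e[2] + a.e[3] + b.e[3] } };
    return r;
}

/* Models vec_mergel( a, b ): interleaves the low halves, { a2, b2, a3, b3 }. */
static v4s32 model_mergel( v4s32 a, v4s32 b )
{
    v4s32 r = { { a.e[2], b.e[2], a.e[3], b.e[3] } };
    return r;
}

/* Models xxpermdi( a, b, sel ): the high bit of sel picks a doubleword
 * (a pair of lanes) from a, the low bit picks one from b. */
static v4s32 model_xxpermdi( v4s32 a, v4s32 b, int sel )
{
    assert( sel >= 0 && sel <= 3 );
    int ia = 2 * ( ( sel >> 1 ) & 1 );
    int ib = 2 * ( sel & 1 );
    v4s32 r = { { a.e[ia], a.e[ia + 1], b.e[ib], b.e[ib + 1] } };
    return r;
}

/* Mirrors the tail of the patched function: reduce, merge, permute, store. */
static void model_store_scores( const v4s32 sum[4], int scores[4] )
{
    v4s32 zero = { { 0, 0, 0, 0 } };
    v4s32 s0 = model_sums( sum[0], zero );      /* { 0, 0, 0, S0 } */
    v4s32 s1 = model_sums( sum[1], zero );      /* { 0, 0, 0, S1 } */
    v4s32 s2 = model_sums( sum[2], zero );      /* { 0, 0, 0, S2 } */
    v4s32 s3 = model_sums( sum[3], zero );      /* { 0, 0, 0, S3 } */

    v4s32 s01 = model_mergel( s0, s1 );         /* { 0, 0, S0, S1 } */
    v4s32 s23 = model_mergel( s2, s3 );         /* { 0, 0, S2, S3 } */
    v4s32 s   = model_xxpermdi( s01, s23, 3 );  /* { S0, S1, S2, S3 } */

    /* Stands in for the single vec_vsx_st( s, 0, scores ). */
    for( int i = 0; i < 4; i++ )
        scores[i] = s.e[i];
}
```

In this model the two merges pack each pair of totals into the low doubleword, and the permute with selector 3 joins those two doublewords, so all four scores sit in one vector and need only one store.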
