[x264-devel] Patch - Altivec Quant 4x4x4

Philipp Sibler philipp.sibler at googlemail.com
Thu Oct 3 23:10:04 CEST 2013


Hi Derek,

see my answers below.

Philipp


>> The PPC brotherhood still exists, I see!

Yeah, small and secret, but here we are!

>> Overall or just quantization?

Overall.

>> x264 doesn't allow tabs.

See attached file for a tab-free patch.

>> I would wonder if this is optimal, but it's still a gain, and
>> probably nobody else will write Altivec code... so it looks
>> pretty good to me then.

I agree: writing the Altivec code for this function from scratch might produce even better results. However, the solution in the patch reuses the existing, field-proven Altivec 4x4 routine, and profiling results suggest that its performance is not far from the optimum.
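
To illustrate the structure the patch relies on, here is a minimal scalar sketch in portable C. The names quant_4x4_model and quant_4x4x4_model are hypothetical stand-ins, not x264 functions, and the rounding rule (|coef| + bias) * mf >> 16 is an assumption about the scalar reference rather than a copy of x264's source. The point is only how the per-block nonzero flags are packed into one bitmask.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for the per-block 4x4 quantizer:
 * quantizes one block in place and returns 1 if any coefficient
 * is still nonzero afterwards, 0 otherwise. */
static int quant_4x4_model( int16_t dct[16], const uint16_t mf[16], const uint16_t bias[16] )
{
    int nz = 0;
    for( int i = 0; i < 16; i++ )
    {
        int coef = dct[i];
        /* Assumed rounding rule: (|coef| + bias) * mf >> 16, sign restored. */
        if( coef > 0 )
            coef = (int)( ( (uint64_t)( bias[i] + coef ) * mf[i] ) >> 16 );
        else
            coef = -(int)( ( (uint64_t)( bias[i] - coef ) * mf[i] ) >> 16 );
        dct[i] = (int16_t)coef;
        nz |= coef;
    }
    return !!nz;
}

/* Same shape as the wrapper in the patch: bit j of the result is set
 * when block j still holds a nonzero coefficient after quantization. */
static int quant_4x4x4_model( int16_t dct[4][16], const uint16_t mf[16], const uint16_t bias[16] )
{
    int nza = 0;
    for( int j = 0; j < 4; j++ )
        nza |= quant_4x4_model( dct[j], mf, bias ) << j;
    return nza;
}
```

A model like this could also serve as a reference when comparing the output of the Altivec wrapper block by block.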

>> I assume this has been run through x264's regression testing.

The Python regression test script actually threw syntax errors (it had problems resolving the digress class definitions) on OS X 10.5, with both the Python 2.5 and Python 3.3 interpreters. I'm not a big Python guy, so maybe someone could lend me a hand here.

So what I did with the patch was to

1) run a regression test with the reference decoder as described in doc/regression_test.txt. x264's YUV dump file and the decoder's output file were identical.

2) run a test encode with city_4cif.y4m into a Matroska container. VLC played the compressed file without any problems or noticeable artefacts.
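
The comparison in step 1 amounts to a byte-for-byte check of two files. A minimal sketch of such a check in C follows; the helper name files_identical is hypothetical, and a standard tool such as cmp does the same job.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
 * and -1 if either file cannot be opened. A read error is treated
 * like end of file. */
static int files_identical( const char *path_a, const char *path_b )
{
    FILE *a = fopen( path_a, "rb" );
    FILE *b = fopen( path_b, "rb" );
    int result = -1;
    if( a && b )
    {
        unsigned char buf_a[4096], buf_b[4096];
        size_t n_a, n_b;
        result = 1;
        do
        {
            n_a = fread( buf_a, 1, sizeof(buf_a), a );
            n_b = fread( buf_b, 1, sizeof(buf_b), b );
            /* Different chunk lengths or different bytes mean a mismatch. */
            if( n_a != n_b || memcmp( buf_a, buf_b, n_a ) )
            {
                result = 0;
                break;
            }
        } while( n_a == sizeof(buf_a) ); /* a short read ends the loop */
    }
    if( a ) fclose( a );
    if( b ) fclose( b );
    return result;
}
```

It would be called with the encoder's dump file and the decoder's output file as the two paths.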

But what I did notice when testing x264: Subpixel motion estimation doesn't seem to work on PowerPCs on the latest master branch version. The only way to get x264 working was to disable the estimation (--subme 0). Without this flag you simply get a "Bus error" and x264 exits.




On 02.10.2013 13:56, Derek Buitenhuis wrote:
> On 9/29/2013 5:23 PM, Philipp Sibler wrote:
>> Hi x264,
>>
>> this patch introduces an Altivec version of the 4x4x4 quantization step.
>> On the current master branch the 4x4x4 quantization on PowerPC Altivec
>> machines defaults to the plain scalar C routine.
> The PPC brotherhood still exists, I see!
>
>> Patch was tested on a PowerMac G4 and generates an encoding speedup of
>> about 14 percent there.
> Overall or just quantization?
>
>> From 951350060c745c1c33bf814f87621e2763143a50 Mon Sep 17 00:00:00 2001
>> From: Philipp Sibler <philipp.sibler at gmail.com>
>> Date: Sun, 29 Sep 2013 17:48:36 +0200
>> Subject: [PATCH] Introduced Altivec version of quant 4x4x4
> s/Introduced/Introduce/
>
>> ---
>>   common/ppc/quant.c |   17 +++++++++++++++++
>>   common/ppc/quant.h |    1 +
>>   common/quant.c     |    1 +
>>   3 files changed, 19 insertions(+), 0 deletions(-)
>>
>> diff --git a/common/ppc/quant.c b/common/ppc/quant.c
>> index f11938a..0fff340 100644
>> --- a/common/ppc/quant.c
>> +++ b/common/ppc/quant.c
>> @@ -90,6 +90,23 @@ int x264_quant_4x4_altivec( int16_t dct[16], uint16_t mf[16], uint16_t bias[16]
>>       return vec_any_ne(nz, zero_s16v);
>>   }
>>   
>> +int x264_quant_4x4x4_altivec( int16_t dct[4][16], uint16_t mf[16], uint16_t bias[16] )
>> +{
>> +    int nza = 0;
>> +    int nz = 0;
>> +
>> +	nz = x264_quant_4x4_altivec(dct[0], mf, bias);
>> +	nza |= (!!nz);
>> +	nz = x264_quant_4x4_altivec(dct[1], mf, bias);
>> +	nza |= (!!nz)<<1;
>> +	nz = x264_quant_4x4_altivec(dct[2], mf, bias);
>> +	nza |= (!!nz)<<2;
>> +	nz = x264_quant_4x4_altivec(dct[3], mf, bias);
>> +	nza |= (!!nz)<<3;
> x264 doesn't allow tabs.
>
> I would wonder if this is optimal, but it's still a gain, and
> probably nobody else will write Altivec code... so it looks
> pretty good to me then.
>                                                                   \
>>           pf->quant_8x8 = x264_quant_8x8_altivec;
>> +		pf->quant_4x4x4 = x264_quant_4x4x4_altivec;
> Tabs again.
>
> I assume this has been run through x264's regression testing.
>
> - Derek
> _______________________________________________
> x264-devel mailing list
> x264-devel at videolan.org
> https://mailman.videolan.org/listinfo/x264-devel

-------------- next part --------------
From 224ccb927b55b6d684b6240d8147e40525b6be2a Mon Sep 17 00:00:00 2001
From: Philipp Sibler <philipp.sibler at gmail.com>
Date: Thu, 3 Oct 2013 22:50:37 +0200
Subject: [PATCH] Introduce an Altivec version of quant 4x4x4

---
 common/ppc/quant.c |   17 +++++++++++++++++
 common/ppc/quant.h |    1 +
 common/quant.c     |    1 +
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/common/ppc/quant.c b/common/ppc/quant.c
index f11938a..85245ef 100644
--- a/common/ppc/quant.c
+++ b/common/ppc/quant.c
@@ -90,6 +90,23 @@ int x264_quant_4x4_altivec( int16_t dct[16], uint16_t mf[16], uint16_t bias[16]
     return vec_any_ne(nz, zero_s16v);
 }
 
+int x264_quant_4x4x4_altivec( int16_t dct[4][16], uint16_t mf[16], uint16_t bias[16] )
+{
+    int nza = 0;
+    int nz = 0;
+
+    nz = x264_quant_4x4_altivec(dct[0], mf, bias);
+    nza |= (!!nz);
+    nz = x264_quant_4x4_altivec(dct[1], mf, bias);
+    nza |= (!!nz)<<1;
+    nz = x264_quant_4x4_altivec(dct[2], mf, bias);
+    nza |= (!!nz)<<2;
+    nz = x264_quant_4x4_altivec(dct[3], mf, bias);
+    nza |= (!!nz)<<3;
+
+    return nza;
+}
+
 // DC quant of a whole 4x4 block, unrolled 2x and "pre-scheduled"
 #define QUANT_16_U_DC( idx0, idx1 )                                 \
 {                                                                   \
diff --git a/common/ppc/quant.h b/common/ppc/quant.h
index 1f789c3..2f22d91 100644
--- a/common/ppc/quant.h
+++ b/common/ppc/quant.h
@@ -27,6 +27,7 @@
 #define X264_PPC_QUANT_H
 
 int x264_quant_4x4_altivec( int16_t dct[16], uint16_t mf[16], uint16_t bias[16] );
+int x264_quant_4x4x4_altivec( int16_t dct[4][16], uint16_t mf[16], uint16_t bias[16] );
 int x264_quant_8x8_altivec( int16_t dct[64], uint16_t mf[64], uint16_t bias[64] );
 
 int x264_quant_4x4_dc_altivec( int16_t dct[16], int mf, int bias );
diff --git a/common/quant.c b/common/quant.c
index 7aa851e..a5c5f7a 100644
--- a/common/quant.c
+++ b/common/quant.c
@@ -717,6 +717,7 @@ void x264_quant_init( x264_t *h, int cpu, x264_quant_function_t *pf )
         pf->quant_4x4_dc = x264_quant_4x4_dc_altivec;
         pf->quant_4x4 = x264_quant_4x4_altivec;
         pf->quant_8x8 = x264_quant_8x8_altivec;
+        pf->quant_4x4x4 = x264_quant_4x4x4_altivec;
 
         pf->dequant_4x4 = x264_dequant_4x4_altivec;
         pf->dequant_8x8 = x264_dequant_8x8_altivec;
-- 
1.7.7.4


