[x264-devel] Deblocking filter

Guy Bonneau gbonneau at matrox.com
Fri Jul 13 18:01:20 CEST 2007


Optimization is not a one-time process. Improvements were made step
by step, sometimes over many years of development. Looking at the
history of the deblocking assembler file will help you see how the
optimization evolved.

A few weeks ago I went through the same process of understanding the
deblocking algorithm of x264 for academic purposes. The assembler code
was written to take advantage of MMX byte processing to speed up the
algorithm.

Here is a derivation of the byte-wise binary arithmetic used to compute
p0' and q0' when bS is less than 4.

Let's start from the unclipped offset that is added to p0 and
subtracted from q0:

(((q0-p0)<<2) + (p1-q1) + 4) >> 3      (1)

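The two least significant bits of (p1-q1) cannot affect the result:
((q0-p0)<<2) + 4 is a multiple of 4, so those bits never carry into
bit 2 and are discarded by the final shift. They can therefore be
dropped, and we can rewrite the equation as: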

(((q0-p0)) + ((p1-q1) >> 2) + 1) >> 1    (2)
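
The step from (1) to (2) can be checked by brute force. Here is a minimal
Python sketch, assuming 8-bit samples and an arithmetic (flooring) right
shift, which is how Python shifts negative integers. The function names
are made up for illustration.

```python
def delta_eq1(p1, p0, q0, q1):
    # Equation (1): the starting expression.
    return (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3


def delta_eq2(p1, p0, q0, q1):
    # Equation (2): the two low bits of (p1-q1) are dropped first.
    return ((q0 - p0) + ((p1 - q1) >> 2) + 1) >> 1


def count_mismatches():
    # Both forms depend only on d = q0-p0 and e = p1-q1, each in
    # [-255, 255] for 8-bit samples, so sweeping d and e is exhaustive.
    bad = 0
    for d in range(-255, 256):
        for e in range(-255, 256):
            # Pick concrete 8-bit samples that realise the differences.
            p0, q0 = (0, d) if d >= 0 else (-d, 0)
            q1, p1 = (0, e) if e >= 0 else (-e, 0)
            if delta_eq1(p1, p0, q0, q1) != delta_eq2(p1, p0, q0, q1):
                bad += 1
    return bad
```

count_mismatches() returning 0 means (1) and (2) agree for every input.
In C the same check would need care, since right-shifting a negative
value is implementation-defined there.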

If a and b are unsigned 8-bit values, then ~b = 255 - b, which gives
the identity:

(a-b) = a + (~b) + 1 - 256
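
The identity is easy to confirm over all byte pairs. A small Python
sketch follows. Note that Python's own ~ works on unbounded integers, so
the 8-bit NOT is modelled with a hypothetical helper not8() instead.

```python
def not8(b):
    # 8-bit bitwise NOT, i.e. 255 - b. Python's ~b would give -b-1.
    return b ^ 0xFF


def identity_holds():
    # Check a - b == a + ~b + 1 - 256 for every pair of unsigned bytes.
    return all(a - b == a + not8(b) + 1 - 256
               for a in range(256)
               for b in range(256))
```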

Thus we can rewrite (2) as:

((q0+~p0 + 1 - 256) + ((p1+~q1 + 1 - 256) >> 2) + 1) >> 1

To make use of PAVGB, which computes (a+b+1)>>1 on unsigned bytes, we
can rearrange the expression step by step:
    
((q0+~p0 + 1 - 256) + ((p1+~q1 + 1) >> 2) - 64 + 1) >> 1
((q0+~p0 + 1) + (PAVGB(p1,~q1) >> 1) - 256 - 64 + 1) >> 1
((q0+~p0 + 1) + ((PAVGB(p1,~q1) + 4) >> 1) - 256 - 64 - (4>>1) + 1) >> 1
((q0+~p0 + 1) + ((PAVGB(p1,~q1) + 3 + 1) >> 1) - 256 - 64 - (4>>1) + 1) >> 1
((q0+~p0 + 1) + ((PAVGB(p1,~q1) + 3 + 1) >> 1) - 256 - 64 - 2 + 1) >> 1
((q0+~p0 + 1) + PAVGB(PAVGB(p1,~q1), 3) - 256 - 64 - 2 + 1) >> 1
(((q0+~p0 + 1) + PAVGB(PAVGB(p1,~q1), 3) + 1) >> 1) - 128 - 33
((q0+~p0 + 1) >> 1) + ((PAVGB(PAVGB(p1,~q1), 3) + 1) >> 1) - 161
PAVGB(q0,~p0) + ((PAVGB(PAVGB(p1,~q1), 3) + 1) >> 1) - 161
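
The constants in the chain above are the biases that each PAVGB result
carries. Here is a small Python model of the per-byte behaviour of PAVGB,
with made-up helper names and 8-bit samples assumed, that checks the two
intermediate results against plain signed arithmetic:

```python
def pavgb(a, b):
    # Per-byte behaviour of PAVGB: unsigned average, rounded up.
    return (a + b + 1) >> 1


def not8(x):
    # 8-bit bitwise NOT (255 - x).
    return x ^ 0xFF


def biases_hold():
    for a in range(256):
        for b in range(256):
            diff = a - b
            half = pavgb(a, not8(b))
            # PAVGB(a, ~b) is (a-b)>>1 with a bias of 128.
            if half != 128 + (diff >> 1):
                return False
            # PAVGB(PAVGB(a, ~b), 3) is (a-b)>>2 with a bias of 64 + 2.
            if pavgb(half, 3) != 66 + (diff >> 2):
                return False
    return True
```

The 128 comes from PAVGB(q0,~p0), and the 66 is halved once more by the
last shift, giving 33. Together they make the 161 that is subtracted at
the end.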

At this point we have a problem: the last two steps of the derivation
split one shift into two separate shifts, and that is not exact. The
expression PAVGB(q0,~p0) drops the least significant bit of (q0+~p0+1),
a bit that should have been added to the second term,
(PAVGB(PAVGB(p1,~q1), 3) + 1), before that term is shifted. The result
can therefore be off by 1. To fix this, we add the least significant
bit of (q0+~p0+1) to the second term. Let's call this value avglsb. We
then have:

PAVGB(q0,~p0) + ((avglsb + PAVGB(PAVGB(p1,~q1), 3) + 1) >> 1) - 161
 
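avglsb is ((q0^p0) & 0x1): ~p0 + 1 has the same parity as p0, so the
low bit of (q0+~p0+1) is the XOR of the low bits of q0 and p0.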

Then

PAVGB(q0,~p0) + ((((q0^p0) & 0x1) + PAVGB(PAVGB(p1,~q1), 3) + 1) >> 1) - 161
PAVGB(q0,~p0) + PAVGB(((q0^p0) & 0x1), PAVGB(PAVGB(p1,~q1), 3)) - 161
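
The final expression can be compared with equation (1) over every
possible input. The sketch below is Python with made-up function names,
assuming 8-bit samples and modelling PAVGB per byte as (a+b+1)>>1. It
also counts how often the split form without avglsb is wrong.

```python
def pavgb(a, b):
    # Per-byte behaviour of PAVGB: unsigned average, rounded up.
    return (a + b + 1) >> 1


def not8(x):
    # 8-bit bitwise NOT (255 - x).
    return x ^ 0xFF


def delta_reference(p1, p0, q0, q1):
    # Equation (1).
    return (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3


def delta_split(p1, p0, q0, q1):
    # Split form without the avglsb correction (inexact).
    x = pavgb(pavgb(p1, not8(q1)), 3)
    return pavgb(q0, not8(p0)) + ((x + 1) >> 1) - 161


def delta_pavgb(p1, p0, q0, q1):
    # Final form: the dropped low bit is fed into the last average.
    avglsb = (q0 ^ p0) & 1
    x = pavgb(pavgb(p1, not8(q1)), 3)
    return pavgb(q0, not8(p0)) + pavgb(avglsb, x) - 161


def count_mismatches(candidate):
    # All three forms depend only on d = q0-p0 and e = p1-q1, so
    # sweeping both over [-255, 255] covers every 8-bit input.
    bad = 0
    for d in range(-255, 256):
        for e in range(-255, 256):
            p0, q0 = (0, d) if d >= 0 else (-d, 0)
            q1, p1 = (0, e) if e >= 0 else (-e, 0)
            if candidate(p1, p0, q0, q1) != delta_reference(p1, p0, q0, q1):
                bad += 1
    return bad
```

count_mismatches(delta_pavgb) is 0, while count_mismatches(delta_split)
is not. For example, with p0 = 0, q0 = 1 and p1 = q1 = 0, equation (1)
gives 1 but the split form gives 0.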

This is what the assembler code implements to compute p0' and q0',
together with the clipping that the filter also needs. Keep in mind
that the optimized code was written to use byte processing, as Loren
said.
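
To illustrate how the clipping can also stay within unsigned bytes, here
is a hedged Python sketch. It is one possible arrangement and does not
claim to match the x264 source instruction for instruction. The names
are made up, tc stands for the clipping threshold and is assumed to
satisfy 0 <= tc <= 94, and the tests that decide whether an edge is
filtered at all are left out.

```python
def pavgb(a, b):
    # Per-byte behaviour of PAVGB: unsigned average, rounded up.
    return (a + b + 1) >> 1


def sat_add(a, b):
    # Unsigned saturating byte add.
    return min(a + b, 255)


def sat_sub(a, b):
    # Unsigned saturating byte subtract.
    return max(a - b, 0)


def filter_bytes(p1, p0, q0, q1, tc):
    # delta + 161 held in a byte. The add saturates only when
    # delta > 94, which the clip to tc hides as long as tc <= 94.
    biased = sat_add(pavgb(q0, p0 ^ 0xFF),
                     pavgb((q0 ^ p0) & 1, pavgb(pavgb(p1, q1 ^ 0xFF), 3)))
    pos = min(sat_sub(biased, 161), tc)  # max(delta, 0), clipped to tc
    neg = min(sat_sub(161, biased), tc)  # max(-delta, 0), clipped to tc
    new_p0 = sat_add(sat_sub(p0, neg), pos)  # p0 + delta, kept in 0..255
    new_q0 = sat_add(sat_sub(q0, pos), neg)  # q0 - delta, kept in 0..255
    return new_p0, new_q0


def filter_reference(p1, p0, q0, q1, tc):
    # Plain signed arithmetic, for comparison.
    delta = (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3
    delta = max(-tc, min(tc, delta))
    return max(0, min(255, p0 + delta)), max(0, min(255, q0 - delta))
```

Splitting delta into a positive and a negative part means no signed
value ever has to be stored: at most one of pos and neg is non-zero, and
saturating adds and subtracts do the final clamp to 0..255.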

Hope this helps.

BTW, the deblocking optimization of x264 is probably one of the most
beautiful pieces of optimized code I have ever seen. Great work!

Guy Bonneau



>-----Original Message-----
>From: x264-devel-bounces at videolan.org [mailto:x264-devel-
>bounces at videolan.org] On Behalf Of Jean-Michel HAUTBOIS
>Sent: Friday, July 13, 2007 3:52 AM
>To: x264-devel at videolan.org
>Subject: [x264-devel] Deblocking filter
>
>Hi everyone!
>I am currently looking at the deblocking filter algorithm in H.264, and
>am trying to understand your implementation. You have written the filter
>entirely in assembly language, but how did you go about optimizing it?
>Did you use any papers?
>If so, could you please give me the references you used?
>
>Thanks in advance for your advice.
>Best regards.
>_______________________________________________
>x264-devel mailing list
>x264-devel at videolan.org
>http://mailman.videolan.org/listinfo/x264-devel
