[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8

Tue Aug 29 00:36:27 CEST 2006

Hi,

Loren Merritt wrote:

[..]

> the C version does
>   row hadamard
>   column hadamard
>   sum
> 
> which is the same as
>   column hadamard
>   row hadamard
>   sum
> 
> which is the same as
>   column hadamard
>   transpose
>   column hadamard
>   transpose
>   sum
> 
> which is the same as
>   column hadamard
>   transpose
>   column hadamard
>   sum
> 
> which is the SSE version.

Ok, this makes sense.
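
For the archives, here is a small scalar C sketch of why the four orderings above give the same sum. It is not taken from x264 or from the patch: the helper names (hadamard8, sa8d_orders_agree, ...) are made up for illustration, the test block is pseudo-random data, and only the raw sum of absolute values is compared, without whatever final scaling the real function applies. The key point is that a transpose only moves coefficients around, so the sum of their absolute values does not change.

```c
#include <stdlib.h>
#include <string.h>

#define N 8

/* In-place 8-point Hadamard butterfly on 8 values spaced `stride` apart. */
static void hadamard8(int *v, int stride)
{
    for (int len = 1; len < N; len <<= 1)
        for (int i = 0; i < N; i += len << 1)
            for (int j = i; j < i + len; j++) {
                int a = v[j * stride];
                int b = v[(j + len) * stride];
                v[j * stride]         = a + b;
                v[(j + len) * stride] = a - b;
            }
}

/* Transform every row (stride 1) or every column (stride N) of an 8x8 block. */
static void row_hadamard(int *m) { for (int r = 0; r < N; r++) hadamard8(m + r * N, 1); }
static void col_hadamard(int *m) { for (int c = 0; c < N; c++) hadamard8(m + c, N); }

static void transpose(int *m)
{
    for (int r = 0; r < N; r++)
        for (int c = r + 1; c < N; c++) {
            int t = m[r * N + c];
            m[r * N + c] = m[c * N + r];
            m[c * N + r] = t;
        }
}

/* Sum of absolute values, accumulated in a wide type. */
static long abs_sum(const int *m)
{
    long s = 0;
    for (int i = 0; i < N * N; i++)
        s += abs(m[i]);
    return s;
}

/* Hypothetical test data: pixel-difference-like values in [-255, 255]. */
static void fill_block(int *m, unsigned seed)
{
    for (int i = 0; i < N * N; i++) {
        seed = seed * 1103515245u + 12345u;
        m[i] = (int)((seed >> 16) % 511u) - 255;
    }
}

/* Returns 1 if all four orderings from the quoted mail give the same sum. */
int sa8d_orders_agree(unsigned seed)
{
    int src[N * N], a[N * N], b[N * N], c[N * N], d[N * N];
    fill_block(src, seed);
    memcpy(a, src, sizeof src);
    memcpy(b, src, sizeof src);
    memcpy(c, src, sizeof src);
    memcpy(d, src, sizeof src);

    row_hadamard(a); col_hadamard(a);                               /* C version   */
    col_hadamard(b); row_hadamard(b);                               /* swapped     */
    col_hadamard(c); transpose(c); col_hadamard(c); transpose(c);   /* via transpose */
    col_hadamard(d); transpose(d); col_hadamard(d);                 /* SSE ordering */

    long sa = abs_sum(a);
    return sa == abs_sum(b) && sa == abs_sum(c) && sa == abs_sum(d);
}
```

Calling sa8d_orders_agree() with any seed should return 1. A scalar reference like this can also be used to compare intermediate values against a vector implementation stage by stage.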

Please find attached yet another revision of my patchset.
It essentially fixes the quant4x4dc problem I mentioned (which was
exactly what Loren predicted), and adds an AltiVec implementation of
pixel_sa8d_8x8, which is disabled by default because it is broken.
I went through every line of the Hadamard code I converted from the C
code and couldn't find the mistake in it. I then reimplemented the
Hadamard transform with the two macros used in the SSE2 code, SUMSUB_BADC
and HADAMARD1x8. The two versions seem to produce the same result, though
the value returned by the function is still wrong.

So I imagine the problem lies in the abs+accumulation code, which seems
quite straightforward and doesn't look particularly error-prone. But
it's late here, so I guess I'll just sleep on it.

If anyone has an idea, please speak up! :)

Cheers,

Guillaume

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Altivec_quant-dct_routines_6.diff
Type: text/x-patch
Size: 28250 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20060829/cc110216/attachment.bin