[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8
Guillaume POIRIER
gpoirier at mplayerhq.hu
Tue Aug 29 00:36:27 CEST 2006
Hi,
Loren Merritt wrote:
[..]
> the C version does
> row hadamard
> column hadamard
> sum
>
> which is the same as
> column hadamard
> row hadamard
> sum
>
> which is the same as
> column hadamard
> transpose
> column hadamard
> transpose
> sum
>
> which is the same as
> column hadamard
> transpose
> column hadamard
> sum
>
> which is the SSE version.
Ok, this makes sense.
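For reference, the equivalence above can be checked with a small scalar sketch. This is not the x264 code: the function names below (hadamard8, sa8d_rows_then_cols, sa8d_cols_transpose_cols) are made up for illustration, the block is assumed to hold the 8x8 pixel differences as plain ints, and any final rounding or scaling of the sum is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: in-place 8-point Hadamard butterfly on 8 values
 * that are `stride` ints apart (stride 1 = a row, stride 8 = a column). */
static void hadamard8(int *v, int stride)
{
    for (int half = 1; half < 8; half <<= 1)
        for (int i = 0; i < 8; i += 2 * half)
            for (int j = i; j < i + half; j++) {
                int a = v[j * stride];
                int b = v[(j + half) * stride];
                v[j * stride]          = a + b;
                v[(j + half) * stride] = a - b;
            }
}

static void hadamard_rows(int m[64])
{
    for (int r = 0; r < 8; r++)
        hadamard8(m + 8 * r, 1);
}

static void hadamard_cols(int m[64])
{
    for (int c = 0; c < 8; c++)
        hadamard8(m + c, 8);
}

static void transpose8x8(int m[64])
{
    for (int r = 0; r < 8; r++)
        for (int c = r + 1; c < 8; c++) {
            int t = m[8 * r + c];
            m[8 * r + c] = m[8 * c + r];
            m[8 * c + r] = t;
        }
}

static int sum_abs(const int m[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += abs(m[i]);
    return sum;
}

/* Order described for the C version: row hadamard, column hadamard, sum. */
static int sa8d_rows_then_cols(const int src[64])
{
    int m[64];
    memcpy(m, src, sizeof(m));
    hadamard_rows(m);
    hadamard_cols(m);
    return sum_abs(m);
}

/* Order described for the SSE version: column hadamard, transpose,
 * column hadamard, sum. The second transpose is dropped because the
 * sum of absolute values does not depend on element order. */
static int sa8d_cols_transpose_cols(const int src[64])
{
    int m[64];
    memcpy(m, src, sizeof(m));
    hadamard_cols(m);
    transpose8x8(m);
    hadamard_cols(m);
    return sum_abs(m);
}
```

Feeding the same block of differences to both functions should give the same sum, which makes a handy scalar reference when comparing against a vector implementation.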
Please find attached yet another revision of my patchset.
It essentially fixes the problem I talked about in quant4x4dc (which was
exactly what Loren predicted), and adds an AltiVec implementation of
pixel_sa8d_8x8, which is disabled by default because it's broken.
I've gone through each line of my hadamard code converted from the C
code and couldn't find the mistake in it. I then implemented the
hadamard transform as the 2 macros used in the SSE2 code: SUMSUB_BADC
and HADAMARD1x8... Luckily, both versions seem to produce the same result
(though the result of the function is still wrong)...
So I imagine the problem lies in the abs+accumulation code, which seems
quite straightforward, and doesn't look like it could be that
error-prone... but it's late here, and so I guess I'll just sleep on it...
If anyone has an idea, please speak up! :)
Cheers,
Guillaume
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Altivec_quant-dct_routines_6.diff
Type: text/x-patch
Size: 28250 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20060829/cc110216/attachment.bin