>> The 8x8 doesn't get such a big speed-up because the data is 8-byte
>> aligned, not 16-byte aligned, so it's necessary to permute it before
>> using it.

Why not just set the alignment to 16 in frame.c?

Dark Shikari
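
For illustration only, here is a minimal C sketch of the difference between 8-byte and 16-byte alignment discussed above. The helper names `is_aligned` and `alloc_aligned16` are hypothetical and are not taken from x264's frame.c; the sketch assumes a C11 toolchain that provides `aligned_alloc`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if p is aligned to `align` bytes.
 * `align` must be a non-zero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: allocate at least `size` bytes on a 16-byte boundary.
 * The size is rounded up to a multiple of 16 because C11 aligned_alloc may
 * require that. Release the result with free(). */
static void *alloc_aligned16(size_t size)
{
    size_t padded = (size + 15) & ~(size_t)15;
    return aligned_alloc(16, padded);
}
```

An address 8 bytes past a 16-byte-aligned base is still 8-byte aligned but no longer 16-byte aligned, which is the case where a 16-byte vector load would need a permute before the data can be used.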