[x265] SSE4 Angular Mode 26 Intra function
Matt Johnson
johnso87 at illinois.edu
Sun Dec 22 15:28:50 CET 2013
Hi Min,
Unfortunately the code where I'm seeing the failure is part of an
entirely separate program I wrote that makes use of x265's intra
prediction and SATD functions, so I don't have a minimal test case.
Essentially I am decomposing the frame into a regular grid of PUs at
each size from 4x4 to 32x32 and computing SATD values for all 35 intra
prediction modes, for all blocks, for all block sizes, for all 3
channels, using the original frame pixels as the reference rather than
the reconstructed pixels as the decoder would see them. It's intended
as a way to quickly find the most profitable intra prediction modes in
high-bitrate scenarios where reconstructed pixels approximately equal
the original pixels; because the reference is the original pixels, I
can do prediction for the entire frame at once, so it is worthwhile to
offload the computation to a GPU.
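To make the cost metric concrete, here is a minimal sketch of a 4x4 SATD
in C. It is not x265's primitive: the name satd_4x4 and its signature
are made up for illustration, and the sum is left unnormalized, whereas
a real encoder may scale the result.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of x265: sum of absolute values of the
 * 4x4 Hadamard transform of (org - pred). No normalization is applied.
 */
static int satd_4x4(const uint8_t *org, int org_stride,
                    const uint8_t *pred, int pred_stride)
{
    int d[16], t[16];
    int sum = 0;

    /* Residual between the original block and the prediction. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[4 * y + x] = org[y * org_stride + x] - pred[y * pred_stride + x];

    /* Horizontal pass: 4-point Hadamard butterfly on each row. */
    for (int i = 0; i < 4; i++) {
        int a = d[4 * i] + d[4 * i + 1], b = d[4 * i] - d[4 * i + 1];
        int c = d[4 * i + 2] + d[4 * i + 3], e = d[4 * i + 2] - d[4 * i + 3];
        t[4 * i] = a + c;
        t[4 * i + 1] = b + e;
        t[4 * i + 2] = a - c;
        t[4 * i + 3] = b - e;
    }

    /* Vertical pass on each column, accumulating absolute coefficients. */
    for (int i = 0; i < 4; i++) {
        int a = t[i] + t[4 + i], b = t[i] - t[4 + i];
        int c = t[8 + i] + t[12 + i], e = t[8 + i] - t[12 + i];
        sum += abs(a + c) + abs(b + e) + abs(a - c) + abs(b - e);
    }
    return sum;
}
```

In my setup the org pointer would address the original frame and pred
the output of one of the 35 modes; the mode with the smallest cost wins.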
In any case, I do call the intra_pred_allangs function pointer of my
primitives struct with bFilter==1 for luma blocks under 32x32 and
bFilter==0 otherwise. As an example of when the filtering clause gives
an incorrect result, consider a 4x4 chroma block.
Above neighbor array = [0x7B 0x7B 0x7B 0x7B 0x7B ...] (the first element
is the [-1][-1] sample in the image, relative to the top-left corner of
the block in question)
Left neighbor array = [0x7B 0x7B 0x7A 0x7B 0x7C ...] (the first element
is also the [-1][-1] sample in the image)
For the vertical prediction mode on 4x4 chroma, you just copy the above
neighbors downward, so for example, the element in the second row and
the first column ([1][0] in row-major syntax, [0][1] in the standard) is
0x7B according to the standard. With the filtering clause, that same
element is calculated as 0x7B + ((0x7A - 0x7B) >> 1) = 0x7A, i.e. the
above neighbor plus half the difference between the left neighbor for
that row and the corner sample.
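The arithmetic above can be reproduced with a small standalone sketch.
This is my own illustration, not the x265 code: pred_vertical and its
edge_filter parameter are hypothetical stand-ins for the mode 26 path
and the bFilter argument, and 8-bit samples are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 8-bit sample range (assumes 8-bit video). */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical vertical (mode 26) predictor for a size x size block.
 * above[0] and left[0] both hold the [-1][-1] corner sample, as in the
 * neighbor arrays in this mail; above[1 + x] sits over column x and
 * left[1 + y] sits beside row y.
 * edge_filter stands in for bFilter: when nonzero, the first column is
 * adjusted by half the left-neighbor gradient (the filtering clause).
 */
static void pred_vertical(uint8_t *dst, int stride, int size,
                          const uint8_t *above, const uint8_t *left,
                          int edge_filter)
{
    assert(size > 0 && stride >= size);

    /* Plain vertical prediction: copy the above row into every row. */
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            dst[y * stride + x] = above[1 + x];

    if (edge_filter) {
        for (int y = 0; y < size; y++) {
            /* Relies on >> being an arithmetic shift for negative
             * values, which holds on common compilers. */
            int delta = (left[1 + y] - left[0]) >> 1;
            dst[y * stride] = clip_u8(above[1] + delta);
        }
    }
}
```

Calling it on the two neighbor arrays above with size 4 leaves the
second-row, first-column sample at 0x7B when edge_filter is 0 and
changes it to 0x7A when edge_filter is 1, which is the mismatch I see
on chroma.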
Whether this needs to be fixed probably depends on whether x265 ever
intends to do intra prediction for chroma. If not, then I should
just pass "--cpuid 1" to use the serial C versions.
Thanks!
-Matt
On 12/22/2013 12:07 AM, chen wrote:
> Hello Matt,
> Could you tell me how to reproduce the hash mistake?
> I check the code again, the intra_pred_allangs() called only Luma path.
>
> Thanks,
>
> Min
>
> At 2013-12-22 08:35:09,"Matt Johnson" <johnso87 at illinois.edu> wrote:
>>Hi all,
>> I don't know x86 assembly well enough to easily diagnose the problem
>>myself, but I'm running into a problem with intra prediction in the
>>horizontal (mode 10) and vertical (mode 26) modes, where the SSE4 result
>>(--cpuid 255) mismatches the C result (--cpuid 1) and indeed any cpuid
>>value earlier than SSE4.
>> The problem seems to be with the filtering clause (Equation 8-54 in the
>>standard for mode 26, 8-62 for mode 10), which applies to 4x4, 8x8, and
>>16x16 luma blocks. I'm seeing the problem with 4x4 chroma blocks; it
>>looks like the C version respects the bLuma flag to all_angs_pred_c()
>>(which propagates to the bFilter argument to intra_pred_ang_c()), so the
>>filtering clause is not invoked for 4x4 chroma blocks and the normal
>>equations involving ref[], iIdx, and iFact come into play. It looks
>>like the SSE4 version doesn't implement that flag the same way; the
>>predicted pixels I'm getting back are consistent with the use of the
>>filtering clause.
>>
>>Thanks,
>>Matt
>>_______________________________________________
>>x265-devel mailing list
>>x265-devel at videolan.org
>>https://mailman.videolan.org/listinfo/x265-devel