[x265] SSE4 Angular Mode 26 Intra function

Matt Johnson johnso87 at illinois.edu
Sun Dec 22 15:28:50 CET 2013


Hi Min,
	Unfortunately the code where I'm seeing the failure is part of a whole 
separate program I wrote that makes use of x265's intra prediction and 
SATD functions, so I don't have a minimal test case.  Essentially I am 
decomposing the frame into a regular grid of 4x4 through 32x32 PUs, and
computing SATD values for all 35 intra prediction modes, for all blocks, 
for all block sizes, for all 3 channels, using the original frame pixels 
as the reference rather than the reconstructed pixels as the decoder 
would see them.  It's intended to be a way to quickly find the most 
profitable intra prediction modes for high-bitrate scenarios where 
reconstructed pixels approximately equal the original pixels; by using 
the original pixels as reference, I can do prediction for the entire 
frame at once, so it is profitable to offload the computation to a GPU.
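A minimal sketch of the per-block cost described above, for illustration only. It computes a 4x4 SATD with a 4-point Hadamard transform on the difference between the original block and one candidate prediction, then keeps the cheapest candidate mode. The names satd_4x4 and best_mode_4x4 are hypothetical, 8-bit samples are assumed, and this is not x265's own SATD primitive, which may scale the sum differently.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: 4x4 SATD (sum of absolute Hadamard-transformed
 * differences) for 8-bit samples. Returns the unscaled sum. */
static int satd_4x4(const uint8_t *org, int org_stride,
                    const uint8_t *pred, int pred_stride)
{
    int t[4][4];

    /* Difference block, then a 4-point Hadamard transform on each row. */
    for (int y = 0; y < 4; y++) {
        int d0 = org[y * org_stride + 0] - pred[y * pred_stride + 0];
        int d1 = org[y * org_stride + 1] - pred[y * pred_stride + 1];
        int d2 = org[y * org_stride + 2] - pred[y * pred_stride + 2];
        int d3 = org[y * org_stride + 3] - pred[y * pred_stride + 3];
        int a0 = d0 + d1, a1 = d0 - d1, a2 = d2 + d3, a3 = d2 - d3;
        t[y][0] = a0 + a2;
        t[y][1] = a1 + a3;
        t[y][2] = a0 - a2;
        t[y][3] = a1 - a3;
    }

    /* 4-point Hadamard transform down each column, summing |coefficient|. */
    int sum = 0;
    for (int x = 0; x < 4; x++) {
        int a0 = t[0][x] + t[1][x], a1 = t[0][x] - t[1][x];
        int a2 = t[2][x] + t[3][x], a3 = t[2][x] - t[3][x];
        sum += abs(a0 + a2) + abs(a1 + a3) + abs(a0 - a2) + abs(a1 - a3);
    }
    return sum;
}

/* Hypothetical helper: preds holds num_modes candidate 4x4 predictions,
 * each stored contiguously as 16 samples (stride 4). Returns the index
 * of the candidate with the lowest SATD against the original block. */
static int best_mode_4x4(const uint8_t *org, int org_stride,
                         const uint8_t *preds, int num_modes)
{
    int best = 0;
    int best_cost = satd_4x4(org, org_stride, preds, 4);
    for (int m = 1; m < num_modes; m++) {
        int cost = satd_4x4(org, org_stride, preds + 16 * m, 4);
        if (cost < best_cost) {
            best_cost = cost;
            best = m;
        }
    }
    return best;
}
```

In a full search of the kind described, preds would hold all 35 mode predictions for one block, and the same loop would be repeated per block, per block size and per channel.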
	In any case, I do call the intra_pred_allangs function pointer of my 
primitives struct with bFilter==1 for luma blocks under 32x32 and 
bFilter==0 otherwise.  As an example of when the filtering clause gives 
an incorrect result, consider a 4x4 chroma block.

Above neighbor array = [0x7B 0x7B 0x7B 0x7B 0x7B ...] (the first element is 
the [-1][-1] sample in the image, relative to the top-left corner of the 
block in question)
Left neighbor array = [0x7B 0x7B 0x7A 0x7B 0x7C ...] (the first element is 
also the [-1][-1] sample)

For the vertical prediction mode on 4x4 chroma, you just copy the above 
neighbors downward, so for example, the element in the second row and 
the first column ([1][0] in row-major syntax, [0][1] in the standard) is 
0x7B according to the standard.  With the filtering clause, that same 
element is calculated as 0x7B+((0x7A - 0x7B) >> 1) = 0x7A.
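The arithmetic above can be checked with a short C sketch, assuming 8-bit samples and the neighbor layout from the example (index 0 of each array holds the corner sample [-1][-1]). The names edge_filter_enabled and pred_vertical are hypothetical, not x265's API; the filter line follows the mode-26 filtering clause, and the enable rule mirrors the bFilter choice described earlier (luma blocks smaller than 32x32 only).

```c
#include <stdint.h>

/* Clip1 for 8-bit video: clamp to [0, 255]. */
static uint8_t clip8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical rule mirroring the caller's choice in this thread:
 * the boundary filter is used only for luma blocks smaller than 32x32. */
static int edge_filter_enabled(int is_luma, int n)
{
    return is_luma && n < 32;
}

/* Vertical (mode 26) prediction of an n x n block into row-major dst.
 * above[0] = left[0] = corner sample p[-1][-1];
 * above[1 + x] = p[x][-1], left[1 + y] = p[-1][y]. */
static void pred_vertical(const uint8_t *above, const uint8_t *left,
                          uint8_t *dst, int n, int bFilter)
{
    /* Copy the above row straight down. */
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * n + x] = above[1 + x];

    /* Filtering clause: adjust the first column by half the left-edge
     * gradient. Assumes >> on a negative int is an arithmetic shift, as in
     * the standard (implementation-defined in C, arithmetic on mainstream
     * compilers). */
    if (bFilter)
        for (int y = 0; y < n; y++)
            dst[y * n] = clip8(above[1] + ((left[1 + y] - left[0]) >> 1));
}
```

With the example arrays, pred_vertical(above, left, dst, 4, edge_filter_enabled(0, 4)) leaves dst[1 * 4 + 0] at 0x7B, while forcing bFilter to 1 gives 0x7A, which is the discrepancy described above.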

Whether this needs to be fixed probably depends on whether x265 ever 
intends to do intra prediction for chroma.  If not, then I should 
just pass "--cpuid 1" to use the serial C versions.

Thanks!
-Matt

On 12/22/2013 12:07 AM, chen wrote:
> Hello Matt,
> Could you tell me how to reproduce the hash mismatch?
> I checked the code again; intra_pred_allangs() is only called on the luma path.
>
> Thanks,
>
> Min
>
> At 2013-12-22 08:35:09, "Matt Johnson" <johnso87 at illinois.edu> wrote:
>>Hi all,
>>	I don't know x86 assembly well enough to easily diagnose the problem
>>myself, but I'm running into a problem with intra prediction in the
>>horizontal (mode 10) and vertical (mode 26) modes, where the SSE4 result
>>(--cpuid 255) does not match the C result (--cpuid 1), or indeed the
>>result for any cpuid level below SSE4.
>>	The problem seems to be with the filtering clause (Equation 8-54 in the
>>standard for mode 26, 8-62 for mode 10), which applies to 4x4, 8x8, and
>>16x16 luma blocks.  I'm seeing the problem with 4x4 chroma blocks; it
>>looks like the C version respects the bLuma flag to all_angs_pred_c()
>>(which propagates to the bFilter argument to intra_pred_ang_c()), so the
>>filtering clause is not invoked for 4x4 chroma blocks and the normal
>>equations involving ref[], iIdx, and iFact come into play.  It looks
>>like the SSE4 version doesn't implement that flag the same way; the
>>predicted pixels I'm getting back are consistent with the use of the
>>filtering clause.
>>
>>Thanks,
>>Matt
>>_______________________________________________
>>x265-devel mailing list
>>x265-devel at videolan.org
>>https://mailman.videolan.org/listinfo/x265-devel
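For mode 10, the filtering clause mentioned in the original report is the transpose of the mode-26 case: the left column is copied across each row and, when filtering is enabled, the first row is adjusted by half the top-edge gradient, predSamples[x][0] = Clip1Y(p[-1][0] + ((p[x][-1] - p[-1][-1]) >> 1)). Below is a minimal C sketch under the same assumptions as the vertical case (8-bit samples, index 0 of each neighbor array holding the corner sample). pred_horizontal is a hypothetical name, not an x265 function.

```c
#include <stdint.h>

/* Clip1 for 8-bit video: clamp to [0, 255]. */
static uint8_t clip8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Horizontal (mode 10) prediction of an n x n block into row-major dst.
 * above[0] = left[0] = corner sample p[-1][-1];
 * above[1 + x] = p[x][-1], left[1 + y] = p[-1][y].
 * bFilter should be nonzero only for luma blocks smaller than 32x32. */
static void pred_horizontal(const uint8_t *above, const uint8_t *left,
                            uint8_t *dst, int n, int bFilter)
{
    /* Copy each left neighbor across its row. */
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * n + x] = left[1 + y];

    /* Filtering clause: adjust the first row by half the top-edge gradient
     * (assumes arithmetic >> for negative values, as in the standard). */
    if (bFilter)
        for (int x = 0; x < n; x++)
            dst[x] = clip8(left[1] + ((above[1 + x] - above[0]) >> 1));
}
```

A SIMD version that matches this reference for chroma has to skip the first-row adjustment whenever bFilter is 0, just as the C path does.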

