[x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

dave dtyx265 at gmail.com
Tue Mar 10 19:09:49 CET 2015


>> This produces some interesting numbers.
Sorry, I mixed these two up.
>>>> incorrect: Without using registers for constants
>>>> correct: with using registers
>>>> x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%)
>>>>
>>>> encoded 2000 frames in 95.98s (20.84 fps), 1020.04 kb/s
>>>>
>>>> incorrect: With using registers for constants
>>>> correct: without using registers
>>>> x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)
>>>>
>>>> encoded 2000 frames in 93.10s (21.48 fps), 1008.63 kb/s
>>>>
>>>> I just added --cu-stats to the same command options that I used
>>>> previously and I ran it several times and got exactly the same
>>>> percentages.  Times varied by less than a second for each build.  So
>>>> how can simple register usage in one primitive affect intra pred
>>>> decisions?
>>> it shouldn't, the behavior must be wrong in one of the cases. no change
>>> in performance should be able to impact the encoder output (or any
>>> coding decisions)
>>>
>> So execution time isn't directly measured for decision making?
>>
>> The output is also different.
>>
>> ls -l bridge-close*
>> -rw-r--r-- 1 shakezula shakezula 8432204 Mar 10 09:25 bridge-close1.y4m
>> -rw-r--r-- 1 shakezula shakezula 8527219 Mar 10 07:49 bridge-close.y4m
>>
>>   bridge-close1.y4m was generated without the use of registers to hold
>> constants.
> yeah, definitely a bug in one of the two versions and if the testbench
> doesn't catch it that's really bad.
I am using the same source tree for both, so the only difference is the
register usage.

The unpatched tip, which is going to use the C code for planar32, produces
the same intra pred decision percentages as the build that does not use
registers for constants, but different encoded output.

x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)

encoded 2000 frames in 101.82s (19.64 fps), 1008.64 kb/s

ls -l bridge-close.*
-rw-r--r-- 1 shakezula shakezula   8432239 Mar 10 10:03 bridge-close.hevc

The reconstructed output of all three looks the same.
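
Since "looks the same" is a visual check, a byte-for-byte comparison of the
outputs would be a stricter test. A minimal sketch follows; the helper names
read_file and first_mismatch are made up for illustration and are not part of
x265.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: load a whole file into memory as raw bytes.
std::vector<unsigned char> read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: return the offset of the first differing byte,
// or -1 if the buffers are identical. If one buffer is a strict prefix
// of the other, the length of the shorter buffer is returned.
long long first_mismatch(const std::vector<unsigned char>& a,
                         const std::vector<unsigned char>& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; i++)
    {
        if (a[i] != b[i])
            return static_cast<long long>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<long long>(n);
}
```

Calling first_mismatch(read_file(pathA), read_file(pathB)) on two
reconstructed files would return -1 only when they are bit-identical. The
same check applies to the encoded streams.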

Just to test for overflow, I modified the testbench to fill every input
with the maximum 10-bit value, 0x3FF, instead of random values, and it
passes. With a value that needs one more bit, 0x4FF, it fails. The y4m
file has 8-bit depth, though.
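
To put numbers on the overflow test, here is a small C++ sketch of the
standard HEVC planar formula that also records the largest pre-shift sum.
It is a plain reference written for this illustration, not the x265 C
primitive or the asm in the patch, and the function names planar_ref and
flat_block_max_sum are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar prediction for an N x N block, N = 1 << log2N, using the HEVC formula:
//   pred(x,y) = ((N-1-x)*left[y] + (x+1)*topRight
//              + (N-1-y)*above[x] + (y+1)*bottomLeft + N) >> (log2N + 1)
// above and left each hold N + 1 samples; above[N] is the top-right
// neighbour and left[N] is the bottom-left neighbour.
// Returns the largest pre-shift sum seen, to show the headroom needed.
uint32_t planar_ref(const std::vector<uint16_t>& above,
                    const std::vector<uint16_t>& left,
                    int log2N,
                    std::vector<uint16_t>& dst)
{
    const int N = 1 << log2N;
    assert(above.size() >= static_cast<std::size_t>(N + 1));
    assert(left.size() >= static_cast<std::size_t>(N + 1));

    const uint32_t topRight = above[N];
    const uint32_t bottomLeft = left[N];
    uint32_t maxSum = 0;

    dst.assign(static_cast<std::size_t>(N) * N, 0);
    for (int y = 0; y < N; y++)
    {
        for (int x = 0; x < N; x++)
        {
            const uint32_t sum = uint32_t(N - 1 - x) * left[y]
                               + uint32_t(x + 1) * topRight
                               + uint32_t(N - 1 - y) * above[x]
                               + uint32_t(y + 1) * bottomLeft
                               + uint32_t(N);
            maxSum = std::max(maxSum, sum);
            dst[static_cast<std::size_t>(y) * N + x] =
                static_cast<uint16_t>(sum >> (log2N + 1));
        }
    }
    return maxSum;
}

// Largest pre-shift sum for a 32x32 block whose neighbours all equal `sample`.
uint32_t flat_block_max_sum(uint16_t sample)
{
    std::vector<uint16_t> above(33, sample), left(33, sample), dst;
    return planar_ref(above, left, 5, dst);
}
```

The four weights always add up to 2N = 64 for a 32x32 block, so with every
neighbour at 0x3FF the sum is 64 * 1023 + 32 = 65504, which still fits in an
unsigned 16-bit value. With 0x4FF it is 64 * 1279 + 32 = 81888, which does
not. If the asm keeps this sum in 16-bit lanes (an assumption, the thread
does not say), that would explain why 0x3FF passes and 0x4FF fails. Note
that 0x4FF is not a valid 10-bit sample.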

