[x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

Steve Borho steve at borho.org
Tue Mar 10 17:36:45 CET 2015


On Tue, Mar 10, 2015 at 11:34 AM, dave <dtyx265 at gmail.com> wrote:
> On 03/10/2015 08:56 AM, Steve Borho wrote:
>>
>> On 03/10, dave wrote:
>>>
>>> On 03/09/2015 11:40 PM, Steve Borho wrote:
>>
>> <snip>
>>>>
>>>> No, but the command line option --cu-stats does show how often it is
>>>> called (but not how long it took).
>>>>
>>> This produces some interesting numbers.
>>>
>>> Without using registers for constants
>>>
>>> x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%)
>>>
>>> encoded 2000 frames in 95.98s (20.84 fps), 1020.04 kb/s
>>>
>>> With using registers for constants
>>>
>>> x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)
>>>
>>> encoded 2000 frames in 93.10s (21.48 fps), 1008.63 kb/s
>>>
>>> I just added --cu-stats to the same command options that I used
>>> previously and I ran it several times and got exactly the same
>>> percentages.  Times varied by less than a second for each build.  So
>>> how can simple register usage in one primitive affect intra pred
>>> decisions?
>>
>> It shouldn't; the behavior must be wrong in one of the cases. No change
>> in performance should be able to impact the encoder output (or any
>> coding decisions).
>>
> So execution time isn't directly measured for decision making?
>
> The output is also different.
>
> ls -l bridge-close*
> -rw-r--r-- 1 shakezula shakezula 8432204 Mar 10 09:25 bridge-close1.y4m
> -rw-r--r-- 1 shakezula shakezula 8527219 Mar 10 07:49 bridge-close.y4m
>
>  bridge-close1.y4m was generated without the use of registers to hold
> constants.

Yeah, that's definitely a bug in one of the two versions, and if the
testbench doesn't catch it, that's really bad.
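
To make the kind of check I mean concrete, here is a rough Python sketch
(not the actual x265 testbench; every function name in it is made up for
illustration). It compares a straightforward planar reference against a
restructured variant on random high-bit-depth inputs and reports the first
pixel that differs. The formula is the standard HEVC planar predictor; the
10-bit depth, the input layout (separate above/left arrays plus two corner
samples), and the dropped-rounding bug are assumptions chosen for the
example.

```python
import random


def planar_ref(above, left, top_right, bottom_left, n):
    """Straightforward HEVC planar prediction for an n x n block."""
    shift = n.bit_length()  # log2(n) + 1 when n is a power of two
    return [[((n - 1 - x) * left[y] + (x + 1) * top_right +
              (n - 1 - y) * above[x] + (y + 1) * bottom_left + n) >> shift
             for x in range(n)]
            for y in range(n)]


def planar_hoisted(above, left, top_right, bottom_left, n):
    """Same arithmetic, with the per-row constant term hoisted out of the
    inner loop (loosely analogous to keeping constants in registers)."""
    shift = n.bit_length()
    out = []
    for y in range(n):
        row_const = (y + 1) * bottom_left + n
        out.append([((n - 1 - x) * left[y] + (x + 1) * top_right +
                     (n - 1 - y) * above[x] + row_const) >> shift
                    for x in range(n)])
    return out


def planar_no_round(above, left, top_right, bottom_left, n):
    """Deliberately broken variant: the rounding offset n is dropped."""
    shift = n.bit_length()
    return [[((n - 1 - x) * left[y] + (x + 1) * top_right +
              (n - 1 - y) * above[x] + (y + 1) * bottom_left) >> shift
             for x in range(n)]
            for y in range(n)]


def first_mismatch(ref_fn, opt_fn, n=32, bit_depth=10, trials=20, seed=0):
    """Return (trial, y, x) of the first differing pixel, or None."""
    rng = random.Random(seed)
    max_val = (1 << bit_depth) - 1
    for trial in range(trials):
        above = [rng.randint(0, max_val) for _ in range(n)]
        left = [rng.randint(0, max_val) for _ in range(n)]
        top_right = rng.randint(0, max_val)
        bottom_left = rng.randint(0, max_val)
        ref = ref_fn(above, left, top_right, bottom_left, n)
        opt = opt_fn(above, left, top_right, bottom_left, n)
        for y in range(n):
            for x in range(n):
                if ref[y][x] != opt[y][x]:
                    return (trial, y, x)
    return None
```

A check along these lines should pass for any change that only moves
constants around, and should fail for anything that changes the arithmetic.
If both builds pass a check like this and the encoder output still
differs, the test inputs probably aren't covering the case that goes wrong.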

-- 
Steve Borho

