[x265] [PATCH 2 of 2] asm:intra pred planar32 sse2 high bit

Steve Borho steve at borho.org
Tue Mar 10 20:12:21 CET 2015


On 03/10, dave wrote:
> 
> >>This produces some interesting numbers.
> sorry, I mixed these two up.
> >>>>incorrect:Without using registers for constants
> >>>>with using registers
> >>>>x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%)
> >>>>
> >>>>encoded 2000 frames in 95.98s (20.84 fps), 1020.04 kb/s
> >>>>
> >>>>incorrect:With using registers for constants
> >>>>without using registers
> >>>>x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)
> >>>>
> >>>>encoded 2000 frames in 93.10s (21.48 fps), 1008.63 kb/s
> >>>>
> >>>>I just added --cu-stats to the same command options that I used
> >>>>previously and I ran it several times and got exactly the same
> >>>>percentages.  Times varied by less than a second for each build.  So
> >>>>how can simple register usage in one primitive affect intra pred
> >>>>decisions?
> >>>it shouldn't, the behavior must be wrong in one of the cases. no change
> >>>in performance should be able to impact the encoder output (or any
> >>>coding decisions)
> >>>
> >>So execution time isn't directly measured for decision making?
> >>
> >>The output is also different.
> >>
> >>ls -l bridge-close*
> >>-rw-r--r-- 1 shakezula shakezula 8432204 Mar 10 09:25 bridge-close1.y4m
> >>-rw-r--r-- 1 shakezula shakezula 8527219 Mar 10 07:49 bridge-close.y4m
> >>
> >>  bridge-close1.y4m was generated without the use of registers to hold
> >>constants.
> >yeah, definitely a bug in one of the two versions and if the testbench
> >doesn't catch it that's really bad.
> I am using the same source tree for both so the only difference is
> the register usage.
> 
> The unpatched tip, which is going to use C code for planar32,
> produces the same intra pred decision percentages as not using
> registers for constants but different encoded output.
> 
> x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)
> 
> encoded 2000 frames in 101.82s (19.64 fps), 1008.64 kb/s
> 
> ls -l bridge-close.*
> -rw-r--r-- 1 shakezula shakezula   8432239 Mar 10 10:03 bridge-close.hevc
> 
> The reconstructed output of all three looks the same.
> 
> Just to test for overflow I modified the testbench to test with all
> maximum 10-bit values of 0x3FF instead of random values and it
> passes.  One more bit, 0x4FF, and it fails.  Though the y4m file has
> 8 bit depth.
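
The 0x3FF / 0x4FF result above is what the planar formula alone would
predict if the sums are held in 16 bits. A minimal sketch in C,
assuming the standard HEVC planar formula for a 32x32 block and
assuming (not confirmed from the patch) that the SSE2 code accumulates
in 16-bit lanes. planar32_ref and planar32_u16 are hypothetical names
for illustration, not x265 functions:

```c
#include <assert.h>
#include <stdint.h>

/* One sample of HEVC planar prediction for a 32x32 block.
 * left = left[y], above = above[x], topRight = above[32],
 * bottomLeft = left[32]. The four weights always add up to 64. */
static uint32_t planar32_ref(int x, int y, uint32_t left, uint32_t above,
                             uint32_t topRight, uint32_t bottomLeft)
{
    assert(x >= 0 && x < 32 && y >= 0 && y < 32);
    uint32_t sum = (uint32_t)(31 - x) * left
                 + (uint32_t)(x + 1)  * topRight
                 + (uint32_t)(31 - y) * above
                 + (uint32_t)(y + 1)  * bottomLeft
                 + 32;                     /* rounding offset */
    return sum >> 6;
}

/* Same formula, but the sum is truncated to 16 bits before the shift.
 * This models a 16-bit SIMD lane; that the asm works this way is an
 * assumption made for this sketch. */
static uint32_t planar32_u16(int x, int y, uint32_t left, uint32_t above,
                             uint32_t topRight, uint32_t bottomLeft)
{
    uint16_t sum = (uint16_t)(planar32_ref(x, y, left, above,
                                           topRight, bottomLeft) * 0
                 + (uint32_t)(31 - x) * left
                 + (uint32_t)(x + 1)  * topRight
                 + (uint32_t)(31 - y) * above
                 + (uint32_t)(y + 1)  * bottomLeft
                 + 32);
    return (uint32_t)(sum >> 6);
}
```

With every neighbour at 0x3FF the sum is 64 * 1023 + 32 = 65504, which
still fits in 16 bits, so both versions agree. At 0x4FF the sum is
64 * 1279 + 32 = 81888, which wraps in 16 bits. Under that assumption a
failure at 0x4FF would be expected and would not by itself indicate a
problem for legal 10-bit (or 8-bit) input.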

This sounds like your outputs would be non-deterministic if you just ran
the same encode multiple times? That would be a different class of bug,
perhaps unrelated to your work on the intra primitives.

I don't think we often check for non-determinism on older architectures.
We regularly test --no-asm against fully optimized outputs, but this
only exercises the primitives normally used on our test machines.
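
For comparing two encodes (repeat runs of one build, or --no-asm
against the optimized build), knowing where the bitstreams first
diverge is more useful than knowing only that the file sizes differ.
A minimal sketch in C, assuming both streams have already been read
into memory by the caller. first_mismatch is a hypothetical helper,
not part of x265:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first byte at which the two buffers differ.
 * If one buffer is a strict prefix of the other, returns the length of
 * the shorter one. Returns -1 if the buffers are identical. */
static long first_mismatch(const uint8_t *a, size_t na,
                           const uint8_t *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return na == nb ? -1 : (long)n;
}
```

An early mismatch offset points at the first frames, while a result of
-1 across repeated runs of the same build would rule out
non-determinism for that input.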

-- 
Steve Borho

