[x265] Weekly summary for week ending 20/5/16
Ramya Sriraman
ramya at multicorewareinc.com
Mon May 23 06:07:45 CEST 2016
*Highlights*
*Details*
I designed and implemented an ARM NEON algorithm for DCT16x16. Since ARM
registers are very limited, the algorithm processes a 16x4 block at a time
and loops 4 times to cover all rows of the DCT-1D stage. The DCT-2D stage
is similar but works on 32-bit intermediates (the 32-bit multiplication is
the bottleneck here: it takes 4 cycles, compared with a single cycle for a
16-bit multiplication).
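
The strip-wise row pass described above can be sketched in scalar C++ as
follows. This is only an illustration under stated assumptions, not the
NEON code itself: the names buildBasis, dct16Strip4 and dct16Rows are
hypothetical, the basis is derived from the cosine formula rather than
taken from the x265 coefficient table, and no rounding shift is applied.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: 16-point DCT-II basis scaled by 64 and rounded.
// Assumption: derived from the cosine formula, NOT the exact x265 table.
static void buildBasis(int16_t* c)
{
    const double pi = 3.14159265358979323846;
    for (int k = 0; k < 16; k++)
    {
        const double scale = (k == 0) ? 1.0 : std::sqrt(2.0);
        for (int n = 0; n < 16; n++)
            c[k * 16 + n] = (int16_t)std::lround(
                64.0 * scale * std::cos((2 * n + 1) * k * pi / 32.0));
    }
}

// 1-D DCT over one strip of 4 rows (16x4 samples), starting at rowBase.
// The output is written transposed (coefficient index k selects the output
// row) so that a second pass could reuse the same row-oriented routine.
static void dct16Strip4(const int16_t* src, int32_t* dst,
                        const int16_t* c, int rowBase)
{
    for (int r = 0; r < 4; r++)
    {
        const int16_t* row = src + (rowBase + r) * 16;
        for (int k = 0; k < 16; k++)
        {
            int32_t acc = 0; // 16-bit x 16-bit products, 32-bit accumulator
            for (int n = 0; n < 16; n++)
                acc += c[k * 16 + n] * row[n];
            dst[k * 16 + rowBase + r] = acc;
        }
    }
}

// Row pass over the whole 16x16 block: 4 strips of 4 rows each.
static void dct16Rows(const int16_t* src, int32_t* dst)
{
    int16_t c[16 * 16];
    buildBasis(c);
    for (int strip = 0; strip < 4; strip++)
        dct16Strip4(src, dst, c, strip * 4);
}
```

Calling dct16Rows(src, dst) on a 16x16 block of int16_t samples fills dst
with the transposed 1-D coefficients. The first pass multiplies 16-bit
inputs by 16-bit coefficients, while a second pass over dst would have to
multiply 32-bit values, which is where the slower multiplication comes in.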
*Plans*
Write an example for psyCost_pp<2> (psyCost_pp_4x4)
I need about 2 more weeks to finish DCT16x16. The function is large and
complex, so I need more time to debug and adjust my algorithm and code.
Each run of the debug top (modified from our Testbench) takes about 20
minutes on average in the simulation environment.
Thank you
Regards
Ramya