[x265] x265 "Custom Implementation"

Wed Apr 9 12:21:23 CEST 2014

Hi Steve,

On 04/08/2014 10:45 PM, Steve Borho wrote:
>> My company (Kalray) is looking into writing an HEVC encoder based on x265 on
>> its many-core processor (MPPA-256).
>> Because of our architecture (distributed, limited memory among other
>> things), a direct port of x265 is not a viable solution.
> There are only two ground rules
>
> 1) You would need to sign a copy of the x265 contributor agreement
> 2) All changes to x265 source must be contributed back under the terms
> of our dual GPLv2 / commercial license
That should not be an issue.

>> Our plan is to write a custom encoder core optimized for our platform and
>> use it as an accelerator for x265 running on an x86 processor.
>> This should look something like this:
>>
>> /-------------------------\                 /---------------------\
>> |  x86                    |                 | 1 or more MPPA-256  |
>> |                         |                 |                     |
>> |  x265 preAnalysis +     | <= PCI Link =>  | Kalray Encoder Core |
>> |  Rate Control           |                 |                     |
>> |                         |                 |                     |
>> \-------------------------/                 \---------------------/
>>
>> The idea for the encoder core is to implement a CTU encoder.
>> This leaves us some flexibility in how we dispatch the CTUs across
>> the cores (tiles, frame parallelism, etc.).
> Yes, my first guess is that you would want to move TEncCu::processCU() to
> your remote cores, and perhaps distribute that work to as many cores
> as possible, and allow the main CPU to manage the wave-front and frame
> threading data dependencies and the higher level tasks (slice
> decisions and rate control)
The issue with this approach is that the feedback loop between our accelerator and the x86 would be too tight to reach maximum performance.
Streaming all the required data back and forth and synchronizing would quickly become a bottleneck.
That's why our first approach is centered on tiles, as we can push more work at once.
However, the encoder core running on our platform does about the same job as processCU, so we can move in one direction or the other later.
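
To illustrate the granularity argument, here is a minimal sketch of cutting a frame into tile-sized jobs so that each transfer carries many CTUs. TileJob, uniformSizes and makeTileJobs are hypothetical names made up for this example, not x265 or HM code; the split uses the same integer arithmetic as HEVC uniform tile spacing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical descriptor for one tile's worth of work (not an x265 type).
struct TileJob {
    int ctuX, ctuY;        // top-left CTU of the tile, in CTU units
    int ctuCols, ctuRows;  // tile size, in CTU units
    int ctuCount() const { return ctuCols * ctuRows; }
};

// Split `total` CTUs into `parts` near-equal runs using integer arithmetic,
// the same way HEVC spaces uniform tile columns and rows.
std::vector<int> uniformSizes(int total, int parts) {
    assert(parts > 0 && parts <= total);
    std::vector<int> sizes(parts);
    for (int i = 0; i < parts; ++i)
        sizes[i] = ((i + 1) * total) / parts - (i * total) / parts;
    return sizes;
}

// Build one job per tile for a picture of the given size in pixels.
std::vector<TileJob> makeTileJobs(int picWidth, int picHeight, int ctuSize,
                                  int tileCols, int tileRows) {
    const int widthInCtus  = (picWidth  + ctuSize - 1) / ctuSize;
    const int heightInCtus = (picHeight + ctuSize - 1) / ctuSize;
    const std::vector<int> colSizes = uniformSizes(widthInCtus, tileCols);
    const std::vector<int> rowSizes = uniformSizes(heightInCtus, tileRows);

    std::vector<TileJob> jobs;
    int y = 0;
    for (int r = 0; r < tileRows; ++r) {
        int x = 0;
        for (int c = 0; c < tileCols; ++c) {
            jobs.push_back(TileJob{x, y, colSizes[c], rowSizes[r]});
            x += colSizes[c];
        }
        y += rowSizes[r];
    }
    return jobs;
}
```

For a 1920x1080 picture with 64x64 CTUs and a 4x2 tile grid, makeTileJobs(1920, 1080, 64, 4, 2) yields 8 jobs covering 30 x 17 = 510 CTUs, so the host synchronizes with the accelerator 8 times per frame rather than 510.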

The primary entry point for us would probably be compressFrame.

>> From what I could gather after a quick glance at the x265 code, right
>> now x265 is using HM "as is" to do the actual encoding. That is, with
>> a few exceptions, HM code is used directly to try out the different modes,
>> estimate cost, generate the bitstream, etc.
> Many of the HM class structures are still in place, but most of them
> have been heavily modified by this point.  The processCU() interface
> is likely to be stable for the near term but we will be refactoring
> many of the CU data structures to optimize for memory layout (cache
> locality).
Because we will only extract a few pieces of information from pre-analysis, structure changes shouldn't affect us much at first.
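
One way to stay insulated from those refactorings is to copy the few fields we need into a flat, fixed-layout record before sending them over the link. The sketch below is only an illustration under assumptions: FrameJobHeader, packJob and unpackJob are hypothetical names, the field list is a guess at what a job might carry, and it assumes both ends of the link use the same byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-layout header; none of these names come from x265 or HM.
struct FrameJobHeader {
    uint32_t frameNumber;   // display order of the frame
    uint16_t widthInCtus;
    uint16_t heightInCtus;
    int8_t   baseQp;        // frame-level QP chosen by rate control
    uint8_t  numRefFrames;
    uint16_t reserved;      // explicit padding keeps the layout fixed
};
static_assert(sizeof(FrameJobHeader) == 12, "unexpected padding in header");

// Serialize the header followed by one QP offset per CTU.
// Assumption: host and accelerator share the same byte order.
std::vector<uint8_t> packJob(const FrameJobHeader& h,
                             const std::vector<int8_t>& ctuQpOffsets) {
    std::vector<uint8_t> buf(sizeof(h) + ctuQpOffsets.size());
    std::memcpy(buf.data(), &h, sizeof(h));
    if (!ctuQpOffsets.empty())
        std::memcpy(buf.data() + sizeof(h), ctuQpOffsets.data(),
                    ctuQpOffsets.size());
    return buf;
}

// Returns false if the buffer size does not match the header's CTU count.
bool unpackJob(const std::vector<uint8_t>& buf, FrameJobHeader& h,
               std::vector<int8_t>& ctuQpOffsets) {
    if (buf.size() < sizeof(h)) return false;
    std::memcpy(&h, buf.data(), sizeof(h));
    const std::size_t ctus = std::size_t(h.widthInCtus) * h.heightInCtus;
    if (buf.size() != sizeof(h) + ctus) return false;
    ctuQpOffsets.resize(ctus);
    if (ctus != 0)
        std::memcpy(ctuQpOffsets.data(), buf.data() + sizeof(h), ctus);
    return true;
}
```

With a record like this, only the code that fills it in has to follow changes to the x265 structures; the accelerator side sees the same bytes regardless.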

>> Therefore, our idea was to use HM structures as a "stable" interface between
>> x265 and our encoder core. From these structures we can extract all the
>> required info (pixels, reference frames, QPs, etc.) and convert/transfer it
>> to our core.
>>
>> What is your opinion on this approach?
>> Are the HM classes (at least structure-wise) an interface stable enough to do
>> this?
> I wouldn't say the structures or any given APIs are guaranteed to be
> stable, but the rough functionality of processCU is unlikely to
> change, and that would be a good layer to migrate/off-load work.
>
>

Sounds great!
Now we just need to have an encoder working ;)

Thanks for the feedback.
-- 
Nicolas Morey Chaisemartin
Phone : +33 6 42 46 68 87