[x265] x265 "Custom Implementation"

Tue Apr 8 22:45:28 CEST 2014

On Mon, Mar 24, 2014 at 12:04 PM, Nicolas Morey-Chaisemartin
<nmorey at kalray.eu> wrote:
> Hi everyone,

Hello Nicolas, sorry for the late reply.  I wasn't sure at first how
to respond to this email.

> My company (Kalray) is looking into writing a HEVC encoder based on x265 on
> its many core processor (MPPA-256).
> Because of our architecture (distributed, limited memory among other
> things), a direct port of x265 is not a viable solution.

There are only two ground rules:

1) You would need to sign a copy of the x265 contributor agreement
2) All changes to x265 source must be contributed back under the terms
of our dual GPLv2 / commercial license

> Our plan is to write a custom encoder core optimized for our platform and
> use it as an accelerator for x265 running on a x86 processor.
> This should look something like that
>
> /-------------------------\                /---------------------\
> |  x86                    |                | 1 or more MPPA-256  |
> |                         |                |                     |
> |  x265 preAnalysis +     | <= PCI Link => | Kalray Encoder Core |
> |  Rate Control           |                |                     |
> |                         |                |                     |
> \-------------------------/                \---------------------/
>
> The idea for the encoder core is to implement a CTU encoder.
> This leaves us some flexibility on how we want to dispatch the CTUs across
> the cores (Tiles, frame parallelism, etc.)

Yes, my first guess is that you would want to move TEncCu::processCU() to
your remote cores, perhaps distributing that work across as many cores
as possible, and let the main CPU manage the wave-front and frame
threading data dependencies and the higher-level tasks (slice
decisions and rate control).
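
To make the host-side bookkeeping concrete, here is a minimal sketch of
wave-front dependency tracking. It assumes the usual WPP rule that a CTU
may start once its left neighbour and the CTU above and to the right are
finished. WavefrontTracker and countWaves are hypothetical names for
illustration only; they are not part of x265, and the "dispatch" here is
simulated rather than sent to a real remote core.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side tracker (not an x265 class): records which CTUs
// are finished and reports which ones may be dispatched next.
class WavefrontTracker {
public:
    WavefrontTracker(int rows, int cols)
        : rows_(rows), cols_(cols),
          done_(static_cast<std::size_t>(rows) * cols, false) {}

    // Ready when the left neighbour and the above-right CTU are done.
    // In the last column the above-right CTU is clamped to the CTU above.
    bool isReady(int row, int col) const {
        if (isDone(row, col)) return false;
        bool leftOk = (col == 0) || isDone(row, col - 1);
        int aboveCol = (col + 1 < cols_) ? col + 1 : cols_ - 1;
        bool aboveOk = (row == 0) || isDone(row - 1, aboveCol);
        return leftOk && aboveOk;
    }

    void markDone(int row, int col) { done_[index(row, col)] = true; }
    bool isDone(int row, int col) const { return done_[index(row, col)]; }

    // Every CTU that could be handed to a remote core right now.
    std::vector<std::pair<int, int> > readyList() const {
        std::vector<std::pair<int, int> > ready;
        for (int r = 0; r < rows_; ++r)
            for (int c = 0; c < cols_; ++c)
                if (isReady(r, c)) ready.push_back(std::make_pair(r, c));
        return ready;
    }

private:
    std::size_t index(int row, int col) const {
        return static_cast<std::size_t>(row) * cols_ + col;
    }
    int rows_, cols_;
    std::vector<bool> done_;
};

// Simulate dispatching all ready CTUs in lock-step "waves" and return how
// many waves the frame needs. A real host would mark CTUs done as results
// come back from the remote cores instead.
int countWaves(int rows, int cols) {
    WavefrontTracker tracker(rows, cols);
    int remaining = rows * cols;
    int waves = 0;
    while (remaining > 0) {
        std::vector<std::pair<int, int> > ready = tracker.readyList();
        assert(!ready.empty());  // the dependency graph has no cycles
        for (std::size_t i = 0; i < ready.size(); ++i)
            tracker.markDone(ready[i].first, ready[i].second);
        remaining -= static_cast<int>(ready.size());
        ++waves;
    }
    return waves;
}
```

The number of CTUs in each ready list is the parallelism available to the
remote cores at that moment, which is why frame threading on top of the
wave-front helps keep many cores busy.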

> From what I could gather after a quick glance at the x265 code, right
> now x265 is using HM "as is" to do the actual encoding. Meaning that, with
> a few exceptions, HM code is used directly to try out the different modes,
> estimate cost, generate bitstream, etc.

Many of the HM class structures are still in place, but most of them
have been heavily modified by this point.  The processCU() interface
is likely to be stable for the near term but we will be refactoring
many of the CU data structures to optimize for memory layout (cache
locality).

> Therefore, our idea was to use HM structures as a "stable" interface between
> x265 and our encoder core. From these structures we can extract all the
> required info (pixels, reference frames, QPs, etc.) and convert/transfer it
> to our core.
>
> What is your opinion on this approach?
> Are the HM classes (at least structure-wise) an interface stable enough to do
> this?

I wouldn't say the structures or any given APIs are guaranteed to be
stable, but the rough functionality of processCU is unlikely to
change, and that would be a good layer to migrate/off-load work.

Regards,

-- 
Steve Borho