[x264-devel] FPGAs and x264

Jason Garrett-Glaser darkshikari at gmail.com
Sat Jul 4 21:30:53 CEST 2009


On Sat, Jul 4, 2009 at 12:23 PM, David Smith<agentdavo at mac.com> wrote:
> Hi,
>
> I am currently working on a project to offload x264 functions to an FPGA.  I
> have read several papers recently that describe the offloading of
> x264_me_search_ref.   However, that is a large function to replicate in
> hardware and well beyond the scope of my project.
>
> For my current project I am designing a 16-core SAD/SATD processor.  Each
> core will have 4 processing elements.  Each element can compute either a
> SAD or a SATD of up to 8x8.
>
> What are your thoughts on, for example
>
>  --threads 16 with each thread directed to a specific FPGA core. Then queue
> the computation of 4 8x8 SAD/SATDs per thread to the FPGA - so that we are
> transferring and computing a meaningful amount of data whilst reducing PCIe
> bus saturation.
>
> Any input or guidance is appreciated.
>
> David.

I would imagine that the latency to the FPGA from the CPU is higher
than the cost of computing a 16x16 SATD on the CPU (170 cycles).
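For reference, here is a rough scalar sketch of what a 16x16 SATD involves: take the residual between two blocks, run a 4x4 Hadamard transform over each sub-block, and sum the absolute coefficients. This is an illustration only, not x264's code; the function names are made up, the sum is left unnormalized, and x264's real implementation is SIMD and may tile and scale differently.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: unnormalized SATD of one 4x4 block, i.e. the sum of
 * absolute values of the 2-D Hadamard transform of the residual a - b. */
static int satd_4x4(const uint8_t *a, int stride_a,
                    const uint8_t *b, int stride_b)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Residual between the two blocks. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal 4-point Hadamard butterflies on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical butterflies on each column, accumulating magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum;
}

/* Hypothetical helper: 16x16 SATD as the sum of sixteen 4x4 SATDs. */
static int satd_16x16(const uint8_t *a, int stride_a,
                      const uint8_t *b, int stride_b)
{
    int sum = 0;
    for (int y = 0; y < 16; y += 4)
        for (int x = 0; x < 16; x += 4)
            sum += satd_4x4(a + y * stride_a + x, stride_a,
                            b + y * stride_b + x, stride_b);
    return sum;
}
```

The whole thing is a few hundred adds and subtracts on data that is already in cache, which is why a round trip over a bus is hard to justify for a single block.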

The reason they offload the entire me_search_ref is that, in the
normal case, it's hard to offload individual small DSP functions:
their results directly affect the future actions of the algorithm
that calls them, so you can't run them independently of the main
function.
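To make the dependency concrete, here is a minimal sketch of a small-diamond motion search. It is not x264's me_search_ref; the names, the 8x8 block size, and the search pattern are assumptions chosen for illustration. Note that which candidates get evaluated in each pass depends on the SAD results of the previous pass, so the caller has to wait for every result before it knows what to ask for next.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: plain 8x8 sum of absolute differences. */
static int sad_8x8(const uint8_t *cur, int cur_stride,
                   const uint8_t *ref, int ref_stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += abs(cur[y * cur_stride + x] - ref[y * ref_stride + x]);
    return sum;
}

/* Hypothetical small-diamond search. `ref` points at the co-located block
 * and is assumed to have at least `range` valid pixels on every side.
 * Returns the best cost and writes the best motion vector. */
static int diamond_search(const uint8_t *cur, int cur_stride,
                          const uint8_t *ref, int ref_stride,
                          int range, int *best_mx, int *best_my)
{
    static const int dx[4] = { 0, 0, -1, 1 };
    static const int dy[4] = { -1, 1, 0, 0 };
    int mx = 0, my = 0;
    int best = sad_8x8(cur, cur_stride, ref, ref_stride);

    for (;;) {
        int next_mx = mx, next_my = my;

        /* The four candidates are centered on the current best vector,
         * which is only known once the previous pass has finished. */
        for (int i = 0; i < 4; i++) {
            int cx = mx + dx[i], cy = my + dy[i];
            if (abs(cx) > range || abs(cy) > range)
                continue;
            int cost = sad_8x8(cur, cur_stride,
                               ref + cy * ref_stride + cx, ref_stride);
            if (cost < best) {
                best = cost;
                next_mx = cx;
                next_my = cy;
            }
        }

        /* Stop when no neighbor improves on the current center. */
        if (next_mx == mx && next_my == my)
            break;
        mx = next_mx;
        my = next_my;
    }

    *best_mx = mx;
    *best_my = my;
    return best;
}
```

Only the four SADs inside one pass are independent of each other; everything else is serialized through the comparison, so offloading just sad_8x8 would pay the device latency once per pass.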

Speaking of which, can you post links to any of these papers that
offload me_search_ref?

Dark Shikari


