[x264-devel] [OpenCL] Please whitelist mesa's OpenCL implementation

Tom Stellard tom at stellard.net
Thu Feb 13 17:19:43 CET 2014


On Thu, Feb 13, 2014 at 05:13:52PM +0100, Christian König wrote:
> Hi Niccolò,
> 
> I would like to help, but this is probably more a question for Tom 
> Stellard. CCing him on this mail as well.
> 
> Cheers,
> Christian.
> 
> On 13.02.2014 at 16:09, Niccolò Belli wrote:
> > Hi,
> > I'm using Gentoo Linux amd64 with the latest graphics stack from git
> > (mesa git, llvm git, linux 3.14, etc.) and x264 git. My graphics card
> > is an AMD Radeon HD 7950.
> >
> > Test file (~90s/~200MB): 
> > https://mega.co.nz/#!eQhSjJQR!EEe8-taN5IspIu-RW0WQzmvKzc5fkCn282kS5ugZ_as
> >
> > ffmpeg -i PlanetEarthBirds.mkv -c:v libx264 -preset slower -crf 18 
> > -x264opts opencl -c:a copy slower-cfr18-ocl.mkv
> >
> > This is what I get when encoding with --opencl with Catalyst 14.1_beta:
> >
> > [libx264 @ 0x257e360] using cpu capabilities: MMX2 SSE2Fast SSSE3 
> > SSE4.2 AVX
> > [libx264 @ 0x257e360] OpenCL acceleration enabled with Advanced Micro 
> > Devices, Inc. Tahiti (SI)
> >
> > Unfortunately, when I use Mesa's OpenCL implementation I get:
> >
> > [libx264 @ 0x1409360] using cpu capabilities: MMX2 SSE2Fast SSSE3 
> > SSE4.2 AVX
> > [libx264 @ 0x1409360] OpenCL: Unable to find a compatible device
> >
> >
> > This is my clinfo with radeonsi (mesa):
> >
> > ~ $ clinfo
> >
> > clinfo: /usr/lib64/libOpenCL.so.1: no version information available 
> > (required by clinfo)
> >
> >
> > Number of platforms:                 1
> >   Platform Profile:                 FULL_PROFILE
> >   Platform Version:                 OpenCL 1.1 MESA 10.2.0-devel
> >   Platform Name:                 Default
> >   Platform Vendor:                 Mesa
> >   Platform Extensions:                 cl_khr_icd
> >
> >
> >   Platform Name:                 Default
> > Number of devices:                 1
> >   Device Type:                     CL_DEVICE_TYPE_GPU
> >   Device ID:                     4098
> >   Max compute units:                 1
> >   Max work items dimensions:             3
> >     Max work items[0]:                 256
> >     Max work items[1]:                 256
> >     Max work items[2]:                 256
> >   Max work group size:                 256
> >   Preferred vector width char:             16
> >   Preferred vector width short:             8
> >   Preferred vector width int:             4
> >   Preferred vector width long:             2
> >   Preferred vector width float:             4
> >   Preferred vector width double:         2
> >   Native vector width char:             16
> >   Native vector width short:             8
> >   Native vector width int:             4
> >   Native vector width long:             2
> >   Native vector width float:             4
> >   Native vector width double:             2
> >   Max clock frequency:                 0Mhz
> >   Address bits:                     32
> >   Max memory allocation:             500000000
> >   Image support:                 Yes
> >   Max number of images read arguments:         32
> >   Max number of images write arguments:         32
> >   Max image 2D width:                 32768
> >   Max image 2D height:                 32768
> >   Max image 3D width:                 32768
> >   Max image 3D height:                 32768
> >   Max image 3D depth:                 32768
> >   Max samplers within kernel:             0
> >   Max size of kernel argument:             1024
> >   Alignment (bits) of base address:         128
> >   Minimum alignment (bytes) for any datatype:     128
> >   Single precision floating point capability
> >     Denorms:                     Yes
> >     Quiet NaNs:                     Yes
> >     Round to nearest even:             Yes
> >     Round to zero:                 No
> >     Round to +ve and infinity:             No
> >     IEEE754-2008 fused multiply-add:         No
> >   Cache type:                     None
> >   Cache line size:                 0
> >   Cache size:                     0
> >   Global memory size:                 2000000000
> >   Constant buffer size:                 0
> >   Max number of constant args:             0
> >   Local memory type:                 Scratchpad
> >   Local memory size:                 32768
> >   Kernel Preferred work group size multiple:     1
> >   Error correction support:             0
> >   Unified memory for Host and Device:         1
> >   Profiling timer resolution:             0
> >   Device endianess:                 Little
> >   Available:                     Yes
> >   Compiler available:                 Yes
> >   Execution capabilities:
> >     Execute OpenCL kernels:             Yes
> >     Execute native function:             No
> >   Queue properties:
> >     Out-of-Order:                 No
> >     Profiling :                     Yes
> >   Platform ID:                     0x00007f04160dcfc0
> >   Name:                         AMD TAHITI
> >   Vendor:                     X.Org
> >   Device OpenCL C version:             OpenCL C 1.1
> >   Driver version:                 10.2.0-devel
> >   Profile:                     FULL_PROFILE
> >   Version:                     OpenCL 1.1 MESA 10.2.0-devel
> >   Extensions:
> >
> >
> >
> > And this is my clinfo with Catalyst 14.1_beta:
> >
> > ~ $ clinfo | head -83
> >
> > Number of platforms:                 1
> >   Platform Profile:                 FULL_PROFILE
> >   Platform Version:                 OpenCL 1.2 AMD-APP (1411.4)
> >   Platform Name:                 AMD Accelerated Parallel Processing
> >   Platform Vendor:                 Advanced Micro Devices, Inc.
> >   Platform Extensions:                 cl_khr_icd 
> > cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa
> >
> >
> >   Platform Name:                 AMD Accelerated Parallel Processing
> > Number of devices:                 2
> >   Device Type:                     CL_DEVICE_TYPE_GPU
> >   Device ID:                     4098
> >   Board name:                     AMD Radeon HD 7900 Series
> >   Device Topology:                 PCI[ B#3, D#0, F#0 ]
> >   Max compute units:                 28
> >   Max work items dimensions:             3
> >     Max work items[0]:                 256
> >     Max work items[1]:                 256
> >     Max work items[2]:                 256
> >   Max work group size:                 256
> >   Preferred vector width char:             4
> >   Preferred vector width short:             2
> >   Preferred vector width int:             1
> >   Preferred vector width long:             1
> >   Preferred vector width float:             1
> >   Preferred vector width double:         1
> >   Native vector width char:             4
> >   Native vector width short:             2
> >   Native vector width int:             1
> >   Native vector width long:             1
> >   Native vector width float:             1
> >   Native vector width double:             1
> >   Max clock frequency:                 810Mhz
> >   Address bits:                     32
> >   Max memory allocation:             1073741824
> >   Image support:                 Yes
> >   Max number of images read arguments:         128
> >   Max number of images write arguments:         8
> >   Max image 2D width:                 16384
> >   Max image 2D height:                 16384
> >   Max image 3D width:                 2048
> >   Max image 3D height:                 2048
> >   Max image 3D depth:                 2048
> >   Max samplers within kernel:             16
> >   Max size of kernel argument:             1024
> >   Alignment (bits) of base address:         2048
> >   Minimum alignment (bytes) for any datatype:     128
> >   Single precision floating point capability
> >     Denorms:                     No
> >     Quiet NaNs:                     Yes
> >     Round to nearest even:             Yes
> >     Round to zero:                 Yes
> >     Round to +ve and infinity:             Yes
> >     IEEE754-2008 fused multiply-add:         Yes
> >   Cache type:                     Read/Write
> >   Cache line size:                 64
> >   Cache size:                     16384
> >   Global memory size:                 2923429888
> >   Constant buffer size:                 65536
> >   Max number of constant args:             8
> >   Local memory type:                 Scratchpad
> >   Local memory size:                 32768
> >   Kernel Preferred work group size multiple:     64
> >   Error correction support:             0
> >   Unified memory for Host and Device:         0
> >   Profiling timer resolution:             1
> >   Device endianess:                 Little
> >   Available:                     Yes
> >   Compiler available:                 Yes
> >   Execution capabilities:
> >     Execute OpenCL kernels:             Yes
> >     Execute native function:             No
> >   Queue properties:
> >     Out-of-Order:                 No
> >     Profiling :                     Yes
> >   Platform ID:                     0x00007f59736ee500
> >   Name:                         Tahiti
> >   Vendor:                     Advanced Micro Devices, Inc.
> >   Device OpenCL C version:             OpenCL C 1.2
> >   Driver version:                 1411.4 (VM)
> >   Profile:                     FULL_PROFILE
> >   Version:                     OpenCL 1.2 AMD-APP (1411.4)
> >   Extensions:                     cl_khr_fp64 cl_amd_fp64 
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics 
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics 
> > cl_khr_int64_base_atomics cl_khr_int64_extended_atomics 
> > cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing
> > cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 
> > cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt 
> > cl_khr_image2d_from_buffer cl_khr_spir
> >
> >
> > Is it possible to whitelist mesa's OpenCL implementation? Can someone 
> > please write a patch to force-enable OpenCL so that I can at least 
> > test if it works with mesa?
> >

Does x264 currently blacklist Mesa? It's possible it's failing for
some other reason, such as a missing extension or an error compiling
one of the kernels.
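
One way to look into the missing-extension possibility is to compare the
device-level "Extensions:" fields of the two clinfo dumps quoted above.
The following is only a minimal sketch: the helper name
device_extensions and the shortened sample strings are assumptions made
for illustration, and the script shows only which extensions differ
between the two drivers, not which ones x264 actually requires.

```python
import re


def device_extensions(clinfo_text):
    """Collect the names listed under the device-level 'Extensions:' field.

    Mail quote prefixes ('> ') are stripped first. Wrapped continuation
    lines are collected until a blank line or the next 'Key: value' field.
    The 'Platform Extensions:' field is deliberately not matched.
    """
    exts = set()
    collecting = False
    for raw in clinfo_text.splitlines():
        line = re.sub(r"^(>\s?)+", "", raw).strip()
        if line.startswith("Extensions:"):
            collecting = True
            line = line[len("Extensions:"):]
        elif collecting and (not line or ":" in line):
            collecting = False
        if collecting:
            exts.update(line.split())
    return exts


# Shortened excerpts in the style of the dumps above (illustrative only).
mesa = """\
> >   Platform Extensions:                 cl_khr_icd
> >   Version:                     OpenCL 1.1 MESA 10.2.0-devel
> >   Extensions:
> >
"""

catalyst = """\
> >   Version:                     OpenCL 1.2 AMD-APP (1411.4)
> >   Extensions:                     cl_khr_fp64 cl_amd_fp64
> > cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store
> >
"""

missing = sorted(device_extensions(catalyst) - device_extensions(mesa))
for name in missing:
    print(name)
```

Running it against the full dumps (saved to files and read in as
strings) would list every extension that Catalyst reports and Mesa does
not; whether any of those matters to x264 still has to be checked
against the x264 source.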

-Tom

