[x264-devel] [OpenCL] Please whitelist mesa's OpenCL implementation

Christian König deathsimple at vodafone.de
Thu Feb 13 17:13:52 CET 2014


Hi Niccolò,

I would like to help, but this is probably more of a question for Tom 
Stellard. I'm CCing him on this mail as well.

Cheers,
Christian.

On 13.02.2014 16:09, Niccolò Belli wrote:
> Hi,
> I'm using Gentoo Linux amd64 with the latest graphics stack from git (mesa 
> git, llvm git, linux 3.14, etc.) and x264 git. My graphics card is an 
> AMD Radeon HD 7950.
>
> Test file (~90s/~200MB): 
> https://mega.co.nz/#!eQhSjJQR!EEe8-taN5IspIu-RW0WQzmvKzc5fkCn282kS5ugZ_as
>
> ffmpeg -i PlanetEarthBirds.mkv -c:v libx264 -preset slower -crf 18 
> -x264opts opencl -c:a copy slower-cfr18-ocl.mkv
>
> This is what I get when encoding with --opencl with Catalyst 14.1_beta:
>
> [libx264 @ 0x257e360] using cpu capabilities: MMX2 SSE2Fast SSSE3 
> SSE4.2 AVX
> [libx264 @ 0x257e360] OpenCL acceleration enabled with Advanced Micro 
> Devices, Inc. Tahiti (SI)
>
> Unfortunately, when I use mesa's OpenCL implementation, I get:
>
> [libx264 @ 0x1409360] using cpu capabilities: MMX2 SSE2Fast SSSE3 
> SSE4.2 AVX
> [libx264 @ 0x1409360] OpenCL: Unable to find a compatible device
>
>
> This is my clinfo with radeonsi (mesa):
>
> ~ $ clinfo
>
> clinfo: /usr/lib64/libOpenCL.so.1: no version information available 
> (required by clinfo)
>
>
>
> clinfo: /usr/lib64/libOpenCL.so.1: no version information available 
> (required by clinfo)
>
> Number of platforms:                 1
>   Platform Profile:                 FULL_PROFILE
>   Platform Version:                 OpenCL 1.1 MESA 10.2.0-devel
>   Platform Name:                 Default
>   Platform Vendor:                 Mesa
>   Platform Extensions:                 cl_khr_icd
>
>
>   Platform Name:                 Default
> Number of devices:                 1
>   Device Type:                     CL_DEVICE_TYPE_GPU
>   Device ID:                     4098
>   Max compute units:                 1
>   Max work items dimensions:             3
>     Max work items[0]:                 256
>     Max work items[1]:                 256
>     Max work items[2]:                 256
>   Max work group size:                 256
>   Preferred vector width char:             16
>   Preferred vector width short:             8
>   Preferred vector width int:             4
>   Preferred vector width long:             2
>   Preferred vector width float:             4
>   Preferred vector width double:         2
>   Native vector width char:             16
>   Native vector width short:             8
>   Native vector width int:             4
>   Native vector width long:             2
>   Native vector width float:             4
>   Native vector width double:             2
>   Max clock frequency:                 0Mhz
>   Address bits:                     32
>   Max memory allocation:             500000000
>   Image support:                 Yes
>   Max number of images read arguments:         32
>   Max number of images write arguments:         32
>   Max image 2D width:                 32768
>   Max image 2D height:                 32768
>   Max image 3D width:                 32768
>   Max image 3D height:                 32768
>   Max image 3D depth:                 32768
>   Max samplers within kernel:             0
>   Max size of kernel argument:             1024
>   Alignment (bits) of base address:         128
>   Minimum alignment (bytes) for any datatype:     128
>   Single precision floating point capability
>     Denorms:                     Yes
>     Quiet NaNs:                     Yes
>     Round to nearest even:             Yes
>     Round to zero:                 No
>     Round to +ve and infinity:             No
>     IEEE754-2008 fused multiply-add:         No
>   Cache type:                     None
>   Cache line size:                 0
>   Cache size:                     0
>   Global memory size:                 2000000000
>   Constant buffer size:                 0
>   Max number of constant args:             0
>   Local memory type:                 Scratchpad
>   Local memory size:                 32768
>   Kernel Preferred work group size multiple:     1
>   Error correction support:             0
>   Unified memory for Host and Device:         1
>   Profiling timer resolution:             0
>   Device endianess:                 Little
>   Available:                     Yes
>   Compiler available:                 Yes
>   Execution capabilities:
>     Execute OpenCL kernels:             Yes
>     Execute native function:             No
>   Queue properties:
>     Out-of-Order:                 No
>     Profiling :                     Yes
>   Platform ID:                     0x00007f04160dcfc0
>   Name:                         AMD TAHITI
>   Vendor:                     X.Org
>   Device OpenCL C version:             OpenCL C 1.1
>   Driver version:                 10.2.0-devel
>   Profile:                     FULL_PROFILE
>   Version:                     OpenCL 1.1 MESA 10.2.0-devel
>   Extensions:
>
>
>
> And this is my clinfo with Catalyst 14.1_beta:
>
> ~ $ clinfo | head -83
>
> Number of platforms:                 1
>   Platform Profile:                 FULL_PROFILE
>   Platform Version:                 OpenCL 1.2 AMD-APP (1411.4)
>   Platform Name:                 AMD Accelerated Parallel Processing
>   Platform Vendor:                 Advanced Micro Devices, Inc.
>   Platform Extensions:                 cl_khr_icd 
> cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa
>
>
>   Platform Name:                 AMD Accelerated Parallel Processing
> Number of devices:                 2
>   Device Type:                     CL_DEVICE_TYPE_GPU
>   Device ID:                     4098
>   Board name:                     AMD Radeon HD 7900 Series
>   Device Topology:                 PCI[ B#3, D#0, F#0 ]
>   Max compute units:                 28
>   Max work items dimensions:             3
>     Max work items[0]:                 256
>     Max work items[1]:                 256
>     Max work items[2]:                 256
>   Max work group size:                 256
>   Preferred vector width char:             4
>   Preferred vector width short:             2
>   Preferred vector width int:             1
>   Preferred vector width long:             1
>   Preferred vector width float:             1
>   Preferred vector width double:         1
>   Native vector width char:             4
>   Native vector width short:             2
>   Native vector width int:             1
>   Native vector width long:             1
>   Native vector width float:             1
>   Native vector width double:             1
>   Max clock frequency:                 810Mhz
>   Address bits:                     32
>   Max memory allocation:             1073741824
>   Image support:                 Yes
>   Max number of images read arguments:         128
>   Max number of images write arguments:         8
>   Max image 2D width:                 16384
>   Max image 2D height:                 16384
>   Max image 3D width:                 2048
>   Max image 3D height:                 2048
>   Max image 3D depth:                 2048
>   Max samplers within kernel:             16
>   Max size of kernel argument:             1024
>   Alignment (bits) of base address:         2048
>   Minimum alignment (bytes) for any datatype:     128
>   Single precision floating point capability
>     Denorms:                     No
>     Quiet NaNs:                     Yes
>     Round to nearest even:             Yes
>     Round to zero:                 Yes
>     Round to +ve and infinity:             Yes
>     IEEE754-2008 fused multiply-add:         Yes
>   Cache type:                     Read/Write
>   Cache line size:                 64
>   Cache size:                     16384
>   Global memory size:                 2923429888
>   Constant buffer size:                 65536
>   Max number of constant args:             8
>   Local memory type:                 Scratchpad
>   Local memory size:                 32768
>   Kernel Preferred work group size multiple:     64
>   Error correction support:             0
>   Unified memory for Host and Device:         0
>   Profiling timer resolution:             1
>   Device endianess:                 Little
>   Available:                     Yes
>   Compiler available:                 Yes
>   Execution capabilities:
>     Execute OpenCL kernels:             Yes
>     Execute native function:             No
>   Queue properties:
>     Out-of-Order:                 No
>     Profiling :                     Yes
>   Platform ID:                     0x00007f59736ee500
>   Name:                         Tahiti
>   Vendor:                     Advanced Micro Devices, Inc.
>   Device OpenCL C version:             OpenCL C 1.2
>   Driver version:                 1411.4 (VM)
>   Profile:                     FULL_PROFILE
>   Version:                     OpenCL 1.2 AMD-APP (1411.4)
>   Extensions:                     cl_khr_fp64 cl_amd_fp64 
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics 
> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics 
> cl_khr_int64_base_atomics cl_khr_int64_extended_atomics 
> cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing
> cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 
> cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt 
> cl_khr_image2d_from_buffer cl_khr_spir
>
>
> Is it possible to whitelist mesa's OpenCL implementation? Can someone 
> please write a patch to force-enable OpenCL so that I can at least 
> test if it works with mesa?
>
> Thanks,
> Niccolò
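
For anyone looking into the device check, it may help to see which reported
properties differ between the two clinfo dumps quoted above. The following is
only a minimal sketch: the helper names (parse_clinfo, differing_fields) are
made up for illustration, the sample strings are short excerpts of the quoted
output, and nothing here reflects how x264 actually selects a device, which
this thread does not show.

```python
# Hypothetical helpers for comparing two clinfo dumps; not part of x264 or Mesa.

def parse_clinfo(text):
    """Collect 'Key: value' lines into a dict (a later duplicate key wins)."""
    fields = {}
    for raw in text.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key and value:
            fields[key] = value
    return fields


def differing_fields(a, b):
    """Return {key: (value_a, value_b)} for keys present in both but unequal."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


# Excerpts copied from the two dumps quoted in this mail.
MESA = """
>   Max compute units:                 1
>   Max clock frequency:                 0Mhz
>   Image support:                 Yes
>   Max samplers within kernel:             0
>   Name:                         AMD TAHITI
>   Vendor:                     X.Org
"""

CATALYST = """
>   Max compute units:                 28
>   Max clock frequency:                 810Mhz
>   Image support:                 Yes
>   Max samplers within kernel:             16
>   Name:                         Tahiti
>   Vendor:                     Advanced Micro Devices, Inc.
"""

if __name__ == "__main__":
    diff = differing_fields(parse_clinfo(MESA), parse_clinfo(CATALYST))
    for key, (mesa_value, catalyst_value) in diff.items():
        print(f"{key}: mesa={mesa_value!r} catalyst={catalyst_value!r}")
```

On these excerpts the vendor and device name strings, the compute unit count,
the clock frequency and the sampler count all differ, while image support is
reported by both. Which of these, if any, the x264 check depends on would have
to be confirmed in the x264 source.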


