Universal Language for GPU Computing: OpenCL vs CUDA

"Explore the realm of parallel programming with OpenCL and CUDA, comparing their pros and cons. Understand the challenges and strategies for converting CUDA to OpenCL, along with insights into modifying GPU kernel code for optimal performance."

  • OpenCL
  • CUDA
  • GPU Computing
  • Parallel Programming
  • Kernel Code


Presentation Transcript


  1. OpenCL. Usman Roshan, Department of Computer Science, NJIT

  2. OpenCL: a universal language for parallel programming (https://www.khronos.org/opencl/), with increasing usage in GPU computing. Pros: your GPU program will run not just on NVIDIA GPUs but on other GPUs as well (such as AMD). Cons: not as easy to program in as CUDA.

  3. SimpleOpenCL: an open-source API for writing OpenCL programs. The main challenge in OpenCL programs is the setup; SimpleOpenCL provides simple functions for setting up the GPU. https://code.google.com/p/simple-opencl/
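To give a sense of why the setup is the main challenge, the fragment below is a rough sketch of the boilerplate that plain OpenCL requires before a kernel can even be launched. The setup_opencl wrapper, the variable names, and the kernel_src argument are illustrative placeholders, not part of the slides, and error checking is omitted.

    /* Sketch of the setup boilerplate plain OpenCL requires; all names here
       are placeholders, and error checking is omitted for brevity. */
    #include <CL/cl.h>

    static cl_kernel setup_opencl(const char *kernel_src)
    {
        cl_int err;
        cl_platform_id platform;
        cl_device_id device;

        clGetPlatformIDs(1, &platform, NULL);                           /* pick a platform */
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL); /* pick a GPU      */

        cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);
        (void)queue; /* kept for later transfers and launches */

        /* Compile the OpenCL C source and extract the kernel by name. */
        cl_program program = clCreateProgramWithSource(context, 1, &kernel_src, NULL, &err);
        clBuildProgram(program, 1, &device, NULL, NULL, NULL);
        return clCreateKernel(program, "name_of_kernel_function", &err);
    }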

  4. Strategy to convert Chi2 from CUDA to OpenCL. Define blocks and threads with the arrays global_work_size[2] and local_work_size[2]: global_work_size[0] = BLOCKS * THREADS; global_work_size[1] = 1; local_work_size[0] = THREADS; local_work_size[1] = 1; Initialize hardware: hardware = sclGetAllHardware(&found); sclPrintHardwareStatus(*hardware); Initialize software: software = sclGetCLSoftware(OPENCL_KERNEL_FILE, "name_of_kernel_function", hardware[0]);
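Collected into one piece, the setup step might look like the sketch below. The sclHard and sclSoft types and the simpleCL.h header are assumed from the SimpleOpenCL project; BLOCKS, THREADS, OPENCL_KERNEL_FILE, and "name_of_kernel_function" are the slide's placeholders, and the chi2_setup wrapper is hypothetical.

    /* Sketch of the SimpleOpenCL setup described above; BLOCKS, THREADS,
       OPENCL_KERNEL_FILE, and the kernel name are the slide's placeholders. */
    #include "simpleCL.h"

    void chi2_setup(size_t global_work_size[2], size_t local_work_size[2],
                    sclHard **hardware, sclSoft *software)
    {
        int found;

        /* Work sizes: total work-items correspond to CUDA blocks * threads,
           the work-group size corresponds to CUDA threads per block. */
        global_work_size[0] = BLOCKS * THREADS;
        global_work_size[1] = 1;
        local_work_size[0]  = THREADS;
        local_work_size[1]  = 1;

        *hardware = sclGetAllHardware(&found);  /* discover OpenCL devices */
        sclPrintHardwareStatus(**hardware);     /* report what was found   */

        /* Compile the .cl file and pick the kernel to run on device 0. */
        *software = sclGetCLSoftware(OPENCL_KERNEL_FILE,
                                     "name_of_kernel_function",
                                     (*hardware)[0]);
    }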

  5. CUDA to OpenCL Device arrays defined with cl_mem Replace cudamalloc with dev_results_clmem = sclMalloc( hardware[0], CL_MEM_READ_WRITE, size * sizeof(float) ); To write to GPU memory replace cudamemcpy with sclWrite( hardware[0], size * sizeof(unsigned char), dev_dataT_clmem, (void*) dataT ); To read from GPU memory replace cudamemcpy with sclRead( hardware[0], cols * sizeof(float), results_clmem, host_results );
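The sketch below gathers these replacements into one hypothetical helper. The names dataT, host_results, size, and cols follow the slide and are assumed to come from the Chi2 program; the results buffer is sized to match the sclRead call shown above.

    /* Sketch of the memory replacements; host pointers and sizes are the
       slide's names, assumed to come from the Chi2 program. */
    #include "simpleCL.h"

    void chi2_move_data(sclHard *hardware, unsigned char *dataT,
                        float *host_results, size_t size, size_t cols)
    {
        /* cudaMalloc becomes sclMalloc, which returns a cl_mem handle. */
        cl_mem dev_dataT_clmem   = sclMalloc(hardware[0], CL_MEM_READ_WRITE,
                                             size * sizeof(unsigned char));
        cl_mem dev_results_clmem = sclMalloc(hardware[0], CL_MEM_READ_WRITE,
                                             cols * sizeof(float));

        /* Host-to-device cudaMemcpy becomes sclWrite. */
        sclWrite(hardware[0], size * sizeof(unsigned char),
                 dev_dataT_clmem, (void*) dataT);

        /* ... set kernel arguments and launch here (next slide) ... */

        /* Device-to-host cudaMemcpy becomes sclRead. */
        sclRead(hardware[0], cols * sizeof(float),
                dev_results_clmem, host_results);
    }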

  6. CUDA to OpenCL. Replace the kernel call by first setting the kernel parameters: sclSetKernelArg( software, 0, sizeof(uint), &var ); sclSetKernelArg( software, 1, sizeof(cl_mem), (void*) &dev_var_clmem ); sclSetKernelArg( software, 2, sizeof(cl_mem), (void*) &dev_const_var_clmem ); Then call the kernel with: sclLaunchKernel( hardware[0], software, global_work_size, local_work_size );
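Put together, the replacement for a CUDA kernel<<<BLOCKS, THREADS>>>(...) launch might look like the following sketch; var, dev_var_clmem, and dev_const_var_clmem are the slide's placeholder argument names, and the chi2_launch wrapper is hypothetical.

    /* Sketch of setting kernel arguments by position and launching. */
    #include "simpleCL.h"

    void chi2_launch(sclHard *hardware, sclSoft software,
                     size_t global_work_size[2], size_t local_work_size[2],
                     cl_uint var, cl_mem dev_var_clmem, cl_mem dev_const_var_clmem)
    {
        /* Arguments that CUDA passed in the call are bound one by one. */
        sclSetKernelArg(software, 0, sizeof(cl_uint), &var);
        sclSetKernelArg(software, 1, sizeof(cl_mem), (void*) &dev_var_clmem);
        sclSetKernelArg(software, 2, sizeof(cl_mem), (void*) &dev_const_var_clmem);

        /* global_work_size plays the role of blocks * threads,
           local_work_size the role of threads per block. */
        sclLaunchKernel(hardware[0], software, global_work_size, local_work_size);
    }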

  7. Modifications to GPU kernel code. Use __kernel to define the kernel function. Use __global for global (device) memory and __local for local (work-group shared) memory. Use __constant for constant memory definitions. Get the thread id with get_global_id(0);
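For illustration, a kernel using these qualifiers might look like the sketch below; the argument list and body are placeholders rather than the actual Chi2 kernel.

    /* Illustrative OpenCL C kernel (goes in the .cl file); the body is a placeholder. */
    __kernel void name_of_kernel_function(const uint cols,
                                          __global const uchar *dataT,
                                          __constant float *weights,
                                          __global float *results)
    {
        /* get_global_id(0) replaces CUDA's blockIdx.x * blockDim.x + threadIdx.x. */
        uint tid = get_global_id(0);

        /* __local declares work-group shared memory (CUDA's __shared__);
           this assumes a work-group size of at most 256. */
        __local float scratch[256];
        scratch[get_local_id(0)] = 0.0f;

        if (tid < cols)
            results[tid] = weights[0] * (float) dataT[tid];
    }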
