Course Outline

Introduction

  • What is OpenCL?
  • OpenCL vs CUDA vs SYCL
  • Overview of OpenCL features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new OpenCL project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf

OpenCL API

  • Understanding the role of OpenCL API in the host program
  • Using OpenCL API to query device information and capabilities
  • Using OpenCL API to create contexts, command queues, buffers, kernels, and events
  • Using OpenCL API to enqueue commands, such as read, write, copy, map, unmap, execute, and wait
  • Using OpenCL API to handle errors and exceptions

OpenCL C

  • Understanding the role of OpenCL C in the device program
  • Using OpenCL C to write kernels that execute on the device and manipulate data
  • Using OpenCL C data types, qualifiers, operators, and expressions
  • Using OpenCL C built-in functions, such as math, geometric, relational, etc.
  • Using OpenCL C extensions and libraries, such as atomic, image, cl_khr_fp16, etc.

OpenCL Memory Model

  • Understanding the difference between host and device memory models
  • Using OpenCL memory spaces, such as global, local, constant, and private
  • Using OpenCL memory objects, such as buffers, images, and pipes
  • Using OpenCL memory access modes, such as read-only, write-only, and read-write
  • Using OpenCL memory consistency model and synchronization mechanisms

OpenCL Execution Model

  • Understanding the difference between host and device execution models
  • Using OpenCL work-items, work-groups, and ND-ranges to define the parallelism
  • Using OpenCL work-item functions, such as get_global_id, get_local_id, and get_group_id
  • Using OpenCL work-group functions, such as barrier, work_group_reduce_<op>, and work_group_scan_inclusive_<op>
  • Using OpenCL work-item size queries, such as get_num_groups, get_global_size, and get_local_size
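
As a rough illustration of how the index and size queries listed above relate to one another, the following plain C sketch models a single ND-range dimension on the host. It is not OpenCL API code: the nd_range_1d type and the model_* helpers are hypothetical names introduced here, and the sketch assumes a zero global offset and a global size that is an exact multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of one ND-range dimension.
 * This type is illustrative only and is not part of the OpenCL API. */
typedef struct {
    size_t global_size; /* total work-items, as get_global_size(0) would report */
    size_t local_size;  /* work-items per work-group, as get_local_size(0) would report */
} nd_range_1d;

/* Number of work-groups, mirroring get_num_groups(0).
 * Assumes uniform work-groups: global_size is a multiple of local_size. */
static size_t model_num_groups(nd_range_1d r) {
    assert(r.local_size > 0 && r.global_size % r.local_size == 0);
    return r.global_size / r.local_size;
}

/* Global ID of a work-item from its group ID and local ID,
 * assuming a zero global offset. */
static size_t model_global_id(nd_range_1d r, size_t group_id, size_t local_id) {
    assert(local_id < r.local_size);
    return group_id * r.local_size + local_id;
}

/* Inverse mappings: recover the group ID and local ID from a global ID. */
static size_t model_group_id(nd_range_1d r, size_t global_id) {
    return global_id / r.local_size;
}

static size_t model_local_id(nd_range_1d r, size_t global_id) {
    return global_id % r.local_size;
}
```

For example, with a global size of 1024 and a local size of 64, the model gives 16 work-groups, and the work-item with local ID 5 in group 2 has global ID 2 * 64 + 5 = 133.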

Debugging

  • Understanding the common errors and bugs in OpenCL programs
  • Using Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
  • Using CodeXL to debug and analyze OpenCL programs on AMD devices
  • Using Intel VTune to debug and analyze OpenCL programs on Intel devices
  • Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices

Optimization

  • Understanding the factors that affect the performance of OpenCL programs
  • Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput
  • Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality
  • Using OpenCL local memory and local memory functions to optimize memory accesses and bandwidth
  • Using OpenCL profiling and profiling tools to measure and improve the execution time and resource utilization
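
To give a feel for the loop tiling topic above, here is a minimal plain C sketch of a tiled matrix transpose next to an untiled one. It is an illustration under assumptions, not course material: the function names and the tile size of 4 are chosen here for the example, and in an actual OpenCL kernel the tile would typically be staged in local memory by a work-group rather than walked by nested loops.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* illustrative tile size; a real value would be tuned per device */

/* Untiled reference: out (cols x rows) receives the transpose of in (rows x cols).
 * Both matrices are stored row-major and must not alias. */
static void transpose_naive(const float *in, float *out, size_t rows, size_t cols) {
    assert(in != out);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}

/* Tiled version: the iteration space is split into TILE x TILE blocks so that
 * reads and writes within one block touch a small, reusable region of memory. */
static void transpose_tiled(const float *in, float *out, size_t rows, size_t cols) {
    assert(in != out);
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            /* Clamp the block edges so sizes need not be multiples of TILE. */
            size_t r_end = (r0 + TILE < rows) ? r0 + TILE : rows;
            size_t c_end = (c0 + TILE < cols) ? c0 + TILE : cols;
            for (size_t r = r0; r < r_end; ++r)
                for (size_t c = c0; c < c_end; ++c)
                    out[c * rows + r] = in[r * cols + c];
        }
    }
}
```

Both functions produce the same result; only the traversal order differs, which is the point of tiling.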

Summary and Next Steps

Requirements

  • An understanding of the C/C++ languages and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use OpenCL to program heterogeneous devices and exploit their parallelism
  • Developers who wish to write portable and scalable code that can run on different platforms and devices
  • Programmers who wish to explore the low-level aspects of heterogeneous programming and optimize their code performance

Duration

  • 28 hours
