Course Outline

Introduction

  • What is CUDA?
  • CUDA vs OpenCL vs SYCL
  • Overview of CUDA features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new CUDA project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf

CUDA API

  • Understanding the role of CUDA API in the host program
  • Using CUDA API to query device information and capabilities
  • Using CUDA API to allocate and deallocate device memory
  • Using CUDA API to copy data between host and device
  • Using CUDA API to launch kernels and synchronize the host with the device
  • Using CUDA API to check and handle errors
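
As a rough illustration of the host-side steps listed above (query the device, allocate, copy, launch, synchronize, check errors), a minimal sketch might look like the following. The names `CHECK_CUDA`, `vector_add`, and `add_on_device` are hypothetical, chosen only for this example; the runtime calls themselves are standard CUDA Runtime API functions.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: print a failed runtime call and return its code.
// Note: returning early like this skips cudaFree; a real program would
// release its buffers on the error path as well.
#define CHECK_CUDA(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            return err_;                                            \
        }                                                           \
    } while (0)

// Kernel: each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical host wrapper: computes c = a + b on device 0.
cudaError_t add_on_device(const std::vector<float>& a,
                          const std::vector<float>& b,
                          std::vector<float>& c) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    c.resize(n);
    if (n == 0) return cudaSuccess;

    // Query device information.
    cudaDeviceProp prop;
    CHECK_CUDA(cudaGetDeviceProperties(&prop, 0));
    printf("Using device 0: %s\n", prop.name);

    // Allocate device memory.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CHECK_CUDA(cudaMalloc(&d_a, bytes));
    CHECK_CUDA(cudaMalloc(&d_b, bytes));
    CHECK_CUDA(cudaMalloc(&d_c, bytes));

    // Copy inputs from host to device.
    CHECK_CUDA(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    // Launch the kernel, then check for launch and execution errors.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaDeviceSynchronize());

    // Copy the result back and release device memory.
    CHECK_CUDA(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(d_a));
    CHECK_CUDA(cudaFree(d_b));
    CHECK_CUDA(cudaFree(d_c));
    return cudaSuccess;
}
```

A caller would pass two equally sized `std::vector<float>` inputs and compare the return value against `cudaSuccess`. The CUDA runtime reports failures through `cudaError_t` return codes rather than C++ exceptions, which is why every call is wrapped in a check.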

CUDA C/C++

  • Understanding the role of CUDA C/C++ in the device program
  • Using CUDA C/C++ to write kernels that execute on the GPU and manipulate data
  • Using CUDA C/C++ data types, qualifiers, operators, and expressions
  • Using CUDA C/C++ built-in functions, such as math, atomic, and warp-level functions
  • Using CUDA C/C++ built-in variables, such as threadIdx, blockIdx, blockDim, etc.
  • Using CUDA C/C++ libraries, such as cuBLAS, cuFFT, cuRAND, etc.

CUDA Memory Model

  • Understanding the difference between host and device memory models
  • Using CUDA memory spaces, such as global, shared, constant, and local
  • Using CUDA memory objects, such as pointers, arrays, textures, and surfaces
  • Using CUDA memory access modes, such as read-only, write-only, read-write, etc.
  • Using the CUDA memory consistency model and synchronization mechanisms

CUDA Execution Model

  • Understanding the difference between host and device execution models
  • Using CUDA threads, blocks, and grids to define the parallelism
  • Using CUDA built-in index variables, such as threadIdx, blockIdx, blockDim, etc., to identify each thread
  • Using CUDA block-level functions, such as __syncthreads(), __threadfence_block(), etc.
  • Using CUDA grid-level features, such as gridDim and grid-wide synchronization through cooperative groups
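
To make the thread, block, and grid concepts above more concrete, here is a small sketch of a block-level sum: each block reduces its slice in shared memory, using `__syncthreads()` between steps, and one thread per block adds the partial result to a global total with `atomicAdd`. The names `kThreads`, `block_sum`, and `device_sum` are hypothetical, and the sketch assumes the block size is a power of two.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // block size; assumed to be a power of two

// Each block sums up to kThreads inputs in shared memory, then thread 0
// adds the block's partial sum to the single global result.
__global__ void block_sum(const int* in, int* out, int n) {
    __shared__ int tile[kThreads];
    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[tid] = (i < n) ? in[i] : 0;  // pad the last block with zeros
    __syncthreads();                  // all loads visible before reducing

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();              // reached by every thread in the block
    }
    if (tid == 0) atomicAdd(out, tile[0]);
}

// Hypothetical host wrapper: returns true on success and stores the sum.
bool device_sum(const std::vector<int>& values, int& result) {
    assert(!values.empty());
    const int n = static_cast<int>(values.size());
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);

    int *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, sizeof(int)) == cudaSuccess &&
              cudaMemcpy(d_in, values.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(d_out, 0, sizeof(int)) == cudaSuccess;
    if (ok) {
        const int blocks = (n + kThreads - 1) / kThreads;
        block_sum<<<blocks, kThreads>>>(d_in, d_out, n);
        // The blocking cudaMemcpy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&result, d_out, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return ok;
}
```

Note that `__syncthreads()` only synchronizes threads within one block; combining results across blocks here relies on an atomic operation instead. Grid-wide synchronization through cooperative groups is a separate mechanism with its own launch requirements and is not shown in this sketch.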

Debugging

  • Understanding the common errors and bugs in CUDA programs
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
  • Using CUDA-GDB to debug CUDA programs on Linux
  • Using CUDA-MEMCHECK or its successor, Compute Sanitizer, to detect memory errors and leaks
  • Using NVIDIA Nsight to debug and analyze CUDA programs on Windows

Optimization

  • Understanding the factors that affect the performance of CUDA programs
  • Using CUDA coalescing techniques to improve memory throughput
  • Using CUDA caching and prefetching techniques to reduce memory latency
  • Using CUDA shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using CUDA profiling tools to measure and improve execution time and resource utilization
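
One common way to illustrate coalescing and shared memory together is a matrix transpose. The sketch below compares a naive kernel, whose writes are strided, with a tiled kernel that stages data in shared memory so that both reads and writes touch consecutive addresses, and times each with CUDA events. The names `kTile`, `transpose_naive`, `transpose_tiled`, and `time_transpose` are hypothetical. The tiled version is typically faster, but the actual difference depends on the device and matrix size.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kTile = 32;  // tile edge; a 32x32 block is 1024 threads

// Naive transpose: reads are coalesced, writes are strided by `rows`.
__global__ void transpose_naive(const float* in, float* out, int rows, int cols) {
    int x = blockIdx.x * kTile + threadIdx.x;  // input column
    int y = blockIdx.y * kTile + threadIdx.y;  // input row
    if (x < cols && y < rows) out[x * rows + y] = in[y * cols + x];
}

// Tiled transpose: stage a tile in shared memory so that both the global
// read and the global write use consecutive addresses per warp.
__global__ void transpose_tiled(const float* in, float* out, int rows, int cols) {
    __shared__ float tile[kTile][kTile + 1];  // +1 column avoids bank conflicts
    int x = blockIdx.x * kTile + threadIdx.x;  // input column
    int y = blockIdx.y * kTile + threadIdx.y;  // input row
    if (x < cols && y < rows) tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    __syncthreads();
    x = blockIdx.y * kTile + threadIdx.x;      // output column (input row)
    y = blockIdx.x * kTile + threadIdx.y;      // output row (input column)
    if (x < rows && y < cols) out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: runs one variant, fills `out` (cols x rows),
// and returns the kernel time in milliseconds, or -1.0f on a CUDA error.
float time_transpose(bool tiled, const std::vector<float>& in,
                     std::vector<float>& out, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(in.size() == static_cast<size_t>(rows) * cols);
    const size_t bytes = in.size() * sizeof(float);
    out.resize(in.size());

    float *d_in = nullptr, *d_out = nullptr;
    cudaEvent_t start = nullptr, stop = nullptr;
    float ms = -1.0f;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaEventCreate(&start) == cudaSuccess &&
              cudaEventCreate(&stop) == cudaSuccess;
    if (ok) {
        dim3 block(kTile, kTile);
        dim3 grid((cols + kTile - 1) / kTile, (rows + kTile - 1) / kTile);
        cudaEventRecord(start);
        if (tiled) transpose_tiled<<<grid, block>>>(d_in, d_out, rows, cols);
        else       transpose_naive<<<grid, block>>>(d_in, d_out, rows, cols);
        cudaEventRecord(stop);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaEventSynchronize(stop) == cudaSuccess &&
             cudaEventElapsedTime(&ms, start, stop) == cudaSuccess &&
             cudaMemcpy(out.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? ms : -1.0f;
}
```

Event timing of a single launch is only a coarse measurement; a profiler gives a fuller picture of memory throughput and resource utilization.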

Summary and Next Steps

Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use CUDA to program NVIDIA GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different CUDA devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance

Duration: 28 Hours

