How to use the GPU to accelerate computing
Ever since GPUs became programmable, their range of applications has grown wider and wider. Beyond today's most popular AI applications, many other fields rely on GPUs to reduce computing time: solving finite element analysis problems in scientific computing, comparing DNA sequences in bioengineering, and even accelerating ray-traced rendering of finished CAD models in design software. Because these applications perform calculations over large amounts of data, combining GPU hardware with CUDA-based algorithms can greatly reduce the time needed to complete them. Is the GPU really so magical? How exactly do GPUs accelerate this software and these algorithms?
[Figure: The areas where CUDA technology is used to accelerate computing]
The process of GPU computing
How quickly these applications complete their calculations depends on more than just the GPU; it also depends on CUDA, the technology that drives these operations. CUDA (Compute Unified Device Architecture) was first released as version 1.0 in 2007, and version 10.2 is available as of this writing. In simple terms, it is a bridge between developers and the GPU: with CUDA, developers familiar with various programming languages can write programs whose algorithms are accelerated by the GPU. The basic principle of GPU acceleration is actually very simple.
The host computer has memory (RAM) for storing data and a CPU for performing operations, and the GPU likewise has its own memory and its own processing chip (see the figure below). In a program, the main flow of execution is controlled and scheduled by the CPU. When part of the program is to be accelerated by the GPU, the data is first copied from main memory to GPU memory using CUDA syntax. In the second step, the CPU calls the GPU to start the calculation. After the calculation is completed, the CPU instructs the GPU to copy the results stored in GPU memory back to main memory. With that, the GPU-acceleration process is complete.
[Figure: The process of GPU computing]
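The following is a minimal sketch, in CUDA C, of the three-step process just described. The vector-addition kernel and the array size are only placeholders for a real workload: the point is the copy-in, launch, copy-out pattern.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: runs on the GPU; each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (main) memory, managed by the CPU.
    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Step 1: allocate GPU memory and copy the input data into it.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Step 2: the CPU calls the GPU to start the calculation.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Step 3: copy the results from GPU memory back to main memory.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expected: 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```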
Why can the GPU accelerate computing?
The GPU and the CPU are very different in nature. The CPU has a larger cache and faster computing cores, so it can handle more complex operations. The GPU has neither of these characteristics: its cache is smaller and its single-core performance is slower than the CPU's. However, the GPU has a massive array of computing cores (commonly known as CUDA cores), so its advantage is that it can perform a large number of operations at the same time. A top-of-the-line CPU has up to 56 cores and 112 threads (Intel Xeon Platinum 9282), while even a low-end GPU such as the Quadro P620 has 512 cores (each core corresponding to one thread). Of course, this does not mean that the GPU is more powerful than the CPU; rather, the two are suited to different types of computation. The GPU's large number of cores pays off only when coupled with parallel computing. As shown in the figure below, the GPU can perform calculations in parallel across all of its cores, while the CPU must work through them in sequential order. The GPU was designed for the compute-intensive, highly parallel workloads required by image processing. All of the applications mentioned above involve large amounts of data and a high degree of parallelism, which makes them ideal candidates for GPU computing.
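The difference shows up directly in code. Below is a minimal sketch contrasting the two styles (the doubling operation is an arbitrary example): the CPU version walks the array in sequential order on one core, while in the GPU kernel the loop disappears, because each thread, running on its own core, handles exactly one element.

```cuda
// CPU version: one core visits the elements one after another.
void scaleCPU(float* x, int n) {
    for (int i = 0; i < n; ++i)
        x[i] = 2.0f * x[i];
}

// GPU version: no loop. Thousands of threads run this function
// at the same time, each computing its own index i and handling
// the single element at that index.
__global__ void scaleGPU(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = 2.0f * x[i];
}
```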
How to use the GPU to accelerate computing
There are three major approaches to GPU-accelerated computing:
- Using commercial software packages
- Using open source or official libraries
- CUDA programming
Using commercial software packages
The first category includes many kinds of software, of which finite element analysis is the largest area. Calculations in this field include fluid dynamics analysis, thermal conduction analysis, electromagnetic field analysis, and stress analysis. Because its range covers IC design and architectural design, and even many vehicle and chemical-plant projects need simulation analysis performed through such software, developing this kind of software has great commercial value. Those who want to learn more about these applications can refer to the related articles on GPU applications.
Using open source or official libraries
The second approach is more customized and flexible: you write your own program, and its GPU-computing parts call libraries written by others or the official libraries provided by NVIDIA. The most common official NVIDIA libraries include the matrix-computation library cuBLAS, the fast Fourier transform library cuFFT, and the deep-learning library cuDNN; you can also find community-maintained libraries on sites such as GitHub. For both of the first two approaches, you can refer to NVIDIA's "GPU-ACCELERATED APPLICATIONS" catalog, which introduces the software and kits that can use GPUs in various fields, so that developers can easily find the appropriate application.
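As a concrete illustration of calling an official library, here is a minimal sketch using cuBLAS's SAXPY routine, which computes y = αx + y on the GPU without any hand-written kernel. The array values are arbitrary placeholders, and the program must be linked against cuBLAS (e.g., `nvcc saxpy.cu -lcublas`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4;
    float h_x[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_y[n] = {10.0f, 20.0f, 30.0f, 40.0f};
    const float alpha = 2.0f;

    // Copy the inputs to GPU memory, as in any CUDA program.
    float *d_x, *d_y;
    cudaMalloc((void**)&d_x, n * sizeof(float));
    cudaMalloc((void**)&d_y, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, n * sizeof(float), cudaMemcpyHostToDevice);

    // The library call replaces a hand-written kernel:
    // cuBLAS computes y = alpha * x + y on the GPU.
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
    cublasDestroy(handle);

    // Copy the result back and print it.
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%f ", h_y[i]);  // 12 24 36 48
    printf("\n");

    cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```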
CUDA programming
The third approach requires programming in CUDA itself, though the degree of freedom varies with the programming language. The lower-level languages C/C++ and Fortran offer the highest degree of freedom in controlling GPU computation; they can even optimize host-memory and GPU-memory data transfers. Next is Python, which is also the most mainstream programming language for artificial-intelligence applications. The implementation options include PyCUDA and the Numba library. PyCUDA users still write the kernel itself in CUDA C, so its performance is almost equivalent to writing CUDA C directly. Numba instead uses a decorator on a Python function, together with an optional function signature, to set up the CUDA call. In addition, Java, R, C#, and other languages can also use CUDA, but not as directly as C/C++ or Fortran. For setting up the CUDA environment, please refer to "How To Choose GPU Driver".
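To illustrate the kind of transfer-level control that only the C/C++ level exposes, here is a minimal sketch (the buffer size is an arbitrary placeholder) that allocates pinned, page-locked host memory and issues an asynchronous copy on a CUDA stream, two standard optimizations for host-to-GPU data transfers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Pinned (page-locked) host memory: transfers from it are
    // typically faster than from ordinary malloc'd memory, and
    // they are required for truly asynchronous copies.
    float* h_data;
    cudaMallocHost((void**)&h_data, bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float* d_data;
    cudaMalloc((void**)&d_data, bytes);

    // Asynchronous copy on a stream: the CPU is free to do other
    // work while the transfer is in flight, then waits for it.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    printf("transfer complete\n");
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```

This level of control, choosing how host memory is allocated and overlapping transfers with other work, is exactly what higher-level bindings such as Java or R wrappers generally hide from the programmer.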