Common AI Applications in Different Deep Learning Algorithms
Artificial intelligence (AI) has grown explosively in recent years, and more and more unexpected applications keep appearing, which shows how much room AI still has to develop. The same type of data can be analyzed with different methods to solve different problems. Take face recognition, a technology that is already quite mature: it can be used for security in national defense, in an employee access control system for enterprises, in stores to analyze market surveys with gender and age identification, or, combined with tracking technology, to analyze crowd flow. (For object detection and model deployment, please refer to the articles "About Object Detection" and "How To Accelerate AI Model Inference On GPU: A Hands-On Introduction To TensorRT" in the Leadtek AI forum.)
The rest of this article introduces common AI applications, organized by the data types and algorithms of deep learning methods.
Differentiating Deep Learning Applications by Algorithm
There are three major categories of algorithms:
- Convolutional neural networks (CNN), commonly used for image data analysis
- Recurrent neural networks (RNN), used for text analysis and natural language processing
- Generative adversarial networks (GAN), often used for data generation and unsupervised learning
Among them, CNN is the most widely used, because the algorithm is easy to understand and to learn, and the related tools, open-source modules, and kits have greatly lowered the barrier to entry. Much of the software that claims to let users do AI without coding is based on CNN applications. The following sections describe the applications of each algorithm in turn.
Because of the variety of applications, this article subdivides them by algorithm category. The main applications of CNN include image classification, object detection, and semantic segmentation. The figure below shows the three methods at a glance.
(1) Image Classification
As the name suggests, image classification uses deep learning to identify which category an image belongs to. The key point is that each image is assigned a single category, even if its content may contain multiple targets, so image classification alone is not very common in practice. However, because recognizing a single target is what deep learning algorithms do most accurately, many applications first locate targets with an object detection method and then crop the image to a narrower region for image classification. So wherever object detection is applicable, image classification is usually applied alongside it.
Image classification is also one of the standard ways to benchmark an algorithm: the public image data from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is often used for this purpose. Image classification is the basis of CNN, and its related algorithms are the easiest to understand, so beginners should use it as the starting point for deep learning analysis. When using image classification for recognition, the input is usually an image and the output is a text category.
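The last step of an image classifier can be sketched as a softmax over per-class scores. A minimal sketch in plain Python; the class names and logit values below are hypothetical, and in a real system a CNN backbone would produce the scores:

```python
import math

def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits a CNN might output for one image:
label, prob = classify([2.0, 1.0, 0.1], ["cat", "dog", "bird"])
```

This matches the input/output pattern described above: one image in, one text category (plus a confidence) out.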
(2) Object Detection
An image can contain one or more objects, and the objects can belong to different categories. Object detection serves two main purposes: finding the coordinates of each target and identifying its category. Simply put, in addition to knowing what the target is, you also need to know where it is.
Object detection applications are very common, including the face recognition technologies mentioned at the beginning of the article, defect detection in manufacturing, and even hospitals inspecting specific body parts in X-ray and ultrasound images. Object detection can be thought of as image classification with position marking added, so learning it still builds on the foundation of image classification. However, the coordinates it marks are usually rectangular or square bounding boxes: knowing only the location of the target is not enough to trace its edges. Therefore, common applications set "knowing where the target is" as the goal.
The most common algorithms are YOLO and R-CNN. YOLO has a faster recognition speed due to the design of the algorithm; its latest version at the time of writing is v3. The R-CNN family searches for and identifies target positions somewhat differently from YOLO: although slightly slower, it is slightly more accurate. For a detailed introduction to the two methods, please refer to the "Computer Vision" section in the Leadtek AI forum. When using object detection for recognition, the input is usually an image and the output is one or more text categories together with one or more sets of coordinates.
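Detectors such as YOLO and R-CNN typically emit many overlapping candidate boxes for the same object, which are filtered with non-maximum suppression (NMS) based on intersection-over-union (IoU). A minimal sketch with hypothetical boxes and scores:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two overlapping detections of the same object plus one distant box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The two overlapping boxes collapse to the higher-scoring one, while the distant box survives, which is exactly the "one or more sets of coordinates" output described above.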
(3) Semantic Segmentation
The algorithm classifies every pixel in an image, which means that, unlike object detection, semantic segmentation can correctly distinguish the boundary pixels of each target. In simple terms, semantic segmentation is pixel-level image classification. Of course, such models require a powerful GPU and more time to train. Common applications are similar to those of object detection, but appear where a high degree of precision is required, such as applications that need to draw the boundaries of a target.
For example, in manufacturing, flaw detection can accurately draw irregularly shaped defects. In the medical field, it is often used to distinguish diseased cells on pathological sections, or to delineate the areas and types of lesions in MRI, X-ray, or ultrasound images. Algorithms such as U-Net or Mask R-CNN are common implementations. When using semantic segmentation for recognition, the input is usually an image, and the output is an equal-sized image in which pixels of different classes are depicted with different hues.
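"Pixel-level image classification" can be sketched directly: for each pixel the model outputs one score per class, and the predicted mask is the per-pixel argmax. A minimal sketch with a hypothetical 2×2 image and two classes (0 = background, 1 = defect); a real network such as U-Net would produce the score maps:

```python
def segment(score_maps):
    """score_maps[c][y][x] is the score of class c at pixel (x, y).
    Returns an equal-sized mask holding the best class per pixel."""
    n_classes = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(n_classes), key=lambda c: score_maps[c][y][x]) for x in range(w)]
        for y in range(h)
    ]

# Hypothetical per-class scores for a 2x2 image:
background = [[0.9, 0.8], [0.2, 0.7]]
defect     = [[0.1, 0.2], [0.8, 0.3]]
mask = segment([background, defect])   # 1 marks the single defect pixel
```

The output mask has the same size as the input, matching the "equal-sized image" behavior described above.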
Unlike a CNN, an RNN can process image or numerical data and, because the network itself has memory, it can learn data with contextual relationships. For example, in language or text translation, the first and last words of a sentence usually have a certain relationship. A CNN cannot learn this kind of relationship, but an RNN performs better because of its memory, so RNNs can be used to understand text. Another application takes an image as input and outputs a sentence narrating the image (as shown below).
Although the RNN solves problems that the CNN cannot handle, it still has shortcomings, so many RNN variants exist; one of the most commonly used is the Long Short-Term Memory (LSTM) network. The input data of such networks is not limited to images or text, and the problems to be solved are not limited to translation or text understanding: numerical data can also be analyzed with an LSTM. For example, predictive maintenance for factory machines can feed vibration signals through an LSTM to predict whether a machine is about to fail. In the medical area, an LSTM can help interpret thousands of documents and extract relevant information about specific cancers, such as tumor location, tumor size, stage, and even treatment guidelines or survival rates, through textual understanding. Combined with image recognition, it can also provide lesion keywords to help doctors write pathology reports.
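The memory that makes the LSTM suitable for such sequence data comes from its gated cell state. A single LSTM time step can be sketched in plain Python with scalar inputs; the weights below are hypothetical and for illustration only (real implementations use weight matrices in a framework):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for a scalar input x.
    w maps each gate name to (input weight, hidden weight, bias)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate value
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    c = f * c_prev + i * g      # cell state carries long-term memory
    h = o * math.tanh(c)        # hidden state is this step's output
    return h, c

# Hypothetical weights; run a short vibration-like signal through the cell:
w = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "g", "o")}
h, c = 0.0, 0.0
for x in (0.1, 0.9, -0.4, 0.7):
    h, c = lstm_step(x, h, c, w)
```

Because `h` and `c` are carried from step to step, each output depends on the whole history of the signal — this is the "memory" that plain CNNs lack.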
Beyond the discriminative networks above, another very distinctive architecture is the generative adversarial network (GAN), often used for data generation and unsupervised learning. The number of GAN-related papers has grown considerably (as the figure below shows).
This article does not detail the theory or implementation of GANs, but explores their practical applications. Data is the most essential ingredient in deep learning, yet not every application can collect a large amount of data, and the data also needs manual labeling, which is very time-consuming and labor-intensive. Image data can be augmented by rotating, cropping, or changing brightness, but what if that is still not enough? Quite a few fields now use GAN methods to generate data very close to the original. For example, 3D-GAN can generate high-quality 3D objects. There are also more playful applications such as face replacement or expression replacement (as shown below).
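A GAN trains two networks against each other: the discriminator D tries to score real data near 1 and generated data near 0, while the generator G tries to push D's score on its samples toward 1. The objectives can be sketched with binary cross-entropy (the non-saturating formulation; the score values below are illustrative, not from a trained model):

```python
import math

def bce(pred, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    eps = 1e-12  # guard against log(0)
    return -(target * math.log(pred + eps) + (1 - target) * math.log(1 - pred + eps))

def discriminator_loss(d_real, d_fake):
    """D wants real samples scored as 1 and generated samples scored as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """Non-saturating G loss: G wants D to score its samples as real."""
    return bce(d_fake, 1.0)

# A confident D (real -> 0.9, fake -> 0.1) has low loss; a fooled D does not:
good_d = discriminator_loss(0.9, 0.1)
fooled_d = discriminator_loss(0.5, 0.5)
```

Training alternates between lowering `discriminator_loss` and lowering `generator_loss`; as G improves, its samples become hard to tell from the original data, which is what makes GANs useful for data generation.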
In addition, SRGAN (Super-Resolution GAN) can increase the resolution of an image: a low-resolution image is fed into the GAN model, which generates a higher-quality version (as shown below). Such technology can be integrated into professional drawing software to help designers complete their work more efficiently.
NVIDIA also provides some GAN-based applications on its official website: for example, the GauGAN network lets you draw simple lines and turns them into finished paintings, and you can also change the style of the scene at will (as shown below). You can visit NVIDIA's official website to experience it.
If you are interested in GAN related applications, please click here for more information.