Leadtek GDMS Usage Scenarios for Enterprise Users
What is a suitable platform for AI development?
How can you maximize limited GPU resources?
Recently, more and more enterprises have set up R&D teams and invested more resources in AI development. GPUs are an essential element of AI research. With a limited budget, how can you allocate hardware resources effectively to maximize the usage of your GPU systems?
Leadtek GPU Docker Management System (GDMS) is a Docker-based GPU resource allocation and management system. With an intuitive graphical user interface, it provides enterprises with an efficient way to integrate AI and big data project resources for centralized management. GDMS works in tandem with the WinFast RTX AI Workstation to offer simple configuration of Docker images and resource allocation, greatly improving the convenience of deployment and the efficiency of AI training. In addition, through the optimal resource utilization of GDMS, enterprises can maximize the ROI of their AI hardware and software.
Leadtek GDMS includes the following features.
(1) Manage Project Resources Easily
With simple, visual pull-down menus, system administrators can get started easily even if they are not familiar with Docker commands. Administrators can evaluate the resource requirements of the R&D team and allocate GPU resources by creating, deleting, starting, or suspending Docker images or Containers in the shortest time.
(2) Real-Time GPU Monitoring
GDMS monitors all managed GPU systems in real time, including CPU, memory, GPU, and GPU memory usage, as well as current GPU system occupancy and available resource status. This helps system administrators allocate GPU resources more flexibly and efficiently.
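GDMS's internal monitoring mechanism is not documented, but the kind of per-GPU status it collects can be sketched by parsing the CSV output of NVIDIA's `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. The helper name and sample data below are illustrative, not part of GDMS:

```python
# Illustrative sketch only: turn nvidia-smi CSV query output into
# per-GPU status records of the kind a real-time monitor collects.

def parse_gpu_status(csv_text):
    """Return a list of dicts, one per GPU line of the CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "util_pct": int(util),
            "mem_used_mib": int(mem_used),
            "mem_total_mib": int(mem_total),
            "mem_free_mib": int(mem_total) - int(mem_used),
        })
    return gpus

# Sample output for a two-GPU node: GPU 0 is busy, GPU 1 is idle.
sample = """\
0, 87, 9211, 11264
1, 0, 0, 11264
"""
status = parse_gpu_status(sample)
print(status[1]["mem_free_mib"])  # 11264
```

In a real monitor this parse would run on a polling loop against the live command output; here it only shows which fields (utilization, used/free memory) drive the allocation decisions described above.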
(3) Multiple Modes
GDMS supports "Share Mode" and "Exclusive Mode". You can adjust resource allocation flexibly according to your project's computing requirements by sharing a single GPU among multiple users or dedicating multiple GPUs to a single project. "Share Mode" can be used for AI training within the enterprise: without the need to purchase one system per person, the enterprise can speed up the training of AI personnel. "Exclusive Mode" is ideal for large-scale AI project development, providing dedicated GPU resources to different teams in the enterprise. Each team's development environment is independent and does not interfere with the others.
(4) Support Various Development Scenarios
GDMS supports building project development environments through SSH connections or platforms such as Jupyter Notebook. It includes a variety of pre-installed development tools, such as matplotlib, to accelerate the startup of enterprise AI projects. In addition, GDMS can also open assigned ports in your customized Docker images.
(5) Scheduling
GDMS provides task scheduling management so that system administrators can set schedules for GPU systems or projects in advance to ensure smooth resource management even when they are not at work. It also supports daily, weekly, or monthly schedules for regular system maintenance.
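How GDMS computes recurrence internally is not documented, but the daily/weekly/monthly options it offers can be sketched as a simple next-run calculation. The function name and the simplified month roll-over (no end-of-month clamping) are assumptions for illustration:

```python
# Minimal sketch of daily/weekly/monthly recurrence, assuming a schedule
# means "run again one period after the last run". Not GDMS's actual code.
from datetime import datetime, timedelta

def next_run(last_run, period):
    """Return the next run time after last_run for the given period."""
    if period == "daily":
        return last_run + timedelta(days=1)
    if period == "weekly":
        return last_run + timedelta(weeks=1)
    if period == "monthly":
        # Roll the month forward; end-of-month clamping is ignored for simplicity.
        year = last_run.year + (last_run.month // 12)
        month = last_run.month % 12 + 1
        return last_run.replace(year=year, month=month)
    raise ValueError(f"unknown period: {period}")

print(next_run(datetime(2024, 1, 15, 2, 0), "monthly"))  # 2024-02-15 02:00:00
```

A production scheduler would also handle time zones and days that do not exist in the following month (e.g. January 31 → February); the sketch only shows the recurrence idea behind the three options.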
(6) Support Connection to Storage System
When the enterprise has a storage system for project data, GDMS supports connecting to it. Data and project programs can be stored according to the established management structure within the enterprise. In this way, the enterprise can reduce data transfer time and security-related concerns.
Scenario 1: Department A applies for the use of two GPUs with at least 11GB for project development.
How can the system administrator use GDMS to meet the needs of Department A?
First, the system administrator must create a project under "Exclusive Mode" (multiple GPUs for a single user) for Department A's newly added Containers. While creating the project, the administrator can limit the number of Containers and the GPU levels available under the project.
After creating the new project, you can add a Container under it. Since Department A applied for two 11 GB GPUs, you can see that two GPUs under the gs1020 system meet the specification.
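The administrator's selection step can be sketched as a simple search over each node's GPU inventory for enough GPUs of the requested memory size. The helper name and the inventory data (apart from the gs1020 example node) are hypothetical:

```python
# Illustrative sketch: find the first node that can supply `count` GPUs
# with at least `min_mem_gb` of memory each, as when placing Department A's
# request for two 11 GB GPUs. Node names/inventories are example data.

def find_node(nodes, count, min_mem_gb):
    """Return (node_name, gpu_ids) for the first node with enough qualifying GPUs."""
    for name, gpus in nodes.items():
        qualified = [gpu_id for gpu_id, mem_gb in gpus.items() if mem_gb >= min_mem_gb]
        if len(qualified) >= count:
            return name, qualified[:count]
    return None  # no single node satisfies the request

nodes = {
    "gs0810": {0: 8, 1: 8},             # two 8 GB GPUs
    "gs1020": {0: 11, 1: 11, 2: 8},     # two 11 GB GPUs and one 8 GB GPU
}
print(find_node(nodes, count=2, min_mem_gb=11))  # ('gs1020', [0, 1])
```

In GDMS this matching is done for you through the GUI; the sketch only makes explicit why gs1020 is the node that appears as qualifying in this scenario.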
After choosing the GPUs, you must then select the AI framework. GDMS integrates with Leadtek WinFast GPU systems, each loaded with the RTX AI Software Pack, so administrators can choose a pre-installed AI framework directly. If you need Jupyter Notebook, you can also add a Container after selecting that function.
After the Container environment has been added, the system administrator can provide the Container connection information to Department A's contact person. GDMS also supports printing reports covering multiple Containers, making it easy to distribute this information. Containers in "Exclusive Mode" provide a data access path by default, which can be viewed on the "View Container" information page. If the enterprise has purchased storage devices such as a NAS, GDMS also supports storing the developed code and data on the storage device.
The developers of Department A can launch the AI development platform using the information provided by the system administrator. This group of Containers includes two 11 GB GPUs, so the AI developers can start their work immediately.
The system administrator can monitor the status of the hardware information including the GDMS server, GPU node status, Container status, and GPU node hardware information in real time from the “Overview” page. The health and usage of the current overall system and managed GPU nodes are visible at a glance with the intuitive dashboard.
GDMS home page: GDMS server status
GDMS homepage: GPU node and Container status
Scenario 2: Department B needs to conduct AI training for newcomers.
The GPU memory requirement of each newcomer is only 3GB, but they need a training environment for 6 people.
The system administrator first creates a "Share Mode" project (single GPU for multiple users) for the Containers in Department B's training platform.
After creating a new course project, you can add Containers for education and AI training. Unlike "Exclusive Mode", "Share Mode" provides a "Manual" option for setting up Containers one by one and an "Automatic" option for launching them automatically. "Share Mode" is very convenient when setting up a multi-person training environment, since it avoids deploying each Container individually. GDMS automatically searches for qualifying hardware resources, so system administrators do not need to waste time accessing each GPU node to look for available capacity. The following picture shows how to set up 6 Containers using 3 GB of GPU memory each in "Share Mode".
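The automatic search GDMS performs is not documented in detail, but the placement problem it solves can be sketched as a first-fit assignment of Containers, each reserving a fixed slice of GPU memory, onto the GPUs with remaining capacity. The function below is an illustrative model, not GDMS's actual algorithm:

```python
# Sketch of "Automatic" share-mode placement: assign n_containers, each
# needing mem_per_container_gb of GPU memory, to GPUs first-fit.
# Illustrates why the administrator need not probe each node by hand.

def auto_place(gpu_free_gb, n_containers, mem_per_container_gb):
    """Return a list of GPU ids (one per container), or None if resources run out."""
    free = list(gpu_free_gb)          # remaining free memory per GPU
    placement = []
    for _ in range(n_containers):
        for gpu_id, avail in enumerate(free):
            if avail >= mem_per_container_gb:
                free[gpu_id] -= mem_per_container_gb
                placement.append(gpu_id)
                break
        else:
            return None               # no GPU has enough memory left
    return placement

# Two 11 GB GPUs, six Containers of 3 GB each:
print(auto_place([11, 11], 6, 3))  # [0, 0, 0, 1, 1, 1]
```

Note that three 3 GB Containers fit on each 11 GB GPU, which matches the scenario below where one 11 GB GPU serves three students with memory to spare.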
Select the AI framework and start the Containers. The system searches for idle resources according to the scheduling order and launches the Containers one by one. Department B selects NVIDIA DIGITS for its image recognition courses, so that members can learn how to build AI models through the web interface.
Each member connects to their respective course environment through an IP and port to start the course (shown below). The environments used by the members do not interfere with each other. Through GDMS allocation, each 11 GB GPU is evenly shared by 3 students, ensuring that everyone has enough hardware resources for AI practice.
Other functions
GPU Node Group and Management
GDMS can administer the managed servers in groups, so that the system administrator can schedule tasks for the GPU nodes under a specific group, such as startup, shutdown, system restart, or Docker image update. On the managed GPU nodes, the GPU usage status and the number of Containers currently running on each GPU can be viewed, which provides transparent resource management. The GPU node management page shows the GPU node name, IP, and CPU and GPU usage.
AI Project Management Page
In addition to adding, editing, and removing projects, the project management page can also control all projects and the Containers in the project. The system administrator can also start or pause the entire project or individual Container to avoid long-term occupation of hardware resources.
Customized Docker Image Creation
GDMS can utilize the 12 pre-loaded Docker images in the WinFast RTX AI workstation/server (please refer to the Leadtek AI Forum article "WinFast RTX AI Workstation Software User's Manual") or your customized Docker images. When AI developers find the default AI frameworks insufficient for their project, they can build their own Docker image. Typically, the customized Docker image is packed into a tar file, and GDMS can upload this file to all GPU nodes. After completion, the AI developers can directly select this Docker image for GDMS project deployment.