Single-machine Model Parallel Best Practices
Model parallelism is a widely used technique in distributed training.
This post compares three ways of training a model: 1) on a single GPU, 2) with model parallelism, and 3) with pipelined model parallelism.
Model parallelism splits a single model across different GPUs, rather than replicating the entire model on each GPU, as shown below.
[Figure: Model Parallel Architecture]
[Figure: Model Parallel Execution Process]
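The split described above can be sketched with a toy PyTorch module. This is a minimal illustration, not the tutorial's exact code: the layer sizes and the `ToyModelParallel` name are made up, and it falls back to CPU when two GPUs are not available.

```python
import torch
import torch.nn as nn

# Place each half of the model on its own device; fall back to CPU
# so the sketch still runs on a machine without two GPUs.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device('cuda:0') if two_gpus else torch.device('cpu')
dev1 = torch.device('cuda:1') if two_gpus else torch.device('cpu')

class ToyModelParallel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on dev0, second half on dev1.
        self.seq1 = nn.Sequential(nn.Linear(10, 10), nn.ReLU()).to(dev0)
        self.seq2 = nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        # The intermediate activation is copied between devices explicitly.
        x = self.seq1(x.to(dev0))
        return self.seq2(x.to(dev1))

model = ToyModelParallel()
out = model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 5])
```

The key point is the explicit `.to(dev1)` between the two stages: every forward pass pays for a device-to-device copy, and while one half computes, the other half's GPU sits idle.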
Running the experiment with ResNet50 yields the results shown below.
The execution time of the model parallel implementation is roughly 7% longer than the single-GPU implementation: copying intermediate tensors between GPUs adds overhead, and only one of the two GPUs is working at any given time.
Pipelining inputs through the model parallel ResNet50 speeds up training by roughly 49%, which is still quite far from the ideal 100% speedup.
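The pipelining idea can be sketched by splitting each batch into micro-batches so that both stages work concurrently. This is a hedged sketch in the spirit of the tutorial, not its exact code: the `PipelinedToyModel` name, layer sizes, and `split_size` value are illustrative, and it falls back to CPU without two GPUs.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs on a machine without two GPUs.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device('cuda:0') if two_gpus else torch.device('cpu')
dev1 = torch.device('cuda:1') if two_gpus else torch.device('cpu')

class PipelinedToyModel(nn.Module):
    def __init__(self, split_size=2):
        super().__init__()
        self.split_size = split_size
        self.seq1 = nn.Sequential(nn.Linear(10, 10), nn.ReLU()).to(dev0)
        self.seq2 = nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        # Split the batch into micro-batches and pipeline them:
        # while seq2 processes micro-batch i on dev1, seq1 can
        # already start on micro-batch i+1 on dev0.
        splits = iter(x.split(self.split_size, dim=0))
        s_prev = self.seq1(next(splits).to(dev0)).to(dev1)
        outputs = []
        for s_next in splits:
            outputs.append(self.seq2(s_prev))              # stage 2 on dev1
            s_prev = self.seq1(s_next.to(dev0)).to(dev1)   # stage 1 on dev0
        outputs.append(self.seq2(s_prev))
        return torch.cat(outputs)

model = PipelinedToyModel()
out = model(torch.randn(8, 10))
print(out.shape)  # torch.Size([8, 5])
```

The speedup falls short of 100% because the pipeline still has fill and drain phases where one device is idle, and the micro-batch copies between devices add their own overhead.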
Read Full Article Here