Distributed Deep Learning
HAL: Computer System for Scalable Deep Learning
V. Kindratenko, D. Mu, Y. Zhan, J. Maloney, S. Hashemi, B. Rabe, K. Xu, R. Campbell, J. Peng, and W. Gropp.
My Contributions
Distributed training on HAL with PyTorch and NVIDIA Apex
ImageNet benchmark experiments for performance analysis
Member of the NCSA HAL cluster admin team, now called NCSA CAII
NCSA HAL cluster tutorial series on distributed deep learning