A Python library for training LLMs and other massive deep learning models at scale across thousands of GPUs