Ongoing research on training transformer models at scale.
DeepSpeed version of NVIDIA's Megatron-LM that adds support for features such as MoE model training, Curriculum Learning, and 3D Parallelism.
A library for accelerating Transformer model training on NVIDIA GPUs.
An implementation of model-parallel autoregressive transformers on GPUs, built on the DeepSpeed library (see the tensor-parallelism sketch after this list).
Mesh TensorFlow: Model Parallelism Made Easier.
veRL is a flexible and efficient RL framework for LLMs.
A native-PyTorch library for LLM fine-tuning.
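
Several of the projects above (Megatron-LM and its DeepSpeed variant, Mesh TensorFlow, GPT-NeoX) center on tensor model parallelism: splitting a layer's weight matrix across devices so each device computes only a slice of the output. The sketch below illustrates that idea with a toy column-parallel linear layer. It is a minimal illustration under simplifying assumptions, not any of these libraries' actual APIs: the shards are simulated on a single CPU as a list of weight slices, and the final concatenation stands in for the all-gather a real distributed run would perform.

```python
import torch


class ColumnParallelLinear(torch.nn.Module):
    """Toy column-parallel linear layer: the output dimension is split
    evenly across `world_size` simulated ranks."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0, "output dim must shard evenly"
        shard = out_features // world_size
        # One weight shard per simulated rank; a real implementation would
        # place each shard on a different GPU.
        self.shards = torch.nn.ParameterList(
            [torch.nn.Parameter(0.02 * torch.randn(in_features, shard))
             for _ in range(world_size)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each rank computes its slice of the output; torch.cat stands in
        # for the all-gather a distributed run would perform.
        return torch.cat([x @ w for w in self.shards], dim=-1)


layer = ColumnParallelLinear(in_features=16, out_features=64, world_size=4)
y = layer(torch.randn(2, 16))
print(y.shape)  # torch.Size([2, 64])
```

In practice, Megatron-style implementations typically pair a column-parallel layer with a row-parallel one inside each transformer block, so that only a single collective communication is needed at the block boundary rather than one per layer.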