Distributed Training
9 items across the graph — tagged with Distributed Training.
From the graph · 9
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).
Build, Manage and Deploy AI/ML Systems
Democratizing Reinforcement Learning for LLMs
An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
A lightweight runtime health check for PyTorch training runs.
This is the Docker container based on open source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) to allow customers use their own XGBoost scripts…
Machine learning library, Distributed training, Deep learning, Reinforcement learning, Models, TensorFlow, PyTorch
Research platform for model training, evaluation, and experimentation across architectures, benchmarks, and recipes.
