Distributed LLM Pretraining with torchtitan
Provides PyTorch-native distributed LLM pretraining with torchtitan, using 4D parallelism: FSDP2 (fully sharded data parallelism), tensor parallelism (TP), pipeline parallelism (PP), and context parallelism (CP). Use it when pretraining Llama 3.1, DeepSeek V3, or custom models on 8 to 512+ GPUs, with Float8 training, torch.compile, and distributed checkpointing.
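With 4D parallelism the GPUs form a multi-dimensional device mesh, so the FSDP sharding, TP, PP, and CP degrees must multiply to the total GPU count. Replicated data parallelism, if used, adds one more factor. The Python sketch below checks a layout and gives the leftover GPUs to FSDP sharding. The helpers `infer_dp_shard` and `check_mesh` are hypothetical illustrations, not torchtitan's API, and the 8- and 512-GPU layouts are assumptions chosen only to make the arithmetic concrete.

```python
from math import prod


def infer_dp_shard(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1,
                   dp_replicate: int = 1) -> int:
    """Hypothetical helper: give every GPU not used by TP/PP/CP/replication to FSDP sharding."""
    if min(tp, pp, cp, dp_replicate) < 1:
        raise ValueError("tp, pp, cp and dp_replicate must all be >= 1")
    used = tp * pp * cp * dp_replicate
    if world_size % used != 0:
        raise ValueError(f"world_size={world_size} is not divisible by tp*pp*cp*dp_replicate={used}")
    return world_size // used


def check_mesh(world_size: int, dp_shard: int, tp: int = 1, pp: int = 1, cp: int = 1,
               dp_replicate: int = 1) -> dict:
    """Hypothetical helper: confirm the parallel degrees tile all GPUs exactly."""
    dims = {"dp_replicate": dp_replicate, "dp_shard": dp_shard, "cp": cp, "tp": tp, "pp": pp}
    if any(d < 1 for d in dims.values()):
        raise ValueError(f"every degree must be >= 1, got {dims}")
    total = prod(dims.values())
    if total != world_size:
        raise ValueError(f"degrees {dims} cover {total} GPUs, expected {world_size}")
    return dims


# Assumed layout for a single 8-GPU node: TP=2, FSDP over the remaining 4 GPUs.
small = check_mesh(8, dp_shard=infer_dp_shard(8, tp=2), tp=2)
# Assumed layout for 512 GPUs: TP=8, PP=4, CP=2, FSDP over the remaining 8.
large = check_mesh(512, dp_shard=infer_dp_shard(512, tp=8, pp=4, cp=2), tp=8, pp=4, cp=2)
print(small)
print(large)
```

In the 512-GPU case, TP × PP × CP = 8 × 4 × 2 = 64, leaving 512 / 64 = 8 ways of FSDP sharding. Keeping TP inside a single node is a common rule of thumb because it communicates in every layer; treat it as a heuristic rather than a torchtitan requirement.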
Originally published by orchestra. Licensed under MIT. View original: https://github.com/Orchestra-Research/AI-Research-SKILLs/tree/main/01-model-architecture/torchtitan
mcp://skill.ski/free#oski-orchestra-torchtitan
Add the free skill.ski MCP server to your .mcp.json or equivalent config file:
{
  "mcpServers": {
    "skill-ski-free": {
      "url": "https://mcp.skill.ski/free",
      "type": "http"
    }
  }
}

Once connected, this Oski is available as a callable tool in your runtime. There is no paywall, and no sign-in is required for free-tier Oskis.
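If your .mcp.json already lists other servers, you can merge this entry into it instead of pasting over the file. The Python sketch below does that, assuming the file uses the top-level `mcpServers` key shown in the config above. `add_mcp_server` is a hypothetical helper, not part of any skill.ski tooling, and clients that use an "equivalent config file" may expect a different layout.

```python
import json
from pathlib import Path
from typing import Optional

# Entry copied from the skill.ski config shown above.
SKILL_SKI_FREE = {"url": "https://mcp.skill.ski/free", "type": "http"}


def add_mcp_server(config_path: Path, name: str = "skill-ski-free",
                   entry: Optional[dict] = None) -> dict:
    """Hypothetical helper: add or update one server under "mcpServers", keeping all other keys."""
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Copy the entry so later edits to the constant cannot leak into the written config.
    servers[name] = dict(entry if entry is not None else SKILL_SKI_FREE)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Called as `add_mcp_server(Path(".mcp.json"))` from the project root, it is safe to rerun because it only replaces the `skill-ski-free` entry, though it rewrites the file with two-space indentation.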