
PaperCodex


Mixture-of-Experts

Megatron-LM: Train Billion-Parameter Transformer Models Efficiently on NVIDIA GPUs at Scale


If you’re building or scaling large language models (LLMs) and have access to NVIDIA GPU clusters, Megatron-LM—developed by NVIDIA—is one…

12/26/2025 · Distributed Deep Learning, Large Language Model Training, Mixture-of-Experts
GLM-4.5: Open-Source MoE LLM for High-Performance Agentic Reasoning and Coding


GLM-4.5 is an open-source, high-performance Mixture-of-Experts (MoE) large language model engineered specifically for intelligent agents that need to reason, code,…

12/19/2025 · Agentic Reasoning, Code Generation, Mixture-of-Experts