Models.MoE

A lightweight landing page that keeps track of the latest open-source MoE LLMs.

Qwen3-VL (Qwen)

Vision-language MoE focused on image/video understanding, temporal reasoning, and GUI/agent tasks. Available in Instruct and reasoning-enhanced Thinking editions.

Native 256K context, extendable to ~1M according to the documentation. Primary tiers: 235B and 30B parameters.
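A minimal sketch of querying a Qwen3-VL checkpoint with Hugging Face transformers, assuming a recent release that provides AutoModelForImageTextToText and a processor-level chat template; the repo id and image URL are placeholders, so check the official model card for the exact id and recommended loading code.

```python
# Hedged sketch: querying a Qwen3-VL checkpoint via Hugging Face transformers.
# The repo id and image URL are assumptions -- check the official model card.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-30B-A3B-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image
        {"type": "text", "text": "Describe what this chart shows."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```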

DeepSeek-V3.2 (DeepSeek-AI)

Experimental branch introducing DeepSeek Sparse Attention (DSA) to validate efficiency gains in long-context training and inference.

Training setup is aligned with V3.1-Terminus, with comparable public benchmark performance; native 128K context.
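As a rough illustration of why sparse attention helps at long context, the toy below restricts each query to its top-k keys. This is not DeepSeek's DSA (which reportedly selects tokens with a learned lightweight indexer); it only shows the shape of the idea.

```python
# Toy top-k sparse attention: each query attends to only k_keep keys instead of
# all of them. This is NOT DeepSeek's DSA; it only illustrates why sparsity cuts
# the quadratic attention cost that dominates long-context inference.
import torch

def topk_sparse_attention(q, k, v, k_keep=64):
    # q, k, v: (seq_len, dim)
    scores = (q @ k.T) / k.shape[-1] ** 0.5            # full scores, toy only
    top = scores.topk(min(k_keep, k.shape[0]), dim=-1)
    sparse = torch.full_like(scores, float("-inf"))
    sparse.scatter_(-1, top.indices, top.values)       # keep top-k scores per query
    return sparse.softmax(dim=-1) @ v

q = k = v = torch.randn(4096, 128)
print(topk_sparse_attention(q, k, v).shape)            # torch.Size([4096, 128])
```

A real implementation would pick the keys with a cheap scoring pass before ever forming the full score matrix; the toy computes full scores only to keep the example short.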

DeepSeek-V3.1 (DeepSeek-AI)

Open-source MoE with dual "thinking/non-thinking" prompt templates; enhanced long-context, tool use, and agent capabilities.

671B total / 37B active; native 128K context; the Terminus variant further improves language consistency and agent benchmark stability.

Qwen3-Next (Qwen)

Next-gen architecture: Hybrid Attention (Gated DeltaNet + Gated Attention) with high-sparsity MoE for ultra-long context and high throughput.

80B total / 3B active; native 256K context, extendable to ~1M; significantly higher inference throughput, especially on long inputs.
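A hedged sketch of pushing past the native window with a YaRN-style rope_scaling override, following the pattern Qwen documents elsewhere in the family; the repo id, scaling factor, and base window below are illustrative assumptions, so confirm the exact values on the Qwen3-Next model card.

```python
# Hedged sketch: extending context with a YaRN-style rope_scaling override.
# Repo id and numbers are illustrative assumptions, not official values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    # ~4x the assumed 256K native window -> roughly 1M tokens.
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 262144,
    },
)
```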

Qwen3 (Qwen)

Latest-generation text LLM family spanning Dense and MoE; offers both Instruct and Thinking variants with strong agent capabilities and multilingual performance.

Native 256K context (some checkpoints extend to 1M); multiple sizes for local and cloud deployment.
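A minimal sketch of toggling Qwen3's thinking behavior through the chat template, following the enable_thinking flag described on the Qwen3 model cards; the repo id is an assumption, and older transformers releases may not accept the flag.

```python
# Sketch: switching Qwen3 between thinking and non-thinking output via the chat
# template's enable_thinking flag. Repo id is an assumption; the flag follows
# the pattern described on the Qwen3 model cards.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"  # assumed MoE member of the family
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False to skip the <think>...</think> block
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```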

gpt-oss (OpenAI)

Open-weight MoE family from OpenAI with native MXFP4 quantization on MoE layers; designed for local and cloud use.

Two sizes: ~120B (117B total / ~5.1B active) fits on a single 80GB GPU; ~20B (21B total / ~3.6B active) runs in about 16GB of VRAM.
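A back-of-envelope check (not an official figure) of the ~16GB claim: MXFP4 stores roughly 4.25 bits per MoE parameter (4-bit values plus a shared scale per 32-element block), while the split between MoE and non-MoE parameters below is an assumption for illustration.

```python
# Rough weight-memory estimate: MXFP4 is ~4.25 bits/param (4-bit values plus a
# shared scale per 32-element block); assume non-MoE params stay in 16-bit.
# The 19B / 2B split for gpt-oss-20b is an illustrative assumption.
def approx_weight_gb(moe_params_b, dense_params_b, moe_bits=4.25, dense_bits=16):
    total_bits = (moe_params_b * moe_bits + dense_params_b * dense_bits) * 1e9
    return total_bits / 8 / 1024**3

print(f"~{approx_weight_gb(19, 2):.1f} GB of weights")  # ~13 GB, leaving room
# for KV cache and activations inside a 16GB budget
```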

DeepSeek-R1 (DeepSeek-AI)

Open-source reasoning-focused MoE series trained with reinforcement learning; aimed at complex math and coding tasks.

671B total / 37B active; native 128K context; exposes a thinking mode that wraps its reasoning in <think>…</think> tags.
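A minimal sketch of separating the reasoning trace from the final answer when a response wraps its chain of thought in <think>…</think> tags.

```python
# Minimal sketch: splitting a response that wraps its reasoning in <think> tags
# into the hidden trace and the user-facing answer.
import re

def split_thinking(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

raw = "<think>2 + 2 = 4 because ...</think>The answer is 4."
trace, answer = split_thinking(raw)
print(answer)  # The answer is 4.
```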