Models.MoE
A lightweight landing page that keeps track of the latest open-source MoE LLMs.
Qwen3-VL (Qwen)
Vision-language MoE focused on image/video understanding, temporal reasoning, and GUI/agent tasks. Available in Instruct and reasoning-enhanced Thinking editions.
Native 256K context, extendable to ~1M according to the docs. Primary tiers: 235B and 30B.
Qwen3-VL-235B-A22B-Instruct
Large-capacity multimodal MoE — Instruct edition for general VLM tasks, tool use, and agent pipelines.
Open on Hugging Face
Qwen3-VL-30B-A3B-Instruct
Mid-size multimodal MoE — Instruct edition for production VLM and prototyping.
Open on Hugging Face
Qwen3-VL-235B-A22B-Thinking
Large-capacity multimodal MoE — Thinking edition for stronger step-by-step visual reasoning and long-horizon video.
Open on Hugging Face
Qwen3-VL-30B-A3B-Thinking
Mid-size multimodal MoE — Thinking edition balancing compute with robust visual reasoning.
Open on Hugging Face
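As a quick usage sketch (not an official recipe): the Instruct weights above can be queried through any OpenAI-compatible endpoint, for example one started locally with vLLM. The base URL, API key, and image URL below are placeholder assumptions.

```python
# Minimal sketch: call a locally served Qwen3-VL Instruct model through an
# OpenAI-compatible endpoint (e.g. started with `vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct`).
# The base_url, api_key, and image URL are placeholders, not values from this page.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Describe the trend shown in this chart."},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```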
DeepSeek-V3.2 (DeepSeek-AI)
Experimental branch introducing DeepSeek Sparse Attention (DSA) to validate long-context training and inference efficiency.
Training setup aligned with V3.1-Terminus, with similar public benchmark performance; native 128K context.
DeepSeek-V3.1 (DeepSeek-AI)
Open-source MoE with dual "thinking/non-thinking" prompt templates; enhanced long-context, tool use, and agent capabilities.
671B total / 37B active; native 128K context; the improved Terminus variant offers more stable language consistency and better agent metrics.
DeepSeek-V3.1-Terminus
Iterated release reducing code-switching and odd characters; improves agent benchmarks such as BrowseComp and SWE-bench.
Open on Hugging Face
DeepSeek-V3.1
Primary weights; supports thinking/non-thinking and tool-use templates; MIT License; noted as 671B total / 37B active / 128K context.
Open on Hugging Face
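A minimal sketch of driving the dual templates from Hugging Face transformers follows; the `thinking` keyword mirrors the mode switch described on the model card, but treat the exact argument name as an assumption and check the tokenizer config of the release you download.

```python
# Minimal sketch: render DeepSeek-V3.1's chat template in non-thinking and
# thinking mode with Hugging Face transformers. The `thinking` kwarg is assumed
# to be the template's mode switch; verify against the downloaded tokenizer config.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
messages = [{"role": "user", "content": "Summarize the trade-offs of sparse attention."}]

for thinking in (False, True):
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking=thinking,  # assumed kwarg for the thinking/non-thinking switch
    )
    print(f"--- thinking={thinking} ---")
    print(prompt)
```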
Qwen3-Next (Qwen)
Next-gen architecture: Hybrid Attention (Gated DeltaNet + Gated Attention) with high-sparsity MoE for ultra-long context and high throughput.
80B total / 3B active; native 256K, extendable to ~1.01M; significantly higher inference throughput.
Qwen3 (Qwen)
Latest-generation text LLM family spanning Dense and MoE; offers both Instruct and Thinking variants with strong agent capabilities and multilingual performance.
Native 256K context (some weights extend to 1M); multiple sizes for local and cloud deployment.
Qwen3-235B-A22B-Instruct-2507
235B MoE Instruct primary weights; long-context (256K, with community PR/discussion showing 1M support).
Open on Hugging Face
Qwen3-30B-A3B-Instruct-2507
30B-class MoE Instruct primary weights; strong dialogue and tool use; the model card notes a non-thinking mode plus a 1M-token config.
Open on Hugging Face
Qwen3-4B-Instruct-2507
Lightweight Instruct primary weights; native ~256K context; good for consumer GPUs and edge.
Open on Hugging Face
Qwen3-235B-A22B-Thinking-2507
235B MoE Thinking primary weights; supports <think>…</think> reasoning traces; 1M-token support noted in community discussions.
Open on Hugging Face
Qwen3-30B-A3B-Thinking-2507
30B-class Thinking variant; model card highlights improved reasoning and long-context upgrades.
Open on Hugging Face
Qwen3-4B-Thinking-2507
4B Thinking variant; README details qualitative reasoning improvements and usage.
Open on Hugging Face
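For local experimentation, the lightweight 4B Instruct weights are the easiest entry point. A minimal sketch with Hugging Face transformers follows; it assumes a recent transformers release and enough GPU or CPU memory for a 4B model.

```python
# Minimal sketch: run Qwen3-4B-Instruct-2507 locally with Hugging Face transformers.
# Assumes a recent transformers version and enough memory for a 4B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "In one sentence, what is a mixture-of-experts model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```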
gpt-oss (OpenAI)
Open-weight MoE family from OpenAI with native MXFP4 quantization on MoE layers; designed for local and cloud use.
Two sizes: ~120B (117B total / ~5.1B active) fits on a single 80GB GPU; ~20B (21B total / ~3.6B active) runs in ~16GB of VRAM.
DeepSeek-R1 (DeepSeek-AI)
Open reasoning-focused (RL-enhanced) MoE series; aimed at complex math and code reasoning tasks.
671B total / 37B active; native 128K context; provides a thinking mode (<think>…</think>).
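Since DeepSeek-R1 (like the Thinking variants above) emits its reasoning inside <think>…</think> tags, downstream code usually separates the trace from the final answer. A minimal, framework-agnostic sketch follows; the sample response string is made up.

```python
# Minimal sketch: split a <think>…</think> reasoning trace from the final answer
# in a DeepSeek-R1-style response. The example text is invented for illustration.
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is found."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(
    "<think>2 + 2 is a basic sum, so the result is 4.</think>The answer is 4."
)
print("reasoning:", reasoning)
print("answer:", answer)
```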