Try Nemotron 3 Online

Test Nemotron 3 Super and Nano in one place, compare long-context reasoning and throughput, then jump to the paper, HuggingFace model cards, or GitHub resources when you are ready to deploy. This playground keeps the essentials in one clean workflow so you can evaluate quickly.

Nemotron 3 Live Playground

Switch between Nemotron 3 Nano and Super to compare model behavior in real time. Choose a model, ask a question, and run API-style Nemotron 3 Super prompts against long-context tasks to see how each model responds.
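If you want to script the same comparison instead of clicking through the playground, a request in the common OpenAI-compatible chat shape is a reasonable starting point. Note this is a sketch: the model identifier and any endpoint URL below are placeholder assumptions, not official values from the playground.

```python
# Sketch of an OpenAI-style chat request body for testing Nemotron 3 prompts.
# The model name "nemotron-3-super" is a placeholder, not a confirmed identifier.

def build_chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble a chat-completion payload in the widely used OpenAI-compatible shape."""
    return {
        "model": model,                       # swap in the Super or Nano identifier you test
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,                   # low temperature for reproducible A/B comparisons
    }

payload = build_chat_request("nemotron-3-super", "Summarize the attached 800-page report.")
# POST `payload` as JSON to your chosen endpoint, e.g. requests.post(url, json=payload)
```

Keeping the payload identical and only swapping the `model` field is the cleanest way to compare Nano and Super on the same long-context task.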

Nemotron 3 at a glance

Scale without the usual compute tax

Nemotron 3 keeps total parameter counts large while activating a smaller subset per token, enabling efficient inference and long-context reasoning for Super and Nano.

120B / 12B

Super: 120B total parameters with 12B active

1M

Context window up to 1M tokens

30B / 3.5B

Nano: 30B total parameters with 3.5B active
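The stats above reduce to a simple ratio: per token, only a small fraction of the weights participate in compute. A quick check of that ratio (illustration only; realized speedups also depend on routing overhead and memory bandwidth):

```python
# Per-token active-parameter fraction for the two published configurations.
# This ratio bounds per-token FLOP savings; real throughput depends on more factors.

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of total parameters activated for each token."""
    return active_b / total_b

super_frac = active_fraction(120, 12)   # 0.10  -> 10% of weights per token
nano_frac = active_fraction(30, 3.5)    # ~0.117 -> about 12% of weights per token
```

So both models do roughly an order of magnitude less per-token compute than a dense model of the same total size, which is the "scale without the usual compute tax" claim in concrete terms.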

Where teams use Nemotron 3

Practical evaluation paths

These workflows are the fastest ways to validate long-context reasoning, agentic behavior, and throughput before you commit to deployment.

Long-document synthesis

Summarize reports, legal briefs, and multi-chapter research with Nemotron 3 in a single prompt window.

Multi-file code analysis

Ask the models to navigate large repos, explain architecture, and propose refactors.

Agentic tool workflows

Test multi-step planning and tool calling for research, ops, or automation tasks.

Retrieval + 1M context

Compare RAG strategies against full-context prompting to see what works best.

Local deployment planning

Evaluate quantization targets, VRAM needs, and latency before moving on-prem.

Multilingual evaluations

Run the same prompts across languages to measure consistency and quality.
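For the local-deployment card above, a first-order VRAM estimate falls out of parameter count and quantization width. This covers weights only; KV cache, activations, and runtime overhead add substantially on top, so treat the numbers as a floor.

```python
# First-order VRAM estimate for weights only: 1B params at 1 byte/param = 1 GB.
# KV cache (which grows with context length) and activations are not included.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_vram_gb(total_params_billion: float, dtype: str) -> float:
    """Approximate weight storage in GB for a given quantization target."""
    return total_params_billion * BYTES_PER_PARAM[dtype]

for name, params in [("Nano 30B", 30), ("Super 120B", 120)]:
    for dtype in ("fp16", "fp8", "int4"):
        print(f"{name} @ {dtype}: ~{weight_vram_gb(params, dtype):.0f} GB weights")
```

Because MoE models must hold all experts in memory even though few are active per token, the *total* parameter count, not the active count, drives this estimate.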

Research highlights

Architecture and benchmark visuals

Figures sourced from the paper and shared assets, highlighting Nemotron 3 architecture and evaluation snapshots.

Nemotron 3 Super architecture diagram highlighting hybrid Mamba Transformer layers
Architecture overview (source: Nemotron-3 paper PDF).
Nemotron 3 Super benchmark figure summarizing accuracy and throughput tradeoffs
Benchmark snapshot (source: Nemotron-3 paper PDF).

Nemotron 3 paper figures

Paper visuals

Key diagrams and benchmark curves extracted from the paper.

Architecture and routing

MoE routing diagrams used in Nemotron 3 Nano.

The paper introduces Nemotron 3 (Nano, Super, Ultra) as a Mixture-of-Experts hybrid Mamba–Transformer family built for strong throughput and up to 1M-token context.

Most layers interleave Mamba-2 and MoE blocks with a small number of self-attention layers; larger models add LatentMoE and multi-token prediction (MTP) layers for quality and faster generation.

Post-training uses multi-environment reinforcement learning to improve reasoning, multi-step tool use, and budget-controlled inference.
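The interleaving described above can be sketched as a layer-stack builder. The block ordering and the one-in-ten attention ratio here are illustrative assumptions; the paper specifies the actual pattern per model size.

```python
# Illustrative stack builder for a hybrid Mamba-Transformer MoE backbone.
# The attn_every=10 ratio is an assumption for illustration, not the paper's value.

def build_stack(n_layers: int, attn_every: int = 10) -> list[str]:
    """Mostly Mamba-2 + MoE pairs, with a self-attention layer every `attn_every` slots."""
    stack = []
    for i in range(n_layers):
        if i % attn_every == attn_every - 1:
            stack.append("self-attention")   # sparse global mixing
        else:
            stack.append("mamba2+moe")       # linear-time sequence block + sparse experts
    return stack

layers = build_stack(20)
```

The design intuition: Mamba-2 blocks keep per-token cost linear in sequence length, which is what makes a 1M-token context practical, while the few attention layers preserve global token mixing.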

Nemotron 3 MoE routing diagram showing expert selection flow details
MoE routing (base)
Nemotron 3 latent MoE routing diagram for hybrid expert gating
Latent MoE routing

Accuracy-efficiency trade-off

Accuracy-efficiency trade-off curves for Nemotron 3 Nano as the token budget varies.

Inference-time budget control lets you set a maximum token budget for the thinking trace.

When the budget is reached, appending the `</think>` token prompts the model to continue with the response based on the partial trace.

The curves below show how accuracy trades off against efficiency as the token budget changes.
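The truncate-and-close mechanism described above can be sketched with a stubbed token stream; the streaming interface here is an illustrative stand-in, not the official API, and only the budget logic is the point.

```python
# Illustrative sketch of inference-time budget control. `stream_tokens` stands in
# for a real streaming decode loop; only the truncate-and-close logic matters here.

THINK_CLOSE = "</think>"  # token that ends the thinking trace

def budgeted_generate(stream_tokens, budget: int) -> list[str]:
    """Collect thinking tokens up to `budget`, then force the trace closed so the
    model continues with its final answer based on the partial trace."""
    trace = []
    for tok in stream_tokens:
        if tok == THINK_CLOSE:          # model finished thinking on its own
            trace.append(tok)
            return trace
        if len(trace) >= budget:        # budget exhausted: close the trace ourselves
            trace.append(THINK_CLOSE)
            return trace
        trace.append(tok)
    trace.append(THINK_CLOSE)           # stream ended without closing the trace
    return trace

# A fake 6-token thinking stream truncated at a budget of 4 tokens:
demo = budgeted_generate(iter(["t1", "t2", "t3", "t4", "t5", "t6"]), budget=4)
```

Sweeping `budget` while measuring task accuracy is exactly how trade-off curves like the ones below are produced.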

Nemotron 3 AIME25 benchmark chart from the official paper snapshot
AIME25
Nemotron 3 GPQA benchmark chart from the official paper snapshot
GPQA
Nemotron 3 LiveCodeBench benchmark chart from the official paper snapshot
LiveCodeBench
Nemotron 3 MMLU Pro benchmark chart from the official paper snapshot
MMLU Pro

Official Nemotron 3 resources

NVIDIA Nemotron 3 efficient and open intelligence sources

Use these official sources for accurate paper details, HuggingFace model cards, GitHub links, and Super benchmark context.

Nemotron 3 official source logo for NVIDIA Research resources site

NVIDIA Nemotron 3 Research Lab

NVIDIA Nemotron 3 efficient and open intelligence overview, labs, and release notes.

Nemotron 3 paper source logo for arXiv technical report library

Nemotron 3 paper on arXiv

The official Nemotron 3 paper with methods, training, and evaluation details.

Nemotron 3 model card logo for Hugging Face Super download page

Nemotron 3 Super HuggingFace

Model card and downloads for Nemotron 3 Super on HuggingFace.

Nemotron 3 model card logo for Hugging Face Nano download page

NVIDIA Nemotron 3 Nano 30B A3B FP8

Model card and downloads for NVIDIA Nemotron 3 Nano 30B A3B FP8 on HuggingFace.

Nemotron 3 official source logo for NVIDIA Developer Blog resource

Nemotron 3 Super benchmark blog

NVIDIA Developer Blog launch with Nemotron 3 Super benchmark highlights and context.

Nemotron 3 official source logo for NVIDIA white paper download

Nemotron 3 white paper PDF

The Nemotron 3 paper PDF for deeper technical context and design notes.

Nemotron 3 architecture highlights

Hybrid MoE + long-context reasoning

Nemotron 3 combines a hybrid Mamba-Transformer MoE backbone with long-context support and robust post-training, targeting strong agentic performance at high throughput.

Uses a hybrid Mamba-Transformer mixture-of-experts design to raise throughput while keeping quality competitive for reasoning and tool-use tasks.

Nemotron 3 architecture wireframe highlighting hybrid MoE and Mamba layers

Pick the right Nemotron 3 model

Nano for efficiency, Super for agentic scale

Nemotron 3 ships as a family: Nano targets efficient local or edge use, Super targets high-end agentic reasoning, and Ultra expands the accuracy frontier. Nano and Super are both open and long-context by design, including the NVIDIA Nemotron 3 Nano 30B A3B FP8 release.

  • Nemotron 3 Nano: 30B total / 3.5B active parameters with up to 1M context.
  • Nemotron 3 Super: 120B total / 12B active parameters with up to 1M context.
  • Hybrid Mamba-Transformer MoE architecture across the family.
  • Open weights, technical reports, and reproducible recipes.
Pick the right model comparison

What the community keeps asking for

Signals from YouTube, Reddit, Perplexity AI, and Artificial Analysis

Across community discussions, demand clusters around local deployment, long-context memory, and agentic workflows, plus a desire for open tooling and better control over safety layers.

Local & offline deployment

Clear guidance on quantization, VRAM requirements, and running large models on a single GPU or local server.

Long-context memory

Reliable performance on very long documents, codebases, and multi-file reasoning without constant chunking.

Agentic workflows

Better templates for multi-agent orchestration, tool calling, and long-running tasks.

Throughput & cost efficiency

High tokens/sec and predictable latency for production inference and batch reasoning.

Open ecosystem tooling

Integrations with LangChain, AutoGen, and local UIs, plus open recipes to reproduce results.

Control & multilingual quality

Less over-restriction for creative tasks and stronger non-English performance across domains.

Video walkthrough

Community demo and first-look coverage

A quick video from the community to see how Nemotron-3 Super behaves in practice and what developers are testing.

Video sourced from community discussions on YouTube.

Try Nemotron 3 Super and Nano with a clean workflow

Launch the playground to test Nemotron 3 Super or Nano, or the Nemotron 3 Super API, then open official resources for deployment and model details.

FAQ

Questions about this site

Community signals

What builders keep requesting

Local LLM Builders

Deployment

Need clear guidance on quantization and VRAM so we can run Super or Nano on a single GPU.

Agent Developers

Workflow

Looking for multi-agent templates and tool-calling patterns that scale to long tasks.

Research Analysts

Reasoning

We want 1M-token context that stays coherent for large docs and codebases.

Cost-sensitive Teams

Efficiency

High throughput and predictable latency matter more than flashy UI features.

Tooling Integrators

Ecosystem

Open recipes and integrations with LangChain/AutoGen are the fastest path to adoption.

Creative Users

Control

Need stronger multilingual quality and better control over alignment limits.