Try Nemotron 3 Online
Nemotron 3 at a glance
Scale without the usual compute tax
Nemotron 3 keeps total parameter counts large while activating only a small subset of parameters per token, enabling efficient inference and long-context reasoning across both Super and Nano.
Super: 120B total parameters with 12B active
Context window up to 1M tokens
Nano: 30B total parameters with 3.5B active
Where teams use Nemotron 3
Practical evaluation paths
These workflows are the fastest ways to validate long-context reasoning, agentic behavior, and throughput before you commit to deployment.
Long-document synthesis
Summarize reports, legal briefs, and multi-chapter research with Nemotron 3 in a single prompt window.
Multi-file code analysis
Ask the models to navigate large repos, explain architecture, and propose refactors.
Agentic tool workflows
Test multi-step planning and tool calling for research, ops, or automation tasks.
Retrieval + 1M context
Compare RAG strategies against full-context prompting to see what works best.
Local deployment planning
Evaluate quantization targets, VRAM needs, and latency before moving on-prem.
Multilingual evaluations
Run the same prompts across languages to measure consistency and quality.
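For local deployment planning, a back-of-envelope memory estimate helps before committing hardware. The sketch below uses the parameter counts from this page and illustrative bytes-per-parameter figures (assumptions, not official sizing guidance); real deployments also need headroom for KV-cache and activations.

```python
# Rough VRAM estimate for serving an MoE model locally.
# All expert weights must be resident, so total (not active) parameters
# drive memory; active parameters drive per-token compute.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_vram_gb(total_params_b: float, dtype: str) -> float:
    """Approximate weight memory in GB for a given quantization target."""
    return total_params_b * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

for name, total in [("Nano (30B)", 30), ("Super (120B)", 120)]:
    for dtype in ("fp16", "fp8", "int4"):
        print(f"{name} @ {dtype}: ~{weight_vram_gb(total, dtype):.0f} GB weights")
```

Note that this counts weights only; long-context serving at up to 1M tokens makes KV-cache a significant additional cost.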
Research highlights
Architecture and benchmark visuals
Figures sourced from the paper and shared assets, highlighting Nemotron 3 architecture and evaluation snapshots.


Nemotron 3 paper figures
Paper visuals
Key diagrams and benchmark curves extracted from the paper.
Architecture and routing
MoE routing diagrams used in Nemotron 3 Nano.
The paper introduces Nemotron 3 (Nano, Super, Ultra) as a Mixture-of-Experts hybrid Mamba–Transformer family built for strong throughput and up to 1M-token context.
Most layers interleave Mamba-2 and MoE blocks with a small number of self-attention layers; larger models add LatentMoE and MTP layers for quality and faster generation.
Post-training uses multi-environment reinforcement learning to improve reasoning, multi-step tool use, and budget-controlled inference.
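The paper's exact gating function is not reproduced here; as a minimal illustration of the top-k MoE routing idea the architecture relies on, the sketch below scores experts with a simple linear gate, keeps the top k, and renormalizes with a softmax over the selected experts (all names and shapes are assumptions for illustration).

```python
import numpy as np

def topk_moe_route(x, gate_w, k=2):
    """Top-k MoE router sketch.
    x: (d,) token hidden state; gate_w: (n_experts, d) gating weights."""
    logits = gate_w @ x                     # one score per expert
    topk = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                # softmax over selected experts only
    return topk, weights

rng = np.random.default_rng(0)
x = rng.normal(size=16)
gate_w = rng.normal(size=(8, 16))
experts, w = topk_moe_route(x, gate_w, k=2)
print(experts, w)
```

Only the k selected experts run their feed-forward pass for this token, which is how a 30B-total model can cost roughly 3.5B active parameters per token.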


Accuracy-efficiency trade-off
Accuracy-efficiency trade-off curves for Nemotron 3 Nano as the token budget varies.
Inference-time budget control lets you set a maximum token budget for the thinking trace.
When the budget is reached, appending the `</think>` token prompts the model to continue with the response based on the partial trace.
The curves below show how accuracy trades off against efficiency as the token budget changes.
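A minimal sketch of how such a budget cap might be enforced in a decoding loop; `generate_step` and `close_id` are hypothetical stand-ins for a real sampler and tokenizer, not an official API.

```python
def apply_thinking_budget(generate_step, prompt_ids, budget, close_id):
    """Generate thinking tokens until the budget is hit, then force the
    </think> token so the model answers from the partial trace."""
    trace = []
    ids = list(prompt_ids)
    for _ in range(budget):
        tok = generate_step(ids)
        if tok == close_id:      # model closed its own trace early
            return ids + [tok], trace
        trace.append(tok)
        ids.append(tok)
    ids.append(close_id)         # budget reached: append </think> and continue
    return ids, trace
```

Lowering `budget` moves left along the trade-off curve: cheaper inference, shorter reasoning traces, and (typically) lower accuracy.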




Official Nemotron 3 resources
Official NVIDIA sources for the Nemotron 3 family
Use these official sources for accurate paper details, HuggingFace model cards, GitHub links, and Super benchmark context.

NVIDIA Nemotron 3 Research Lab
Overview, labs, and release notes for NVIDIA Nemotron 3.
Nemotron 3 paper on arXiv
The official Nemotron 3 paper with methods, training, and evaluation details.
NVIDIA Nemotron 3 Nano 30B A3B FP8
Model card and downloads for NVIDIA Nemotron 3 Nano 30B A3B FP8 on HuggingFace.

Nemotron 3 Super benchmark blog
NVIDIA Developer Blog launch with Nemotron 3 Super benchmark highlights and context.

Nemotron 3 white paper PDF
The Nemotron 3 paper PDF for deeper technical context and design notes.
Nemotron 3 architecture highlights
Hybrid MoE + long-context reasoning
Nemotron 3 combines a hybrid Mamba-Transformer MoE backbone with long-context support and robust post-training, targeting strong agentic performance at high throughput.

Pick the right Nemotron 3 model
Nano for efficiency, Super for agentic scale
Nemotron 3 ships as a family. Nano targets efficient local or edge use, Super targets high-end agentic reasoning, and Ultra expands the accuracy frontier. Nano and Super are both open and long-context by design; open releases include NVIDIA Nemotron 3 Nano 30B A3B FP8.
- Nemotron 3 Nano: 30B total / 3.5B active parameters with up to 1M context.
- Nemotron 3 Super: 120B total / 12B active parameters with up to 1M context.
- Hybrid Mamba-Transformer MoE architecture across the family.
- Open weights, technical reports, and reproducible recipes.
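As a toy illustration of choosing between the family members listed above, the helper below maps available VRAM to a model; the thresholds assume FP8 weights at roughly one byte per parameter and are assumptions, not official guidance.

```python
def pick_nemotron(vram_gb: float, needs_agentic_scale: bool) -> str:
    """Toy model picker based on the specs above (FP8 weight sizes assumed)."""
    if needs_agentic_scale and vram_gb >= 120:
        return "Nemotron 3 Super (120B total / 12B active)"
    if vram_gb >= 30:
        return "Nemotron 3 Nano (30B total / 3.5B active)"
    return "Consider int4 quantization of Nano, or hosted inference"
```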

What the community keeps asking for
Signals from YouTube, Reddit, Perplexity AI, and Artificial Analysis
Across community discussions, demand clusters around local deployment, long-context memory, and agentic workflows, plus a desire for open tooling and better control over safety layers.
Local & offline deployment
Clear guidance on quantization, VRAM requirements, and running large models on a single GPU or local server.
Long-context memory
Reliable performance on very long documents, codebases, and multi-file reasoning without constant chunking.
Agentic workflows
Better templates for multi-agent orchestration, tool calling, and long-running tasks.
Throughput & cost efficiency
High tokens/sec and predictable latency for production inference and batch reasoning.
Open ecosystem tooling
Integrations with LangChain, AutoGen, and local UIs, plus open recipes to reproduce results.
Control & multilingual quality
Less over-restriction for creative tasks and stronger non-English performance across domains.
Video walkthrough
Community demo and first-look coverage
A quick video from the community showing how Nemotron 3 Super behaves in practice and what developers are testing.
Video sourced from community discussions on YouTube.
Try Nemotron 3 Super and Nano with a clean workflow
Launch the playground to test Nemotron 3 Super or Nano, or the Nemotron 3 Super API, then open official resources for deployment and model details.
FAQ
Questions about this site
Community signals
What builders keep requesting
Local LLM Builders
Deployment: Need clear guidance on quantization and VRAM so we can run Super or Nano on a single GPU.
Agent Developers
Workflow: Looking for multi-agent templates and tool-calling patterns that scale to long tasks.
Research Analysts
Reasoning: We want 1M-token context that stays coherent for large docs and codebases.
Cost-sensitive Teams
Efficiency: High throughput and predictable latency matter more than flashy UI features.
Tooling Integrators
Ecosystem: Open recipes and integrations with LangChain/AutoGen are the fastest path to adoption.
Creative Users
Control: Need stronger multilingual quality and better control over alignment limits.