DeepSeek R1: The Open-Source Reasoning Breakthrough

WHAT DEEPSEEK R1 ACTUALLY IS

DeepSeek R1 is a large language model trained specifically to reason — to think through problems step by step before producing an answer. This puts it in the same category as OpenAI's o1 series: models that trade inference speed for accuracy on hard problems like mathematics, competitive programming, and scientific reasoning.

The model is built on DeepSeek V3, a 671-billion parameter mixture-of-experts architecture released in late December 2024. The key insight of MoE is that only a fraction of the parameters activate for any given token — in V3's case, around 37 billion per token. This makes the model computationally cheaper to run than its total parameter count suggests.

R1 sits on top of V3 with reinforcement learning from human feedback tuned specifically for reasoning. Rather than training on curated reasoning traces, DeepSeek used a process where the model learns which reasoning strategies lead to correct answers — a technique that produces chain-of-thought outputs that are notably more structured than what you get from base instruction-tuned models.

HOW IT COMPARES TO O1

On AIME 2024 (a high school mathematics competition), R1 scored 79.8% — comparable to o1's 79.2%. On MATH-500, a benchmark of competition mathematics problems, R1 hit 97.3% versus o1's 96.4%. On Codeforces competitive programming, R1 reached a 2029 Elo rating, putting it in the 96.3rd percentile of human competitors.

These numbers caught the industry off guard. The assumption had been that frontier reasoning models required compute budgets and proprietary data pipelines that only OpenAI and Google could access. DeepSeek demonstrated that wasn't true — or at least that the gap was much narrower than assumed.

The honest caveat is that reasoning benchmarks have known limitations. Models can be optimised to score well on specific test sets without necessarily generalising that well across the full distribution of hard problems. But even accounting for that, the R1 results were substantive enough that the major labs took them seriously.

THE OPEN SOURCE ANGLE

What made R1 genuinely disruptive wasn't just the benchmark numbers — it was that DeepSeek released the model weights under an MIT licence. Anyone can download R1, run it locally, fine-tune it, and deploy it commercially. This isn't the "open" of models like Llama, where Meta releases weights but retains restrictions. R1 is genuinely open.

DeepSeek also published a detailed technical paper describing exactly how they trained R1. The paper became required reading in AI research circles within days of release, because it laid out a replicable blueprint for training reasoning models with reinforcement learning — something the frontier labs had kept tightly guarded.

The combination of open weights and a transparent training recipe meant that within weeks, other teams were reproducing and building on R1's techniques. This accelerated the ecosystem in a way that a closed model release never could have.

THE COST REALITY

At launch, DeepSeek offered R1 API access at $0.55 per million input tokens and $2.19 per million output tokens. OpenAI's o1 was priced at $15 per million input tokens and $60 per million output tokens. That's a roughly 27x cost difference for a model with comparable benchmark performance on reasoning tasks.

For production use cases where you're running reasoning over large volumes of text — document analysis, code review pipelines, automated grading — that cost delta is the difference between a viable product and an uneconomical one. Several teams we spoke to had shelved o1-based prototypes purely on unit economics; R1's pricing made those same use cases feasible.

WHAT IT MEANS FOR YOUR STACK

If you're building something that needs genuine reasoning — multi-step problem solving, mathematical verification, complex code generation — R1 is now part of the shortlist alongside o1 and Claude's thinking models. The API economics make it viable for production workloads that were previously o1-priced out of range.

If you want to run it locally, quantised versions of R1 are available through Ollama. A 7B distilled version runs comfortably on a modern laptop GPU; the 70B version needs serious hardware but is tractable on a cloud instance. For teams that need reasoning capability without external API dependencies — regulated industries, air-gapped environments — this is a real option.

The broader signal is that open-source AI is now genuinely competitive at the frontier. That changes the calculus for any project that assumed frontier capability required a commercial API. It doesn't anymore.