Deepseek V3.2 Speciale and Mistral Large 3 Tested – Open‑Source Models Return to the Spotlight

Introduction

The open‑source large language model (LLM) ecosystem has witnessed a resurgence with the release of two high‑profile models: Deepseek V3.2 Speciale and Mistral Large 3. Both projects stem from veteran open‑source developers—Deepseek, known for its V3 and R1 series, and Mistral, one of the first Western companies to ship competitive, permissively‑licensed models. This article examines the architectural innovations, benchmark performance, and practical implications of these new releases.

Background: The Evolution of Open‑Source LLMs

  • Deepseek gained attention with the V3 architecture, delivering strong performance on a range of tasks while remaining accessible to the community.
  • Mistral made a notable impact with the 12‑billion‑parameter Mistral NeMo model, praised for how efficiently it runs on local hardware. However, later releases suffered from restrictive licensing and a lack of transparency, diminishing their appeal.

Both companies have now returned with updated models that promise state‑of‑the‑art (SOTA) results while retaining open licensing.

Deepseek V3.2 Speciale – Architecture and Sparse Attention

Core Design

Deepseek’s V3.2 builds on the original V3 architecture but introduces DeepSeek Sparse Attention (DSA), a novel attention mechanism that mitigates the quadratic cost of traditional transformer attention. DSA employs a “lightning indexer” to rank tokens by relevance and attend only to the top‑k most important ones, effectively reducing computational complexity while preserving dense‑model quality.
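To make the mechanism concrete, here is a minimal single‑head sketch of the top‑k attention idea in Python. It is not Deepseek's actual implementation: the real lightning indexer is a separate lightweight scoring module, whereas this toy version computes the full score matrix first and only then discards all but the top‑k entries, so it illustrates the selection step without the compute savings.

    # Minimal single-head sketch of top-k sparse attention (illustrative only).
    # Causal masking is omitted for brevity; in DSA the indexer that picks the
    # top-k keys is far cheaper than the full score matrix computed below.
    import torch
    import torch.nn.functional as F

    def topk_sparse_attention(q, k, v, top_k=64):
        # q, k, v: (L, d) for a single head
        scores = q @ k.T / k.shape[-1] ** 0.5       # (L, L) relevance scores
        topk_scores, topk_idx = scores.topk(top_k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)    # softmax over the subset only
        gathered_v = v[topk_idx]                    # (L, top_k, d) selected values
        return (weights.unsqueeze(-1) * gathered_v).sum(dim=1)

    L, d = 1024, 64
    q, k, v = (torch.randn(L, d) for _ in range(3))
    out = topk_sparse_attention(q, k, v)            # (L, d)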

Context Length and Efficiency

  • Maximum context: 128,000 tokens
  • Compute reduction: attention cost grows roughly linearly with sequence length instead of quadratically, keeping inference affordable even on modest hardware or cloud instances (a rough comparison follows this list).
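A back‑of‑the‑envelope comparison makes the savings tangible. Note that the top‑k budget of 2,048 used below is purely illustrative, not a published Deepseek figure:

    # Rough attention-cost comparison at full context (illustrative numbers).
    L = 128_000        # context length in tokens
    k = 2_048          # hypothetical top-k budget per query, not an official figure

    dense  = L * L     # dense attention scores every (query, key) pair
    sparse = L * k     # top-k attention scores only k keys per query

    print(f"dense:  {dense:.2e} score computations")    # 1.64e+10
    print(f"sparse: {sparse:.2e} score computations")   # 2.62e+08
    print(f"reduction: {dense / sparse:.1f}x")          # 62.5x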

The “Speciale” Variant

Deepseek released two checkpoints:

  1. General V3.2 – the standard, non‑reasoning model.
  2. Speciale – a dedicated reasoning model trained with relaxed length penalties, allowing it to generate longer, more coherent reasoning chains without inference‑time tweaks (a schematic of the idea follows this list).
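The exact training objective is not public, so the following is only a schematic of what "relaxing a length penalty" means: a reward that subtracts a per‑token cost, with the penalty coefficient dialed down for the reasoning checkpoint. All numbers here are made up for illustration.

    # Schematic reward with a length penalty (illustrative only; the values
    # and functional form are assumptions, not Deepseek's actual objective).
    def reward(correct: bool, num_tokens: int, lam: float) -> float:
        base = 1.0 if correct else 0.0
        return base - lam * num_tokens   # lam > 0 discourages long outputs

    print(reward(True, 8_000, lam=1e-4))  # 0.2  -> long chains are penalized
    print(reward(True, 8_000, lam=0.0))   # 1.0  -> length no longer matters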

Both checkpoints are publicly available on Hugging Face and have been integrated into routing services such as OpenRouter and Kilo Code.
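Since OpenRouter exposes an OpenAI‑compatible endpoint, querying the model takes only a few lines. The model slug below is an assumption; check OpenRouter's catalog for the exact identifier:

    # Query Deepseek V3.2 through OpenRouter's OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",
    )

    response = client.chat.completions.create(
        model="deepseek/deepseek-v3.2",  # hypothetical slug; verify on openrouter.ai
        messages=[{"role": "user", "content": "Explain sparse attention in two sentences."}],
    )
    print(response.choices[0].message.content)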

Mistral Large 3 – Features and Benchmarks

Model Portfolio

Mistral’s latest suite includes:

  • Mistral Large 3 – a mixture‑of‑experts (MoE) model with 675 billion total parameters, of which roughly 41 billion are active per token.
  • Smaller variants at 14B, 8B, and 3B parameters.

The MoE approach mirrors Deepseek’s architecture, trading a large total parameter count for a much smaller per‑token compute budget.
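For readers unfamiliar with MoE routing, here is a minimal sketch of the general top‑k expert‑gating pattern. It is a generic illustration, not Mistral's implementation, and the layer sizes are arbitrary:

    # Generic top-k mixture-of-experts layer (illustrative, not Mistral's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        def __init__(self, d_model=512, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)   # scores experts per token
            self.experts = nn.ModuleList(
                nn.Linear(d_model, d_model) for _ in range(n_experts)
            )
            self.top_k = top_k

        def forward(self, x):                             # x: (tokens, d_model)
            weights, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            # Only the selected experts run for each token, so most parameters
            # stay idle; this is how a huge MoE keeps its active count small.
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
            return out

    y = MoELayer()(torch.randn(4, 512))                   # (4, 512)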

Reasoning Capability

Mistral Large 3 is marketed as a non‑reasoning model; it excels at code generation and tool‑calling but does not specialize in chain‑of‑thought reasoning. This distinction is important when selecting a model for specific downstream tasks.
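To see what tool calling looks like in practice, here is a sketch of an OpenAI‑style tool‑calling request. The tool definition and model slug are illustrative assumptions; adapt them to whichever provider is actually serving Mistral Large 3:

    # Sketch of an OpenAI-style tool-calling request (tool schema and model
    # slug are assumptions for illustration).
    from openai import OpenAI

    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",               # hypothetical tool
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="mistralai/mistral-large-3",       # hypothetical slug
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    # A tool-capable model should answer with a structured call, not prose.
    print(response.choices[0].message.tool_calls)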

Comparative Benchmark Results

The author evaluated both models on a custom suite covering geometry generation, SVG creation, 3D rendering, game‑style art, and programming tasks. Below is a summary of the observed performance:

Deepseek V3.2 (General) – Key Findings

  • Floor‑plan generation: Produced incoherent text, failing to deliver a 3‑D layout.
  • SVG panda: Better than Mistral but still lagging behind top‑tier models.
  • Pokéball in Three.js: Mostly correct; minor missing UI element (button).
  • Chessboard with autoplay: Accurate rendering and logical move sequence.
  • Kandinsky‑style Minecraft clone: Unusable output.
  • Majestic butterfly illustration: Low visual fidelity, reminiscent of early‑2000s graphics.
  • Rust CLI tool code: Non‑functional.
  • Blender script: Failed to execute.
  • Math riddles: Mixed; simple riddles solved, arithmetic problems often incorrect.

Mistral Large 3 – Key Findings

  • Floor‑plan (3‑D): Poorly generated, not meeting spatial requirements.
  • SVG panda: Inconsistent body proportions.
  • Pokéball in Three.js: Objects misplaced, dimensions inaccurate.
  • Chessboard autoplay: Non‑functional.
  • Minecraft clone: Lacked coherence.
  • Butterfly illustration: Acceptable but not impressive.
  • Rust CLI tool: Non‑working code.
  • Blender script: Failed to produce expected results.
  • Math problems: Generally unsolved.

Leaderboard Placement

  • Deepseek V3.2 (General): Ranked 11th on the public LLM leaderboard, surpassing models such as GPT‑5.1 Codex and GLM.
  • Deepseek Speciale (Reasoning): Positioned lower due to instability in API responses and buggy code generation.
  • Mistral Large 3: Holds 27th place, respectable but trailing behind leading open‑source contenders.

The results suggest that while both models are competitive, they still trail the most polished open‑source alternatives such as GLM, MiniMax, and Kimi.

Availability and Integration

  • Model weights: Hosted on Hugging Face for both the general and Speciale checkpoints.
  • Routing services: Integrated with OpenRouter and Kylo Code, facilitating easy API access.
  • Tool‑calling: Both models demonstrate solid performance in tool‑calling scenarios, making them suitable for workflow automation.

Developers interested in experimenting with these models can pull the weights directly from Hugging Face and deploy them using any standard transformer library (e.g., 🤗 Transformers, vLLM).
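As a starting point, a vLLM deployment can be as short as the sketch below. The repository name is an assumption (check the exact Hugging Face id), and a model of this size realistically needs a multi‑GPU node:

    # Minimal vLLM inference sketch (repo id is an assumption; verify on
    # Hugging Face, and expect to need several GPUs for a model this large).
    from vllm import LLM, SamplingParams

    llm = LLM(model="deepseek-ai/DeepSeek-V3.2", tensor_parallel_size=8)
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
    print(outputs[0].outputs[0].text)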

Conclusion

The release of Deepseek V3.2 Speciale and Mistral Large 3 marks a noteworthy comeback for veteran open‑source LLM developers. Deepseek’s sparse attention architecture delivers impressive efficiency at very long context windows, while the Speciale checkpoint attempts to push reasoning capabilities forward. Mistral’s MoE‑based Large 3 offers strong code‑generation performance but falls short on reasoning tasks.

Benchmark comparisons reveal that both models are competitive but not yet dominant in the open‑source landscape. They occupy respectable positions on public leaderboards and provide valuable alternatives for developers seeking permissively‑licensed models with decent tool‑calling abilities.

As the open‑source community continues to iterate, these releases underscore the importance of architectural innovation (sparse attention, mixture‑of‑experts) and transparent licensing in shaping the next generation of accessible AI models.
