Oct 20, 2025

Google Gemini 3 Checkpoint Review: Orion Mist and Lithium Flow Show Promising Performance

Introduction

Google has not announced an official launch date for Gemini 3, but recent activity on the LM Arena platform suggests that two new checkpoints—Orion Mist and Lithium Flow—are already available for public testing. While neither model has been formally confirmed as a Gemini 3 checkpoint, their performance and leaked details align closely with expectations for the next generation of Google’s large language models (LLMs). This article examines the characteristics of these checkpoints, outlines a systematic test suite, and compares the results against earlier Gemini checkpoints such as ECPT.

Overview of the New Checkpoints

Lithium Flow – The base model without any grounding or web‑search extensions.
Orion Mist – Identical to Lithium Flow but with the grounding/search tool enabled, allowing it to retrieve recent information.

Both models appear to be variations of the same underlying architecture; the primary distinction is the optional tool that provides up‑to‑date knowledge. Community feedback on Twitter indicates that these checkpoints may be slightly more constrained than the earliest Gemini checkpoints, but they still represent a solid step forward from the ECPT checkpoint.

Testing Methodology

The author evaluated the models using a fixed set of 11 questions and prompts covering visual generation, 3D scene creation, scripting, and general reasoning. Tests were conducted in LM Arena’s “battle” mode, where a model’s responses can be compared directly with those of previous checkpoints. The same prompt set was applied to both Orion Mist and Lithium Flow, though only the Lithium Flow results are presented here because the two models’ outputs were essentially identical.
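The fixed-prompt approach described above can be sketched as a small harness. This is a minimal illustration, not the author's actual tooling: the task labels follow the result headings in this article, while the prompt wording, the `run_suite` and `identical_tasks` helpers, and the stand-in model functions are all hypothetical. A real run would replace the stand-ins with manual LM Arena sessions or whatever client is available.

```python
from typing import Callable, Dict, List

# Task labels mirror the article's result headings. The prompt wording is a
# placeholder (assumption), not the author's actual prompts.
PROMPT_SUITE: Dict[str, str] = {
    "floor_plan": "Draw a floor plan.",
    "svg_panda": "Create an SVG of a panda eating a burger.",
    "pokeball_render": "Render a Pokeball.",
    "chessboard": "Draw a chessboard.",
    "minecraft_scene": "Build a 3D Minecraft-style scene.",
    "butterfly_garden": "Draw a majestic butterfly in a garden.",
    "blender_pokeball": "Write a Blender script that builds a Pokeball.",
}

# A "model" here is any function that maps a prompt string to a response string.
ModelFn = Callable[[str], str]


def run_suite(
    models: Dict[str, ModelFn],
    prompts: Dict[str, str] = PROMPT_SUITE,
) -> Dict[str, Dict[str, str]]:
    """Send every prompt to every model; return outputs keyed by task, then model."""
    results: Dict[str, Dict[str, str]] = {}
    for task, prompt in prompts.items():
        results[task] = {name: fn(prompt) for name, fn in models.items()}
    return results


def identical_tasks(
    results: Dict[str, Dict[str, str]], model_a: str, model_b: str
) -> List[str]:
    """List the tasks on which two models returned exactly the same output."""
    return [
        task for task, outputs in results.items()
        if outputs[model_a] == outputs[model_b]
    ]


# Hypothetical stand-ins so the sketch runs offline; they only echo the prompt.
def fake_lithium_flow(prompt: str) -> str:
    return f"response to: {prompt}"


def fake_orion_mist(prompt: str) -> str:
    return f"response to: {prompt}"


results = run_suite(
    {"lithium_flow": fake_lithium_flow, "orion_mist": fake_orion_mist}
)
same = identical_tasks(results, "lithium_flow", "orion_mist")
print(f"{len(same)} of {len(results)} tasks produced identical output")
```

Keeping the prompt set in one fixed dictionary is what makes results comparable across checkpoints: every model sees exactly the same input, and any difference in output can be attributed to the model rather than to the prompt.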

Results

1. Floor‑Plan Generation

The generated floor plan was functional but lacked the polish and spatial logic seen in earlier checkpoints. It contained no outright errors, but its quality was closer to that of the ECPT checkpoint than to the stronger prior versions.

2. SVG Panda Eating a Burger

Anatomy: Accurate and well‑proportioned.
Color Palette: Correctly applied and visually appealing.
Overall Quality: On par with the best earlier checkpoints and notably better than ECPT.

3. Pokéball Render

The Pokéball image displayed vibrant colors and satisfactory lighting. Compared to ECPT, the visual fidelity was higher, though the model did not automatically add a background scene as some earlier checkpoints did.

4. Chessboard Illustration

The chessboard rendering exhibited clean lines and realistic piece placement. Performance exceeded ECPT, confirming improved handling of structured visual content.

5. 3D Minecraft Scene

The generated Minecraft‑style world matched the quality of the 2HT checkpoint, offering solid geometry and texture detail. Lighting fell short of the X28 checkpoint, yet still represented an upgrade over ECPT.

6. Majestic Butterfly in a Garden

The butterfly illustration was comparable to ECPT outputs—well‑rendered but lacking the richer environmental detail found in the X58 checkpoint.

7. Blender Script for a Pokéball

The script correctly set up lighting and materials, producing a functional 3D model that rendered without errors. This demonstrates reliable code generation capabilities.

8. General Knowledge & Math Questions

Both categories were answered accurately, allowing the model to outscore ECPT while still trailing the top‑tier Gemini checkpoints.

Comparative Performance

Checkpoint	Visual Quality	Code Generation	Reasoning & Math	Tool‑Calling
Lithium Flow / Orion Mist	Moderate‑High (better than ECPT)	Good (Blender script works)	Strong (passes general & math)	Not evaluated (grounding enabled only in Orion Mist)
ECPT	Lower	Adequate	Adequate	—
Earlier Gemini checkpoints (e.g., X28, X58)	Highest	Excellent	Excellent	—

Overall, Lithium Flow and Orion Mist sit comfortably between the older ECPT checkpoint and the premier Gemini checkpoints. They appear to be more heavily quantized versions intended for broader deployment via LM Arena’s endpoints, likely operating with slightly reduced “thinking budgets” to balance latency and cost.
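To make the quantization trade-off concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It illustrates the general technique only: whether these checkpoints are quantized at all is speculation in this article, and nothing here reflects how Google actually prepares its models. The sample weights and the function names are made up for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest absolute difference between original and restored values."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Made-up sample weights, purely for illustration.
weights = [0.5, -1.27, 0.003, 0.9999, -0.25]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
error = max_abs_error(weights, restored)

print("integer codes:", quantized)
print("scale:", scale)
print("max rounding error:", error)
```

Each weight is stored in one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half the scale per value. Small errors like this across a whole model are one plausible reason a quantized checkpoint could trail a full-precision one slightly.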

Implications for Deployment

Quantization Trade‑off: The modest performance dip suggests Google is preparing these checkpoints for real‑world use, where lower‑precision models reduce computational overhead while maintaining acceptable quality.
Tool Calling: Orion Mist’s grounding capability could prove valuable for applications requiring up‑to‑date information, though its raw reasoning performance appears to be essentially the same as Lithium Flow’s.
User Transparency: Clear labeling of which checkpoint is live would help developers set realistic expectations and benchmark their own implementations.

Conclusion

The emergence of Orion Mist and Lithium Flow on LM Arena offers a promising glimpse into the next phase of Google’s Gemini roadmap. While they do not yet match the visual and reasoning prowess of the earliest Gemini checkpoints, they represent a noticeable improvement over ECPT and demonstrate solid capabilities across image generation, 3D scripting, and logical reasoning.

If these models become the default endpoints for Google’s AI services, developers can expect a balanced blend of performance and efficiency. Continued monitoring of tool‑calling behavior and further benchmarking against upcoming releases—particularly the rumored “Flash” model—will be essential for anyone building on Google’s LLM ecosystem.