Why we're not upgrading to Gemini 3.1

A few months ago I switched GrandpaCAD's default model to Gemini 3. That decision was driven by data: Gemini 3 had the highest weighted score, the best prompt adherence, and the lowest cost across 84 generations. It was the obvious pick.

So when Google dropped Gemini 3.1, I did what I always do: ran my eval suite against it.

The results were... underwhelming.

The numbers

Here's a head-to-head comparison, per generation:

Metric        Gemini 3    Gemini 3.1
Avg time      1m 24s      3m 28s
Avg cost      $0.14       $0.37
Error rate    0.29        0.29

Gemini 3.1 is 2.5x slower and 2.6x more expensive. The error rate is identical down to the second decimal. Not "roughly the same." Exactly the same.
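If you want to sanity-check those ratios yourself, the arithmetic is straightforward (numbers taken straight from the table above):

```python
# Verify the slowdown and cost multiples from the eval table.
gemini_3_time_s = 1 * 60 + 24    # 1m 24s -> 84 seconds
gemini_31_time_s = 3 * 60 + 28   # 3m 28s -> 208 seconds
gemini_3_cost = 0.14             # dollars per generation
gemini_31_cost = 0.37

print(f"slowdown: {gemini_31_time_s / gemini_3_time_s:.1f}x")  # 2.5x
print(f"cost:     {gemini_31_cost / gemini_3_cost:.1f}x")      # 2.6x
```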

Slower doesn't mean better

Are the 3D models 3.1 produces a tiny bit better? Yeah, maybe. I'd give it that. But the quality difference is marginal, and when you factor in what you're paying for that marginal gain (both in time and money), the math falls apart.

Think about it this way: in the time it takes Gemini 3.1 to finish one generation, you could send two prompts to Gemini 3. Two shots at getting what you want, with the same error rate per attempt. If the first one isn't quite right, you iterate. Two Gemini 3 attempts will almost always get you a better result than a single Gemini 3.1 attempt, and it'll cost you less ($0.28 vs $0.37).
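Here's a back-of-envelope version of that argument. This treats the measured error rate as an independent per-attempt failure probability, which is an assumption (real failures can correlate for the same prompt), but it illustrates why two cheap shots beat one expensive one:

```python
# Assumption: each generation fails independently with the measured
# error rate (0.29 for both models in my evals).
ERROR_RATE = 0.29

# One Gemini 3.1 attempt: $0.37, ~3m 28s
p_one_shot_31 = 1 - ERROR_RATE        # 0.71

# Two Gemini 3 attempts in roughly the same wall-clock time: $0.28 total
p_two_shots_3 = 1 - ERROR_RATE ** 2   # ~0.92

print(f"Gemini 3.1, one shot:  {p_one_shot_31:.0%} success at $0.37")
print(f"Gemini 3,   two shots: {p_two_shots_3:.0%} success at $0.28")
```

Under that (simplistic) model, the two-shot workflow succeeds about 92% of the time versus 71%, while costing less.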

I'd rather give users the ability to iterate quickly than make them wait 3.5 minutes for a marginally shinier result.

The cost problem

This is the other half of the equation. GrandpaCAD already operates at a loss. Every generation costs me more than what users pay. That's a deliberate bet: I'm banking on inference costs continuing to drop, which historically they do. But that bet only works if I'm not hemorrhaging money in the meantime.

Jumping to Gemini 3.1 would nearly triple my per-generation cost. At $0.37 per generation, the business model stops making any sense at all. Even with optimistic projections about future price cuts, the gap is too wide.

I can absorb $0.14 per generation while the market catches up. $0.37 is a different conversation entirely.

When would I switch?

If Google brings the cost and latency of 3.1 in line with what 3.0 offers today (or close to it), I'll re-run the evals and reconsider. The quality improvement, however small, would be worth it at parity pricing. But right now the tradeoff isn't there.

I'm also keeping an eye on Gemini 4 and whatever OpenAI ships next. The eval harness doesn't care about brand loyalty. Whichever model produces the best 3D models at a reasonable cost and speed wins. That's the whole point of having an eval system.

Try it yourself

Gemini 3 is live on GrandpaCAD right now. If you want to see what it can do, go make something.
