Why we're not upgrading to Gemini 3.1

A few months ago I switched GrandpaCAD's default model to Gemini 3. That decision was driven by data: Gemini 3 had the highest weighted score, the best prompt adherence, and the lowest cost across 84 generations. It was the obvious pick.

So when Google dropped Gemini 3.1, I did what I always do: ran my eval suite against it.

The results were... underwhelming.

The numbers

Here's a head-to-head comparison, per generation:

Metric        Gemini 3    Gemini 3.1
Avg time      1m 24s      3m 28s
Avg cost      $0.14       $0.37
Error rate    0.29        0.29

Gemini 3.1 is 2.5x slower and 2.6x more expensive. The error rate is identical down to the second decimal. Not "roughly the same." Exactly the same.
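If you want to sanity-check those ratios yourself, the arithmetic is straightforward (numbers taken straight from the table above):

```python
# Verify the slowdown and cost multiples from the eval table.
gemini_3_time_s = 1 * 60 + 24    # 1m 24s -> 84 seconds
gemini_31_time_s = 3 * 60 + 28   # 3m 28s -> 208 seconds
gemini_3_cost = 0.14             # dollars per generation
gemini_31_cost = 0.37

print(f"slowdown: {gemini_31_time_s / gemini_3_time_s:.1f}x")  # 2.5x
print(f"cost:     {gemini_31_cost / gemini_3_cost:.1f}x")      # 2.6x
```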

Slower doesn't mean better

Are the 3D models 3.1 produces a tiny bit better? Yeah, maybe. I'd give it that. But the quality difference is marginal, and when you factor in what you're paying for that marginal gain (both in time and money), the math falls apart.

Think about it this way: in the time it takes Gemini 3.1 to finish one generation, you could send two prompts to Gemini 3. Two shots at getting what you want, with the same error rate per attempt. If the first one isn't quite right, you iterate. Two Gemini 3 attempts will almost always get you a better result than a single Gemini 3.1 attempt, and it'll cost you less ($0.28 vs $0.37).
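Here's a back-of-envelope version of that argument. This treats the measured error rate as an independent per-attempt failure probability, which is an assumption (real failures can correlate for the same prompt), but it illustrates why two cheap shots beat one expensive one:

```python
# Assumption: each generation fails independently with the measured
# error rate (0.29 for both models in my evals).
ERROR_RATE = 0.29

# One Gemini 3.1 attempt: $0.37, ~3m 28s
p_one_shot_31 = 1 - ERROR_RATE        # 0.71

# Two Gemini 3 attempts in roughly the same wall-clock time: $0.28 total
p_two_shots_3 = 1 - ERROR_RATE ** 2   # ~0.92

print(f"Gemini 3.1, one shot:  {p_one_shot_31:.0%} success at $0.37")
print(f"Gemini 3,   two shots: {p_two_shots_3:.0%} success at $0.28")
```

Under that (simplistic) model, the two-shot workflow succeeds about 92% of the time versus 71%, while costing less.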

I'd rather give users the ability to iterate quickly than make them wait 3.5 minutes for a marginally shinier result.

The cost problem

This is the other half of the equation. GrandpaCAD already operates at a loss. Every generation costs me more than what users pay. That's a deliberate bet: I'm banking on inference costs continuing to drop, which historically they do. But that bet only works if I'm not hemorrhaging money in the meantime.

Jumping to Gemini 3.1 would nearly triple my per-generation cost. At $0.37 per generation, the business model stops making any sense at all. Even with optimistic projections about future price cuts, the gap is too wide.

I can absorb $0.14 per generation while the market catches up. $0.37 is a different conversation entirely.

When would I switch?

If Google brings the cost and latency of 3.1 in line with what 3.0 offers today (or close to it), I'll re-run the evals and reconsider. The quality improvement, however small, would be worth it at parity pricing. But right now the tradeoff isn't there.

I'm also keeping an eye on Gemini 4 and whatever OpenAI ships next. The eval harness doesn't care about brand loyalty. Whichever model produces the best 3D models at a reasonable cost and speed wins. That's the whole point of having an eval system.

Try it yourself

Gemini 3 is live on GrandpaCAD right now. If you want to see what it can do, go make something.
