We upgraded to Gemini 3.1 (and why we changed our mind)

Two weeks ago I wrote a whole post about why we're not upgrading to Gemini 3.1. The numbers were clear: 2.5x slower, 2.6x more expensive, same error rate. Easy decision, right?

Well, I changed my mind. GrandpaCAD now runs on Gemini 3.1.

The thinking budget trick

Here's what I missed the first time around. My original comparison ran Gemini 3.1 on high thinking budget, because that's what I was using with Gemini 3. Apples to apples, I figured.

Turns out that was the wrong comparison. Gemini 3.1 on medium thinking budget actually outperforms Gemini 3 on high thinking budget. Let that sink in for a second: less thinking, better results.

This changes the entire equation from my previous post:

| Metric | Gemini 3 (high) | Gemini 3.1 (medium) |
| --- | --- | --- |
| Avg cost | $0.18 | $0.21 (~17% more) |
| Error rate | 0.29 | ~0.15 (halved) |
| Geometry quality | Parts often disconnected | Much better connectivity |

Instead of 2.6x more expensive, we're looking at roughly 17% more per generation. The error rate dropped by half. And the geometry is noticeably better.
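For anyone who wants to check the arithmetic, here is a minimal sketch that derives the percentages from the table values. The helper name `relative_change` is mine, purely for illustration; the only inputs are the cost and error-rate figures quoted above.

```python
def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (0.17 means 17% more)."""
    return (new - old) / old


# Figures from the comparison table: Gemini 3 (high) vs Gemini 3.1 (medium).
cost_change = relative_change(0.18, 0.21)
error_change = relative_change(0.29, 0.15)

print(f"cost per generation: {cost_change:+.0%}")  # prints "+17%"
print(f"error rate: {error_change:+.0%}")          # prints "-48%"
```

The error rate comes out at about 48% lower, which is where "halved" comes from.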

The cost concern from my previous post? Basically gone.

Better geometry, fewer broken parts

The quality improvement I keep coming back to is how well 3.1 understands how geometry should connect. This is hard to capture in a benchmark score. A model can get the overall shape right (good adherence score) but still produce parts that float in space or clip through each other. Walls that don't meet the base. Arms that hover next to shoulders. A hook that sits inside a backplate instead of protruding from it.
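To make "parts that float in space" concrete, here is a rough sketch of how a connectivity check could look. This is not how GrandpaCAD evaluates models; it assumes every part can be approximated by an axis-aligned bounding box, and the names `Box`, `touches`, and `connected_groups` are hypothetical. Parts whose boxes touch or overlap are merged into one group, so a result with more than one group means something is floating.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A part approximated by an axis-aligned bounding box (assumption)."""
    name: str
    lo: tuple[float, float, float]  # minimum x, y, z corner
    hi: tuple[float, float, float]  # maximum x, y, z corner


def touches(a: Box, b: Box, tol: float = 1e-6) -> bool:
    """True if the two boxes overlap or share a face/edge within tol."""
    return all(
        a.lo[i] <= b.hi[i] + tol and b.lo[i] <= a.hi[i] + tol
        for i in range(3)
    )


def connected_groups(parts: list[Box]) -> list[set[str]]:
    """Group part names into connected components using union-find."""
    parent = {p.name: p.name for p in parts}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(parts, 2):
        if touches(a, b):
            parent[find(a.name)] = find(b.name)

    groups: dict[str, set[str]] = {}
    for p in parts:
        groups.setdefault(find(p.name), set()).add(p.name)
    return list(groups.values())


# Made-up dimensions in millimetres, for illustration only.
plate = Box("backplate", (0, 0, 0), (60, 4, 80))
hook_joined = Box("hook", (20, 4, 40), (40, 50, 50))    # starts at the plate surface
hook_floating = Box("hook", (20, 6, 40), (40, 50, 50))  # 2 mm gap to the plate

print(len(connected_groups([plate, hook_joined])))    # 1 group: connected
print(len(connected_groups([plate, hook_floating])))  # 2 groups: hook floats
```

A check like this only catches floating parts. A hook buried inside the backplate still counts as "connected" here, so catching clipping would need an additional test on how much two parts overlap.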

Here's a concrete example. I asked both models to generate a wall-mounted headphone holder.

[Image: Gemini 3 vs Gemini 3.1 headphone wall mount comparison]

Left is Gemini 3. See how the hook doesn't properly connect to the wall plate? The geometry is technically there, but the pieces aren't joined. With Gemini 3, this happened in roughly 4 out of 5 attempts on 3D models that required parts to connect at specific points.

Right is Gemini 3.1. The hook extends from the backplate as one continuous piece. Clean connection, printable without supports in that area. With 3.1, this kind of geometry problem dropped to about 1 in 5 attempts.

Why lower thinking works better

This is the part I find genuinely interesting. You'd expect that cranking up the thinking budget always produces better output. With Gemini 3, that was mostly true. But with 3.1, the medium budget seems to hit a sweet spot where the model reasons enough to get the geometry right without overthinking itself into errors.

The numbers back this up. In my earlier comparison, 3.1 on high had the same error rate as Gemini 3 on high. On medium, 3.1 roughly halves it. I don't have a great explanation for why, but the pattern was consistent across testing. More compute doesn't always mean better results, and 3.1 seems to use its thinking budget more efficiently.

Full benchmarks are coming

I want to be transparent: I haven't run the full eval suite on this yet. The improvements above are from hands-on testing, not the rigorous benchmark process I described in "how we test the 3D modelling agent". The full numbers will follow soon.

What I can say from testing so far is that the combination of better geometry, lower error rates, and a modest cost increase makes this an easy upgrade. If the full benchmarks contradict that, I'll write about it.

Try it yourself

Gemini 3.1 is live on GrandpaCAD right now.
