We upgraded to Gemini 3.1 (and why we changed our mind)

Two weeks ago I wrote a whole post about why we're not upgrading to Gemini 3.1. The numbers were clear: 2.5x slower, 2.6x more expensive, same error rate. Easy decision, right?

Well, I changed my mind. GrandpaCAD now runs on Gemini 3.1.

The thinking budget trick

Here's what I missed the first time around. My original comparison ran Gemini 3.1 on the high thinking budget, because that's the setting I was using with Gemini 3. Apples to apples, I figured.

Turns out that was the wrong comparison. Gemini 3.1 on medium thinking budget actually outperforms Gemini 3 on high thinking budget. Let that sink in for a second: less thinking, better results.

This changes the entire equation from my previous post:

Metric	Gemini 3 (high thinking budget)	Gemini 3.1 (medium thinking budget)
Avg cost per generation	$0.18	$0.21 (~17% more)
Error rate	0.29	~0.15 (roughly halved)
Geometry quality	Parts often disconnected	Much better connectivity

Instead of 2.6x more expensive, we're looking at roughly 17% more per generation. The error rate dropped by half. And the geometry is noticeably better.
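For anyone who wants to check the arithmetic, here is a small sketch that recomputes the percentages from the table. The last function goes one step further than the post does and is an assumption on my part: it treats the error rate as the share of generations that fail and assumes every failure is simply retried at the same cost. If the error rate measures something else, ignore that part.

```python
# Figures taken from the comparison table above.
GEMINI_3_HIGH = {"avg_cost_usd": 0.18, "error_rate": 0.29}
GEMINI_31_MEDIUM = {"avg_cost_usd": 0.21, "error_rate": 0.15}  # error rate is approximate


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


def cost_per_clean_generation(avg_cost_usd: float, error_rate: float) -> float:
    """Expected spend per error-free result.

    ASSUMPTION (not from the post): error_rate is the share of generations
    that fail, and each failure is retried independently at the same cost.
    """
    return avg_cost_usd / (1.0 - error_rate)


cost_change = relative_change(GEMINI_3_HIGH["avg_cost_usd"], GEMINI_31_MEDIUM["avg_cost_usd"])
error_change = relative_change(GEMINI_3_HIGH["error_rate"], GEMINI_31_MEDIUM["error_rate"])

print(f"cost change: {cost_change:+.1%}")        # +16.7%
print(f"error rate change: {error_change:+.1%}")  # -48.3%
for name, metrics in (("Gemini 3 (high)", GEMINI_3_HIGH), ("Gemini 3.1 (medium)", GEMINI_31_MEDIUM)):
    print(f"{name}: ${cost_per_clean_generation(**metrics):.3f} per clean generation")
```

Under that retry assumption, the two setups land within a cent of each other per clean result, which is another way of seeing why the cost concern mostly disappears.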

The cost concern from my previous post? Basically gone.

Better geometry, fewer broken parts

The quality improvement I keep coming back to is how well 3.1 understands how geometry should connect. This is hard to capture in a benchmark score. A model can get the overall shape right (good adherence score) but still produce parts that float in space or clip through each other. Walls that don't meet the base. Arms that hover next to shoulders. A hook that sits inside a backplate instead of protruding from it.

Here's a concrete example. I asked both models to generate a wall-mounted headphone holder.

[Image: Gemini 3 vs Gemini 3.1 headphone wall mount comparison]

Left is Gemini 3. See how the hook doesn't properly connect to the wall plate? The geometry is technically there, but the pieces aren't joined. This kind of thing happened roughly 4 out of 5 times with Gemini 3 on models that required parts to connect at specific points.

Right is Gemini 3.1. The hook extends from the backplate as one continuous piece. Clean connection, printable without supports in that area. With 3.1, this kind of geometry problem dropped to about 1 in 5 attempts.
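To make "parts that float in space" a bit more concrete, here is a rough sketch of one way such a check could look. It is not how GrandpaCAD evaluates models. It assumes each part can be approximated by an axis-aligned bounding box, and the `Box` type, the part names, and the dimensions are all made up for illustration. Parts whose boxes touch or overlap are grouped together, and more than one group means something is floating.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box standing in for one generated part (hypothetical)."""
    name: str
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def touches(a: Box, b: Box, tol: float = 1e-6) -> bool:
    """True if the boxes overlap or meet within tol on every axis."""
    return all(a.lo[i] <= b.hi[i] + tol and b.lo[i] <= a.hi[i] + tol for i in range(3))


def connected_groups(parts: list[Box]) -> list[set[str]]:
    """Group parts that touch, directly or through other parts (union-find)."""
    parent = {p.name: p.name for p in parts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(parts, 2):
        if touches(a, b):
            parent[find(a.name)] = find(b.name)

    groups: dict[str, set[str]] = {}
    for p in parts:
        groups.setdefault(find(p.name), set()).add(p.name)
    return list(groups.values())


# Made-up dimensions (mm) for a wall-mounted headphone holder.
backplate = Box("backplate", (0, 0, 0), (60, 5, 80))
floating_hook = Box("hook", (20, 7, 40), (40, 60, 50))  # 2 mm gap between plate and hook
joined_hook = Box("hook", (20, 5, 40), (40, 60, 50))    # starts at the plate surface

print(len(connected_groups([backplate, floating_hook])))  # 2: the hook floats
print(len(connected_groups([backplate, joined_hook])))    # 1: one connected model
```

This only catches gaps. A hook buried inside the backplate would still count as connected, and bounding boxes overstate contact for curved parts, so treat it as a first-pass filter rather than a quality score.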

Why lower thinking works better

This is the part I find genuinely interesting. You'd expect that cranking up the thinking budget always produces better output. With Gemini 3, that was mostly true. But with 3.1, the medium budget seems to hit a sweet spot where the model reasons enough to get the geometry right without overthinking itself into errors.

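Gemini 3 on the high thinking budget actually produced more errors than 3.1 on medium. And within 3.1 itself, my earlier high-budget run showed the same error rate as Gemini 3, while medium roughly halved it in hands-on testing. I don't have a great explanation for why, but the pattern was consistent across testing. More compute doesn't always mean better results, and 3.1 seems to use its thinking budget more efficiently.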

Full benchmarks are coming

I want to be transparent: I haven't run the full eval suite on this yet. The improvements above are from hands-on testing, not the rigorous benchmark process I described in "How we test the 3D modelling agent". The full numbers will follow soon.

What I can say from testing so far is that the combination of better geometry, lower error rates, and a modest cost increase makes this an easy upgrade. If the full benchmarks contradict that, I'll write about it.

Try it yourself

Gemini 3.1 is live on GrandpaCAD right now.

Create a 3D model with Gemini 3.1
