The Benchmark Nobody Was Running

Every benchmark of image-to-3D AI grades on textures. None grade on whether the result actually prints. So I built one.

A few weeks ago I uploaded a photo of a ceramic mug to four different image-to-3D models. Three minutes later I had four shiny meshes. Pretty meshes. Meshes you'd happily put in a portfolio.

I tried to slice one of them.

It failed. So did the second. So did the third. The fourth sliced – and would have torn off the print bed in the first layer because the wall thickness was 0.18 mm, less than half the width of a standard 0.4 mm nozzle.

That was the moment.

The benchmark everyone runs

Search "best image-to-3D AI" and you'll find dozens of comparisons. They grade on the things that photograph well: texture quality, view consistency, polygon count, style fidelity. The output is a turntable render and a star rating.

The output is never an STL someone tried to print.

I get why. For most users of these models, the output is a video-game asset, a digital twin, a marketing render. Visual fidelity is the only dimension that matters. The mesh lives behind a screen forever.

But there's another user. The one whose job is to take a sketch and turn it into a thing that exists in space. That user doesn't care about textures. That user wants to know: can I print this?

Nobody is running that benchmark.

The Concept-to-CAD bridge

The Forkable Factory thesis is that physical products should be developed the way software is. Cheap iteration. Forkable artifacts. Agents in the loop. Most of the chain is plausible already – you can quote a part on Xometry programmatically, you can store BOMs as YAML, you can ship updates with git push.

The hard cell on the map is design.

Real CAD takes a CAD operator. Hiring one is the first capability gap a small team hits when they try to make a physical thing. Most never get past it.

If image-to-3D AI can collapse that step – even partway – it changes the unit economics of physical product creation. A founder with a sketch can get to a mesh in 40 seconds. The CAD operator becomes the editor, not the author.

That's the bigger version of "can I print this?" It's "can a thirty-second photograph become the start of a real product?"

The tension

Here's what surprised me when I scored four models on six dimensions of printability across half a dozen real-product test objects: the model that wins on visual fidelity loses on printability, and vice versa.

Hunyuan3D-2 produced the most beautiful textures of any model I tested. Its meshes look like product photos.

Its meshes also have the highest overhang percentage of the models I tested: the largest share of surface that would have to print over thin air. They are gravity-defying renders, not gravity-respecting parts.

TripoSR is the opposite. Its meshes look chunky and approximate. They also have the most uniform wall thickness and pass the manifold check more often than the others.

TRELLIS was the closest thing to balanced. Decent fidelity, decent geometry, sometimes wins, sometimes loses. The most "production-ready" of the open-source models, on a rubric most production users have never thought to apply.

The frontier of image-to-3D quality and the frontier of printability are not the same frontier. Picking a model on one is picking blindfolded on the other.
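For anyone who wants to see what two of those geometry checks involve, here is a minimal Python sketch. It is not the tool's actual code. It assumes a mesh given as plain lists of vertices and triangle indices, treats Z as up, and uses the common 45-degree rule of thumb for overhangs. The function names, the threshold, and the test cube are all illustrative assumptions.

```python
import math
from collections import Counter


def is_edge_manifold(faces):
    """Return True if every edge is shared by exactly two triangles."""
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[(min(u, v), max(u, v))] += 1
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())


def _normal_and_area(p0, p1, p2):
    """Unit normal (right-hand rule) and area of one triangle."""
    ux, uy, uz = p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]
    vx, vy, vz = p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        return (0.0, 0.0, 0.0), 0.0  # degenerate triangle contributes nothing
    return (nx / length, ny / length, nz / length), 0.5 * length


def overhang_fraction(vertices, faces, max_angle_deg=45.0, bed_z=None, eps=1e-9):
    """Share of surface area that overhangs more steeply than max_angle_deg.

    The angle is measured from vertical: 0 is a vertical wall, 90 a flat ceiling.
    Faces resting on the bed are not counted as overhangs.
    """
    if bed_z is None:
        bed_z = min(v[2] for v in vertices)  # assume the lowest point sits on the bed
    limit = math.sin(math.radians(max_angle_deg))
    total = overhang = 0.0
    for a, b, c in faces:
        p0, p1, p2 = vertices[a], vertices[b], vertices[c]
        normal, area = _normal_and_area(p0, p1, p2)
        total += area
        on_bed = all(p[2] <= bed_z + eps for p in (p0, p1, p2))
        if not on_bed and -normal[2] > limit:  # faces downward past the threshold
            overhang += area
    return overhang / total if total else 0.0


# A unit cube with outward-facing triangles, used only as a toy input.
CUBE_VERTICES = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
]
CUBE_FACES = [
    (0, 2, 1), (0, 3, 2),  # bottom
    (4, 5, 6), (4, 6, 7),  # top
    (0, 1, 5), (0, 5, 4),  # front
    (3, 7, 6), (3, 6, 2),  # back
    (0, 4, 7), (0, 7, 3),  # left
    (1, 2, 6), (1, 6, 5),  # right
]

print(is_edge_manifold(CUBE_FACES))                               # True: closed cube
print(is_edge_manifold(CUBE_FACES[:-1]))                          # False: one triangle missing
print(overhang_fraction(CUBE_VERTICES, CUBE_FACES))               # 0.0: bottom rests on the bed
print(overhang_fraction(CUBE_VERTICES, CUBE_FACES, bed_z=-1.0))   # about 0.167: floating cube
```

This only covers edge manifoldness and a per-face overhang estimate. A real slicer-grade check would also look at self-intersections, flipped normals, and whether an overhang is short enough to bridge.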

What I built

A small public tool that runs the test for you. Drop a photo. Get an STL with a printability score. Toggle Compare mode if you want to see all three open-source models go head-to-head on the same input.

The scoring rubric is six dimensions, 100 points: manifold check, vertex count sanity, wall thickness, overhang risk, file-size efficiency, visual fidelity. The methodology is public and versioned. The output is reproducible from the input mesh.
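To make the shape of a rubric like that concrete, here is a hedged Python sketch of a weighted 100-point score. Only the six dimension names and the 100-point total come from the rubric above. The weights, the 0-to-1 sub-score convention, the function name, and the example numbers are assumptions for illustration, not the published methodology and not measured results.

```python
# Hypothetical weights: only the six names and the 100-point total are from the rubric.
WEIGHTS = {
    "manifold": 25,
    "vertex_count": 10,
    "wall_thickness": 25,
    "overhang": 20,
    "file_size": 10,
    "visual_fidelity": 10,
}
assert sum(WEIGHTS.values()) == 100


def printability_score(subscores):
    """Combine per-dimension sub-scores (each 0.0 to 1.0) into a 0-100 total."""
    missing = set(WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        clamped = min(1.0, max(0.0, subscores[name]))  # keep bad inputs in range
        total += weight * clamped
    return round(total, 1)


# Made-up sub-scores for a mesh that looks great but is not watertight.
pretty_but_fragile = {
    "manifold": 0.0,
    "vertex_count": 1.0,
    "wall_thickness": 0.2,
    "overhang": 0.3,
    "file_size": 0.8,
    "visual_fidelity": 1.0,
}
print(printability_score(pretty_but_fragile))  # 39.0 with these assumed weights
```

Because the score is a pure function of the sub-scores, the same mesh always produces the same number, which is the property that makes a rubric like this reproducible.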

It's at romanmartins.com/forkable-factory/photo-to-print.

I don't think the right move is to crown a winner. The right move is to make the scoring visible, so the next person who needs to choose a model has the right axis to choose on.

What I'm watching for

The interesting question isn't which model wins today. It's how fast the printability frontier moves once people are watching it.

Two of the models I tested didn't exist eighteen months ago. Three of the dimensions in my rubric will be obsolete the day a model decides to optimise for them. That's how this works.

The benchmark is a forcing function. Once it's public, the gap closes.

The pretty mesh that doesn't print is a metaphor for half the AI conversation right now. Renders look great. Slicing reveals what's actually there.