The Vibe Coding Reality Check: What Every Founder Needs to Know Before Building with AI

Practical insights on using AI tools to build production-ready software faster.

AI coding tools get you 80% of the way there, fast. The remaining 20% is where products live or die. Here's what every founder needs to understand about the gap between prototype and production.

Vibe coding changed everything. A non-technical founder could describe an app in plain English and watch it materialise in minutes. Cursor, Replit, Bolt, Lovable — the tools got good enough that "I built this over the weekend" became a legitimate claim.

And then the hangover hit.

Founders who shipped prototypes started discovering their apps couldn't handle real users. Security vulnerabilities that AI tools don't flag. Performance that degrades from demo to production. Business logic that works for the happy path and breaks on everything else.

I've now rescued or rebuilt more than a dozen products that started as vibe-coded prototypes. The pattern is consistent enough that I can map it. Here's what every founder needs to understand about building with AI tools — what works, what doesn't, and where the real danger lies.

The promise was real (for about 80% of the journey)

Let me be clear: AI coding tools are genuinely transformative. I've shipped 100+ products and I use them daily. They handle boilerplate brilliantly. They scaffold applications faster than any human. They turn a specification into a working frontend in hours rather than weeks.

The problem isn't the tools. The problem is what happens after the prototype looks good in a demo.

Andrej Karpathy, who coined the term "vibe coding" in early 2025, recently reframed it as "agentic engineering" — acknowledging that the casual, vibes-based approach needs the discipline of engineering to produce anything that lasts. That reframing matters. The tools work. The approach of "prompt and pray" doesn't.

The 80/20 split that nobody warns you about

AI gets you 80% of the code quickly. That 80% looks impressive — screens render, buttons work, data flows. It's enough to demo to investors, show friends, even convince yourself you're nearly done.

The remaining 20% is where products live or die. That 20% includes:

Error handling. What happens when the API times out? When a user submits invalid data? When the database connection drops? AI tools build for the happy path. Production means handling every unhappy path too. I wrote a full breakdown in Error Handling and Edge Cases: The Invisible Work That Makes Software Feel Professional.
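
To make that concrete, here's a minimal sketch of what handling just the timeout and bad-response paths of one API call involves. The names and shapes are illustrative, not from any particular codebase:

```typescript
// A minimal sketch, assuming a JSON API and the built-in fetch
// available in Node 18+ and browsers. Timeout value is arbitrary.
type Result<T> = { ok: true; data: T } | { ok: false; error: string };

async function fetchWithTimeout<T>(url: string, timeoutMs = 5000): Promise<Result<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) {
      // A 404 or 500 is an unhappy path too: report it, don't pretend it's data.
      return { ok: false, error: `Server responded with ${res.status}` };
    }
    return { ok: true, data: (await res.json()) as T };
  } catch (err) {
    // Covers network failures, malformed JSON, and the timeout abort above.
    return { ok: false, error: err instanceof Error ? err.message : "Unknown error" };
  } finally {
    clearTimeout(timer);
  }
}
```

A happy-path version of this function is three lines. The other fifteen are the invisible work.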

Security. Research from Veracode found that 45% of AI-generated code contains security vulnerabilities. AI tools don't think adversarially — they don't consider what happens when someone tries to break the system. Authentication, authorisation, input sanitisation, rate limiting — these require deliberate security thinking. The AI app security checklist covers the specifics.
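
As one small example of that deliberate thinking, here's a hypothetical sketch of rate limiting a login endpoint before doing any work. The `checkLogin` function and thresholds are illustrative; a production system would also need constant-time credential checks, lockout policies, and the rest of the checklist:

```typescript
// Hypothetical sketch: in-memory rate limiting plus input validation.
// Thresholds are arbitrary; real systems often need a shared store instead.
const attempts = new Map<string, { count: number; windowStart: number }>();
const MAX_ATTEMPTS = 5;
const WINDOW_MS = 60_000; // one minute

function isRateLimited(ip: string): boolean {
  const now = Date.now();
  const entry = attempts.get(ip);
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    attempts.set(ip, { count: 1, windowStart: now });
    return false;
  }
  entry.count += 1;
  return entry.count > MAX_ATTEMPTS;
}

function checkLogin(ip: string, email: string): void {
  if (isRateLimited(ip)) throw new Error("Too many attempts; try again later");
  // Never trust client input: validate before it touches the database.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) throw new Error("Invalid email address");
  // ...continue to credential verification.
}
```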

Performance at scale. An app that loads instantly with 10 test users can crawl with 1,000 real ones. Database queries that are fine for small datasets become bottlenecks. AI tools optimise for "working", not "fast at scale".
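
The classic case is the N+1 query pattern, which AI tools generate constantly because it works. A sketch, with `Db` standing in for whatever database client you actually use:

```typescript
// `Db` is a stand-in interface, not a real library; the SQL is the point.
type Db = { query: (sql: string, params?: unknown[]) => Promise<any[]> };

// Slow: one round trip per user. Invisible with 10 test users, a crawl with 1,000.
async function orderCountsSlow(db: Db): Promise<Record<string, number>> {
  const users = await db.query("SELECT id FROM users");
  const counts: Record<string, number> = {};
  for (const user of users) {
    const orders = await db.query("SELECT id FROM orders WHERE user_id = $1", [user.id]);
    counts[user.id] = orders.length;
  }
  return counts;
}

// One round trip, supported by an index:
//   CREATE INDEX idx_orders_user_id ON orders (user_id);
async function orderCountsFast(db: Db): Promise<any[]> {
  return db.query(
    "SELECT u.id, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id"
  );
}
```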

Edge cases. Real users do things prototypes never anticipated. They submit forms with special characters. They navigate backwards in unexpected ways. They use the product on devices and browsers you never tested. Each edge case requires specific handling that AI tools rarely anticipate.
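
Even a single text field hides more edge cases than a demo reveals. A small sketch of the defensive handling one name field needs; the limits and messages are arbitrary examples:

```typescript
// Hypothetical form-field handling: special characters, whitespace, and length.
function normaliseName(raw: string): string {
  const name = raw.normalize("NFC").trim(); // composed accented characters, stray whitespace
  if (name.length === 0) throw new Error("Name is required");
  if (name.length > 100) throw new Error("Name must be 100 characters or fewer");
  return name;
}
```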

The METR study that should make every founder pause

In a randomised controlled trial, METR found that experienced developers using AI coding tools were actually 19% slower than those coding without them — despite believing they were 20% faster.

Read that again. The developers felt more productive while being measurably less productive.

This isn't because the tools are bad. It's because the tools create a false sense of progress. When code appears quickly, you assume the hard work is done. But the hard work of understanding what the code should do, catching what it shouldn't do, and ensuring it works under every condition is exactly the part AI doesn't handle.

For non-technical founders, this effect is even more pronounced. Without the experience to evaluate what AI produces, every generated file looks like progress. The gap between "this looks done" and "this is done" can be months of additional work.

The complexity ceiling

Every vibe-coded project hits a complexity ceiling. The community has named this "the 6-month wall" — the point where the codebase becomes too tangled for AI tools to navigate reliably.

It happens because AI coding tools don't maintain architectural coherence. Each prompt generates a local solution without considering the broader system. After thousands of prompts, you have a codebase where:

- Different parts use different patterns for the same thing.
- Dependencies are tangled in ways that make changes in one area break others.
- There's no consistent error handling strategy (the sketch below shows what this looks like).
- Authentication is implemented differently across features.
- The database schema has evolved without a coherent data model.
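
Here's what that inconsistency looks like in practice: the same lookup, generated by two separate prompts, with two incompatible failure contracts. The functions and stub are hypothetical:

```typescript
// Stub data layer so the snippet is self-contained; assume a real one in practice.
async function fetchUser(id: string): Promise<{ id: string; email: string } | null> {
  return id ? { id, email: `${id}@example.com` } : null;
}

// Feature A swallows failures and returns null...
async function getUserForBilling(id: string) {
  try {
    return await fetchUser(id);
  } catch {
    return null;
  }
}

// ...feature B throws. No caller can rely on a single contract,
// so fixing one path risks breaking the other.
async function getUserForProfile(id: string) {
  const user = await fetchUser(id);
  if (!user) throw new Error(`User ${id} not found`);
  return user;
}
```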

This is what developers call "technical debt" — and AI tools are producing it at unprecedented speed. A codebase that would take a human team two years to make unmaintainable can reach that state in months with AI tools.

I've written about the specific technical failures in Why Your Vibe-Coded App Breaks in Production and The "AI Destroyed My Codebase" Problem.

What vibe coding is actually good for

Despite everything above, I use AI tools extensively. The key is knowing where they excel and where they don't.

AI tools excel at:

- Scaffolding new projects
- Building UI components
- Writing standard CRUD operations
- Generating boilerplate
- Converting designs into code
- First-draft implementations of well-defined features
- Refactoring existing code with clear instructions

AI tools struggle with:

- Novel business logic
- Security hardening
- Performance optimisation
- System architecture decisions
- Cross-cutting concerns (logging, monitoring, error handling applied consistently)
- Data model design
- Production deployment configuration

The distinction maps directly to the difference between execution and judgment. AI handles execution brilliantly. Judgment — knowing what to build, why, and how to ensure it works under real conditions — requires human expertise.

I covered the full comparison in What You Still Need a Human For.

The right way to build with AI tools

The answer isn't to avoid AI tools. It's to use them within a framework that compensates for their weaknesses.

Start with architecture, not prompts. Define your data model, user flows, and system architecture before generating any code. AI tools are excellent at implementing a clear plan. They're terrible at creating one.
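
That plan doesn't need tooling; plain types are enough. A hypothetical sketch of "define the data model first", written before any prompting:

```typescript
// Hypothetical entities: the point is deciding relationships and edge cases up front.
interface User {
  id: string;
  email: string;
  createdAt: Date;
}

interface Subscription {
  id: string;
  userId: User["id"]; // the relationship is an explicit decision, not an accident
  plan: "free" | "pro";
  renewsAt: Date | null; // null means cancelled: the edge case lives in the model
}
```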

Write specifications, not wishes. I use a system called BuildKits — structured specifications designed for AI execution rather than human persuasion. The specification tells the AI exactly what to build, with what constraints, handling what edge cases. The result is dramatically better than loose prompting. How to Write Specifications That AI Coding Tools Actually Follow covers the approach.
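
This isn't the BuildKits format itself, but a hypothetical excerpt shows the level of precision the approach implies:

```
Feature: Password reset
Constraints: reset token expires after 30 minutes; single use; max 3 requests per hour per email.
Edge cases: unknown email (respond identically to a known one); expired token; reused token.
Out of scope: SSO accounts (no password to reset).
```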

Build frontend first. Get every screen, every user flow, and every state built before touching the backend. This forces product decisions upfront and creates a visual specification that's impossible to misinterpret.

Treat AI output as a first draft. Every piece of generated code needs review. Not "does it look right?" but "does it handle failure? Is it secure? Will it perform at scale?" This requires someone who knows what production-ready code looks like.

Production-harden deliberately. The production-ready checklist includes 30 items that prototypes typically miss. Authentication, error handling, input validation, monitoring, backup, security headers — each one requires deliberate attention.
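
Security headers are a good example of how mechanical, and how easy to skip, this hardening is. The header names and values below are standard; middleware such as helmet for Express sets most of them for you:

```typescript
// A minimal sketch of the security headers a production response should carry.
function securityHeaders(): Record<string, string> {
  return {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains", // force HTTPS
    "X-Content-Type-Options": "nosniff", // block MIME-type sniffing
    "X-Frame-Options": "DENY", // prevent clickjacking via iframes
    "Content-Security-Policy": "default-src 'self'", // restrict where resources load from
    "Referrer-Policy": "no-referrer", // don't leak URLs to third parties
  };
}
```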

When to stop prompting and get help

There are clear signals that a vibe-coded project needs professional intervention:

- The same bugs keep reappearing after fixes.
- New features break existing ones consistently.
- Performance degrades with each update.
- You're spending more time debugging than building.
- The codebase feels "fragile": you're afraid to change anything.
- You've been "almost done" for weeks or months.

If you recognise these signals, the rescue playbook walks through the options — from targeted fixes to structured rebuilds. I've also written about when to stop prompting and start building properly.

The bottom line

Vibe coding is a prototyping approach being used for production work. That mismatch is causing founders to waste months and thousands of pounds on products that look done but aren't.

The tools are getting better — rapidly. But the gap between "AI can generate code" and "AI can build a production-ready product" is still significant. That gap is filled by product judgment, engineering discipline, and the kind of pattern recognition that comes from shipping at scale.

If you're building something that real users will pay for, you need both: the speed of AI tools and the judgment of experienced product development. That combination — what I call agentic engineering rather than vibe coding — is what produces products that actually work.

For a comparison of the major tools and when to use each, see Cursor vs Replit vs Bolt vs Lovable. For understanding the realistic timeline from prototype to production, see From Cursor Prototype to Production SaaS.

---

Tom Crossman builds production-ready software at Hello Crossman. 18 years in product development. 100+ products shipped. The products that survive aren't the ones built fastest — they're the ones built right. See the case studies →