The Final 10%: What AI Can't Build (And Why It's the Only Part That Matters)

AI tools get you to 90%. But that last 10% — security, authentication, payments, error handling, deployment — is the difference between a demo and a product people pay for.

45% of AI-generated code contains security vulnerabilities. A quarter of Y Combinator's Winter 2025 batch launched with codebases that were 95% AI-generated. The gap between a working demo and a production-ready product has never been wider — or more expensive to ignore.

You've built something with Cursor. Or Replit. Or Bolt. Or Lovable.

It looks great. The demo is impressive. You've shown it to friends, maybe even potential investors, and everyone says the same thing: "This is amazing. When does it launch?"

Then you try to launch. And everything falls apart.

The login system breaks when two people sign up with the same email. Payments go through but the access doesn't update. Error messages show raw database queries to your users. The app crashes under 50 concurrent users. Someone finds they can access another user's data by changing a number in the URL.

This is the final 10%. And it's where most AI-built products go to die.

The 90/10 Problem

AI coding tools are genuinely remarkable. I use them every day across every build. Cursor, Claude Code, Replit — they've transformed how fast I can ship production software. I built RiskPod's compliance marketplace in 30 days. It generated 550+ signups in its first 48 hours. AI tools were essential to that speed.

But here's what AI evangelists don't tell you: the tools get you to 90% of a working product at extraordinary speed. That last 10% — the part that makes it a product people actually pay to use — is where everything gets hard.

The numbers are sobering. Veracode's 2025 GenAI Code Security Report tested 100 leading LLMs across 80 coding tasks and found they produced insecure code 45% of the time. Not obscure edge cases. Nearly half the time, the code had exploitable security vulnerabilities.

Tenzai's December 2025 assessment compared five major AI coding tools — Claude Code, OpenAI Codex, Cursor, Replit, and Devin — by building the same three applications with each. Across 15 applications, they found 69 vulnerabilities. Several were rated critical, particularly around API authorisation logic and business logic flaws.

The most revealing finding? The AI tools were good at avoiding the well-known, generic vulnerabilities — things like SQL injection and XSS that have been in every security textbook for 20 years. Where they consistently failed was on context-dependent logic. The things that require understanding how a business actually operates, how users actually behave, and what happens when something goes wrong.

That's the final 10%.

The Perception Gap: Why Everyone Thinks AI Is Faster

Before we get into the specifics, it's worth understanding why the final 10% catches so many founders off guard.

A randomised controlled trial by METR — the most rigorous study of AI coding productivity to date — found that experienced developers using AI tools took 19% longer to complete tasks than those working without AI. But here's the critical detail: before starting, those developers predicted AI would make them 24% faster. After finishing (and being measurably slower), they still believed AI had sped them up by 20%.

This perception gap explains a lot. When Spotify announced that their best engineers haven't written a line of code since December 2025, using an internal AI system called Honk built on Claude Code, the headlines screamed "AI replaces coding." What they missed: Spotify's engineers have years of deep codebase knowledge. They can evaluate AI-generated code because they know what "correct" looks like. They didn't skip the final 10% — they applied it through review and direction rather than typing.

When Axios CTO Dan Cox claimed an engineer completed in 37 minutes what used to take three weeks, the same thing happened. What compressed wasn't the thinking. It was the typing. The commodity implementation collapsed. The product judgment — deciding what to build, how the data should flow, what happens at failure boundaries — stayed the same.

If experienced developers at companies like Spotify and Axios can't skip the final 10%, a first-time founder building their first product certainly can't either. The METR study proves that even professionals overestimate AI's contribution. For founders without production experience, the gap between "working demo" and "working product" is invisible — until users find it.

What's Actually in the Final 10%

The final 10% isn't one thing. It's ten overlapping categories of problems that AI tools consistently fail to handle correctly. I've seen every one of these across 100+ production builds over 18 years — and I see them more frequently now that founders are arriving with vibe-coded prototypes that need to become real products.

1. Authentication and Access Control

AI tools can generate a login form in seconds. They can wire up email/password authentication, add a "forgot password" flow, and make it look professional.

What they consistently get wrong: session management, token refresh logic, role-based access control, account enumeration protection, brute force prevention, and the hundred edge cases that emerge when real users interact with auth systems.

I've reviewed AI-generated auth systems where changing a user ID in the URL gave you access to someone else's account. Where the "admin" role was checked client-side in JavaScript — meaning anyone could grant themselves admin access through the browser console. Where password reset tokens never expired.

These aren't theoretical risks. A startup called Enrichlead launched with AI-generated code that put all security logic on the client side. Within 72 hours, users discovered they could bypass the paywall by changing a single value in the browser console. The founder couldn't audit 15,000 lines of AI-generated code. The project shut down entirely.

What production-ready looks like: Server-side session validation on every request. Cryptographically secure tokens with proper expiry. Role-based access control enforced at the API layer, never in the browser. Rate limiting on login attempts. Account lockout policies. Proper password hashing with bcrypt or argon2.
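
To make that concrete, here is a minimal sketch of server-side access control, assuming an Express/TypeScript API with an in-memory session store and project table standing in for the real ones:

```typescript
import express from "express";

type User = { id: string; role: "user" | "admin" };
type Project = { id: string; ownerId: string; name: string };

// In-memory stand-ins for a real session store and database (illustration only).
const sessions = new Map<string, User>([["token-abc", { id: "u1", role: "user" }]]);
const projects = new Map<string, Project>([["p1", { id: "p1", ownerId: "u1", name: "Demo" }]]);

const app = express();

// Every request re-validates the session token on the server. The client is never
// trusted to say who it is or what role it holds.
app.use((req, res, next) => {
  const token = req.header("authorization")?.replace("Bearer ", "") ?? "";
  const user = sessions.get(token);
  if (!user) return res.status(401).json({ error: "Not signed in" });
  res.locals.user = user;
  next();
});

// Ownership is enforced at the API layer: changing the ID in the URL gets a 403,
// not someone else's data.
app.get("/api/projects/:id", (req, res) => {
  const user = res.locals.user as User;
  const project = projects.get(req.params.id);
  if (!project) return res.status(404).json({ error: "Not found" });
  if (project.ownerId !== user.id && user.role !== "admin") {
    return res.status(403).json({ error: "Forbidden" });
  }
  res.json(project);
});

app.listen(3000);
```

The framework is incidental. What matters is that identity and ownership are checked on the server for every request, where nothing typed into a browser console can change them.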

2. Payment Processing

Stripe's documentation is excellent, and AI tools can generate basic payment integration that appears to work. You can prompt Cursor to "add Stripe payments" and get a checkout flow in minutes.

What breaks: webhook handling, subscription lifecycle management, failed payment recovery, proration, refunds, tax calculation, currency handling, and the dozens of edge cases that occur when real money is involved.

The most common AI payment failure I see is the "optimistic update" — the app grants access the moment the user clicks "Pay," before the payment has actually been confirmed by Stripe's webhooks. In testing, this works perfectly. In production, it means users get free access when payments fail, cards are declined, or network requests time out.

Another common failure: AI-generated code that stores Stripe API keys in client-side JavaScript. I've seen this in at least a dozen vibe-coded apps. The keys are visible to anyone who opens browser developer tools.

What production-ready looks like: Webhook-driven access control — users only get access after Stripe confirms the payment succeeded. Proper handling of subscription states: active, past_due, canceled, unpaid. Failed payment retry logic. Server-side API key storage with environment variables. Idempotent webhook processing to handle Stripe's retry behaviour.
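
As an illustration, here is a sketch of webhook-driven access control using the Node Stripe SDK and Express. The event types are real Stripe events; grantAccess and flagPastDue are stand-ins for your own database updates:

```typescript
import express from "express";
import Stripe from "stripe";

// The secret key lives in the environment, never in client-side JavaScript.
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const app = express();

const processedEvents = new Set<string>(); // use a database table in production

// Stripe needs the raw request body to verify the webhook signature.
app.post("/webhooks/stripe", express.raw({ type: "application/json" }), async (req, res) => {
  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body,
      req.headers["stripe-signature"] as string,
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch {
    return res.status(400).send("Invalid signature"); // reject anything Stripe didn't sign
  }

  // Idempotency: Stripe retries webhooks, so the same event can arrive more than once.
  if (processedEvents.has(event.id)) return res.status(200).send("Already processed");
  processedEvents.add(event.id);

  // Access is granted here, after Stripe confirms payment, not when the user clicks "Pay".
  if (event.type === "checkout.session.completed") {
    const session = event.data.object as Stripe.Checkout.Session;
    await grantAccess(session.customer as string);
  }
  if (event.type === "invoice.payment_failed") {
    const invoice = event.data.object as Stripe.Invoice;
    await flagPastDue(invoice.customer as string); // downgrade or retry; never assume the charge succeeded
  }

  res.status(200).send("ok");
});

// Illustrative stubs: replace with real updates to the customer's plan in your database.
async function grantAccess(customerId: string) { /* mark subscription active for customerId */ }
async function flagPastDue(customerId: string) { /* mark subscription past_due for customerId */ }
```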

3. Error Handling and Resilience

The happy path is easy. AI tools excel at building the flow where everything works perfectly. The user fills in the form, clicks submit, and sees a success message.

Production isn't the happy path. Production is: the user submits the form with invalid data. The database is temporarily unavailable. The email service returns a 503. The payment provider times out. The user double-clicks the submit button. The user navigates away mid-submission and comes back.

AI-generated code typically has one of two failure modes: it shows the user a raw error message (including database schema details, API keys, or stack traces), or it silently fails and the user has no idea what happened.

Neither is acceptable in production.

What production-ready looks like: Structured error handling at every layer — API, database, third-party services. User-friendly error messages that explain what happened and what to do next. Retry logic with exponential backoff for transient failures. Circuit breakers for external service dependencies. Graceful degradation when non-critical services are down.
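
Retry with exponential backoff is one of those patterns AI tools rarely add unprompted. A minimal TypeScript sketch (a real implementation would also check that the error is genuinely transient before retrying):

```typescript
// A minimal retry helper with exponential backoff for transient failures:
// network timeouts, a 503 from the email provider, a briefly unavailable database.
async function withRetry<T>(
  operation: () => Promise<T>,
  { attempts = 3, baseDelayMs = 500 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait 500ms, then 1s, then 2s before trying again; stop after the final attempt.
      if (attempt < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError; // surface a structured error to the caller, never a raw stack trace to the user
}

// Usage: wrap the flaky external call, then map any final failure to a friendly message.
// const receipt = await withRetry(() => emailService.send(message));
```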

4. Security Beyond Authentication

Authentication is one layer of security. Production-ready software needs many more.

The OWASP Agentic AI Top 10, published in 2026, identifies critical risks specific to AI coding agents. These include excessive agency (AI agents performing actions beyond their intended scope), data leakage, and insecure deserialisation. The framework exists because these risks are already materialising in production environments.

Real incidents from 2025-2026: an AI-generated script on Replit deleted an entire production database despite explicit instructions not to touch production. A vulnerability in Anthropic's MCP server allowed reading and writing arbitrary files. A startup's AI-generated code hardcoded API keys that were subsequently exploited by attackers. A prompt injection hidden in source code comments caused Windsurf to automatically store malicious instructions in its long-term memory.

Palo Alto's Unit 42 found that most organisations allow employees to use vibe coding tools but very few have performed formal risk assessments on their use.

What production-ready looks like: Input validation on every endpoint. Content Security Policy headers. CORS configuration. Rate limiting. SQL parameterisation. Dependency auditing. Secret management through environment variables, never hardcoded. Regular security scanning. HTTPS everywhere. Proper CSRF protection.
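
Two of those items, SQL parameterisation and secret management, look like this in practice. A sketch assuming Postgres via node-postgres, with illustrative table and column names:

```typescript
import { Pool } from "pg";

// The connection string comes from the environment; it is never hardcoded or shipped to the browser.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Parameterised query: the user-supplied email is passed as a bound parameter,
// so it can never be interpreted as SQL.
export async function findUserByEmail(email: string) {
  const result = await pool.query("SELECT id, email, name FROM users WHERE email = $1", [email]);
  return result.rows[0] ?? null;
}

// The vulnerable version AI tools sometimes produce, built by string concatenation,
// lets input like ' OR '1'='1 rewrite the query:
// pool.query(`SELECT * FROM users WHERE email = '${email}'`);
```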

5. Deployment and Infrastructure

The "Deploy" button on Replit or Vercel makes deployment feel trivially easy. Click, and your app is live.

For a demo, that's fine. For a product handling real users, real data, and real money, you need: proper environment separation (development, staging, production), database backup and recovery, SSL certificate management, domain configuration, CDN setup, environment variable management, CI/CD pipelines, and rollback procedures.

I've seen vibe-coded apps deployed to Replit's free tier handling paying customers' financial data. No backups. No environment separation. No monitoring. The founder didn't know that Replit's free deployment could go to sleep after inactivity, meaning paying customers would hit a loading spinner for 30 seconds while the server woke up.

What production-ready looks like: Proper hosting with uptime SLAs. Automated database backups with tested recovery procedures. Environment separation so development bugs never touch production data. Automated deployment pipelines with rollback capability. SSL/TLS configuration. Custom domain with proper DNS.
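
One small habit that catches a surprising number of deployment failures: validate the environment at startup rather than when a customer hits the broken endpoint. A sketch, with variable names that are obviously assumptions about your particular stack:

```typescript
// Fail fast at startup if the environment is misconfigured, rather than discovering
// a missing secret when the first real user hits the endpoint that needs it.
const required = ["DATABASE_URL", "STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "SESSION_SECRET"];

const missing = required.filter((name) => !process.env[name]);
if (missing.length > 0) {
  throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
}

// A single flag the rest of the code can use to keep development behaviour
// (seed data, verbose errors, test keys) away from production.
export const isProduction = process.env.NODE_ENV === "production";
```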

6. Data Integrity and Validation

AI tools build database schemas quickly. They'll create tables, relationships, and basic CRUD operations. What they consistently miss: data validation at the API layer, unique constraints, referential integrity, migration strategies, and the business rules that determine what constitutes valid data.

The result is apps that accept impossible data: negative prices, dates in the past for future appointments, email addresses without @ signs, phone numbers with 3 digits. Each one is a bug that erodes user trust and creates downstream problems.

For multi-tenant SaaS — where multiple businesses share the same platform — data isolation is critical. One customer must never see another's data. This requires row-level security, tenant-scoped queries, and careful API design. It's exactly the kind of context-dependent logic that AI tools fail at most, because it requires understanding how businesses operate, not just how code runs.

What production-ready looks like: Server-side validation on every input. Database constraints that enforce business rules. Proper migration tooling for schema changes. Row-level security for multi-tenant applications. Audit trails for sensitive data changes.
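
Here is what server-side validation of business rules can look like: a sketch using the zod library with an illustrative booking schema. The specific rules are assumptions about the business, which is exactly the point.

```typescript
import { z } from "zod";

// Server-side validation for a booking request. The rules encode what "valid" means
// for this business, not just what the database will technically accept.
const BookingInput = z.object({
  email: z.string().email(),
  price: z.number().positive(), // no zero or negative prices
  appointmentAt: z.coerce.date().refine((date) => date.getTime() > Date.now(), {
    message: "Appointment must be in the future",
  }),
  phone: z.string().min(7).max(20),
});

export function parseBooking(body: unknown) {
  const result = BookingInput.safeParse(body);
  if (!result.success) {
    // Field-level errors the frontend can display, never a raw stack trace.
    return { ok: false as const, errors: result.error.flatten().fieldErrors };
  }
  return { ok: true as const, data: result.data };
}
```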

7. Performance Under Real Load

AI-generated code works perfectly with one user. It works fine with ten. At a hundred concurrent users, you start seeing problems. At a thousand, the app falls over.

The reason: AI tools don't think about performance. They generate the simplest code that produces the correct output. That means N+1 database queries (loading 100 items by making 101 separate database calls instead of 1), unoptimised images, missing database indexes, no caching, and API endpoints that return entire database tables when the frontend needs three fields.

These problems are invisible during development. They only appear under real load — which is why they catch founders by surprise at the worst possible moment.

What production-ready looks like: Database query optimisation with proper indexing. API pagination for large datasets. Image optimisation and lazy loading. Caching strategies for frequently accessed data. Load testing before launch to identify bottlenecks.
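
The N+1 problem and its fix, side by side. A sketch assuming Postgres via node-postgres, with illustrative orders and order_items tables:

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// The N+1 pattern: one query for the orders, then another query per order.
// 100 orders means 101 round trips to the database.
export async function getOrdersSlow(userId: string) {
  const { rows: orders } = await pool.query("SELECT id, total FROM orders WHERE user_id = $1", [userId]);
  for (const order of orders) {
    const { rows } = await pool.query("SELECT name, quantity FROM order_items WHERE order_id = $1", [order.id]);
    order.items = rows;
  }
  return orders;
}

// The production version: two queries total regardless of how many orders there are,
// paginated so the endpoint never returns the whole table.
export async function getOrders(userId: string, page = 1, pageSize = 25) {
  const { rows: orders } = await pool.query(
    "SELECT id, total FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT $2 OFFSET $3",
    [userId, pageSize, (page - 1) * pageSize]
  );
  const { rows: items } = await pool.query(
    "SELECT order_id, name, quantity FROM order_items WHERE order_id = ANY($1)",
    [orders.map((order) => order.id)]
  );
  for (const order of orders) {
    order.items = items.filter((item) => item.order_id === order.id);
  }
  return orders;
}
```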

8. Monitoring and Observability

When something goes wrong in production — and it will — you need to know about it before your users tell you. AI-generated code has no monitoring. No error tracking. No alerting. No logging beyond console.log statements that disappear when the server restarts.

This means founders discover problems through angry customer emails, not through dashboards. By the time they know something is broken, users have already had a bad experience.

What production-ready looks like: Error tracking (Sentry, LogRocket, or similar). Uptime monitoring with alerting. Application performance monitoring. Structured logging that persists across deployments. Health check endpoints. Database monitoring.
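
A minimal sketch of the last two items, structured logging and a health check, assuming Express, the pino logger, and a Postgres connection pool. The endpoint path and messages are illustrative:

```typescript
import express from "express";
import pino from "pino";
import { Pool } from "pg";

// Structured JSON logs go to stdout, where a log collector keeps them,
// unlike console.log lines that vanish when the server restarts.
const logger = pino();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const app = express();

// A health check endpoint that uptime monitoring can poll every minute.
app.get("/healthz", async (_req, res) => {
  try {
    await pool.query("SELECT 1"); // confirm the database is reachable, not just the web server
    res.status(200).json({ status: "ok" });
  } catch (error) {
    logger.error({ err: error }, "Health check failed: database unreachable");
    res.status(503).json({ status: "degraded" });
  }
});

// Errors are logged with context so you find out from a dashboard, not from a customer email.
app.use((err: Error, req: express.Request, res: express.Response, _next: express.NextFunction) => {
  logger.error({ err, path: req.path }, "Unhandled error");
  res.status(500).json({ error: "Something went wrong. We have been notified." });
});

app.listen(3000);
```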

9. Compliance and Legal Requirements

If your app handles personal data, you have legal obligations. GDPR in the UK/EU. State-level privacy laws in the US. Industry-specific regulations for finance, healthcare, or education.

AI tools don't consider compliance. They'll store personal data indefinitely, share it with third-party services without consent, fail to implement data deletion capabilities, and skip cookie consent mechanisms. Each of those gaps is a legal obligation you're failing to meet, not an optional feature.

What production-ready looks like: Privacy policy and terms of service. Cookie consent mechanism. Data deletion capability (right to erasure). Data processing agreements with third-party services. Appropriate data retention policies. Consent management for marketing communications.

10. Edge Cases and Business Logic

This is the category that encompasses everything else — and it's the one that requires the most human judgment.

What happens when a user cancels their subscription mid-billing cycle? When two users try to book the same appointment simultaneously? When the clocks shift for daylight saving time? When a user has special characters in their name? When the app is used in a country with right-to-left text? When a user's session expires while they're filling in a long form?

AI tools handle the happy path. The final 10% is every other path — and there are hundreds of them. Each one requires understanding the business context, the user's expectations, and the correct way to handle the situation.

Tenzai's research found that AI-generated code was most prone to business logic vulnerabilities. Their researchers noted that AI agents lack the intuitive understanding that helps human developers grasp how workflows should operate.

What production-ready looks like: Comprehensive edge case handling based on real-world usage patterns. Business rules encoded in the application logic, not assumed. Proper timezone handling. Character encoding support. Concurrent request handling. Session management for long-running operations.
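
Take the double-booking case as an example. The reliable fix isn't application code that checks whether the slot is free before inserting (two users can pass that check at the same instant); it's a database constraint, with the application handling the conflict gracefully. A sketch assuming Postgres:

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Run once as a migration: the database, not the application code, guarantees
// that a slot can only be booked once.
//   ALTER TABLE bookings ADD CONSTRAINT one_booking_per_slot UNIQUE (resource_id, slot_start);

export async function bookSlot(userId: string, resourceId: string, slotStart: Date) {
  try {
    const { rows } = await pool.query(
      "INSERT INTO bookings (user_id, resource_id, slot_start) VALUES ($1, $2, $3) RETURNING id",
      [userId, resourceId, slotStart]
    );
    return { ok: true as const, bookingId: rows[0].id };
  } catch (error: any) {
    // 23505 is Postgres's unique_violation: someone else booked this slot a moment earlier.
    if (error?.code === "23505") {
      return { ok: false as const, reason: "That slot has just been taken. Please pick another time." };
    }
    throw error; // anything else is a genuine failure that should be logged and handled properly
  }
}
```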

The Code Quality Data Backs This Up

GitClear's analysis of 211 million lines of code across five years found that as AI coding tool adoption grew, so did the warning signs. Code duplication increased 4-8x. Code churn — code that gets rewritten within two weeks of being written — rose significantly. And perhaps most telling, code reuse and refactoring declined. Developers are writing more new code and cleaning up less of the old.

Short-term, the code looks fine. Long-term, it's accumulating technical debt at unprecedented scale. GitClear's research found that short-term bug rates dropped (AI code looks clean), but 6-month defect rates increased by 12%. The problems don't surface in demos. They surface in production, weeks or months after launch.

This is the final 10% at a macro level: AI is accelerating code generation while eroding code quality. Speed up, quality down. The gap between them is where products fail.

Why AI Can't Fix This (Yet)

The instinctive response to these problems is: "Just prompt the AI to fix them." And for some issues, that works. You can ask Cursor to add input validation or implement rate limiting, and it'll generate reasonable code.

But here's the fundamental problem: the AI doesn't know what it doesn't know.

A SoftwareMill experiment in late 2025 vibe-coded a web application, then asked the AI to harden it for production. The AI identified 15 critical issues and proposed fixes. After 20+ agentic iterations taking approximately 3 hours, the AI declared the application "production-ready."

It wasn't. Multiple OWASP vulnerabilities remained — outdated dependencies, loose Content Security Policies, export functionality vulnerable to XSS. The AI couldn't identify what it had missed because it lacked the context to understand what production-ready actually means for that specific application.

This is the core problem. Production readiness isn't a checklist that AI can run through. It's a judgment call that depends on what kind of data you're handling, who your users are, what regulations apply, how the business operates, what failure modes are acceptable, and what the consequences are when something goes wrong.

Those are human decisions. They require experience — specifically, the experience of having shipped products that handle real users, real money, and real consequences.

The Real Cost of Skipping the Final 10%

Founders who skip the final 10% pay for it in three ways.

Security breaches. In May 2025, the AI coding platform Lovable was found to have security vulnerabilities in 170 out of 1,645 applications built on the platform — allowing personal information to be accessed by anyone. That's roughly 1 in 10 apps with exploitable vulnerabilities. If your app handles customer data and you're running AI-generated security, the odds aren't in your favour.

Lost revenue. When payments break, when users can't log in, when the app crashes under load — you don't just lose the transaction. You lose the customer. And in the early days of a product, every customer matters.

Rebuilds. The most expensive outcome: a founder spends 3-6 months building with AI tools, launches to real users, discovers the product can't handle production conditions, and has to rebuild from scratch. The time and money spent on the prototype is wasted. I see this pattern every month.

How to Actually Close the Gap

The final 10% isn't about writing more code. It's about applying judgment to the code that exists. Here's how I approach it across every build.

Start with a proper specification. Most AI coding failures start with vague prompts. "Build me a booking app" produces a demo. A detailed specification — user roles, data model, business rules, edge cases, security requirements — produces a product. This is why I built BuildKits: to help founders create specifications that give AI tools (and human builders) the clarity they need.

Use AI for speed, humans for judgment. I use AI coding tools across every build. They're extraordinary for generating boilerplate, building UI components, writing CRUD operations, and handling the repetitive parts of development. But every line of generated code gets reviewed through the lens of 18 years and 100+ production builds. The AI writes fast. I decide what's correct.

Test with production conditions, not demo conditions. Before any product launches, it needs to survive: multiple concurrent users, failed payments, network timeouts, invalid input, and deliberate misuse. These aren't theoretical scenarios — they're the first 48 hours of any real product launch.

Build the boring infrastructure first. Authentication, payments, error handling, monitoring, deployment — none of this is exciting. All of it is essential. The 30-day build process front-loads this infrastructure so the product is production-grade from day one, not retrofitted after launch.

Get a security review before launch. If you've built with AI tools and you're about to launch, get someone with production experience to review the code. The £5,000 Discovery Sprint includes a full technical assessment that catches the problems AI introduced. It's the cheapest insurance you can buy.

Who Should Read This

If you've built something with AI coding tools that works in demo but you're nervous about launching it to real users — this series is for you.

Over the coming weeks, I'm publishing detailed guides on each of the ten areas above: authentication, payments, error handling, security, deployment, data integrity, performance, monitoring, compliance, and edge cases. Each guide covers what AI gets wrong, what production-ready looks like, and how to close the gap.

If you're a non-technical founder who's built 80-90% of a product with Cursor, Replit, Bolt, or Lovable and you've hit the wall — you're exactly who I built Hello Crossman for.

I don't replace AI tools. I apply 18 years of production experience to the code they generate. The result is the same product you've been building — but hardened, secured, and ready for real users.

The AI got you to 90%. Let's finish the job.

Frequently Asked Questions

What is the final 10% in AI-coded software?

The final 10% refers to the production-critical components that AI coding tools consistently fail to implement correctly: authentication and access control, payment processing, error handling, security hardening, deployment infrastructure, data integrity, performance optimisation, monitoring, compliance, and edge case handling. These are the elements that separate a working demo from a product people can safely pay to use.

Can I fix vibe-coded security issues by prompting the AI to fix them?

Partially, but not reliably. AI tools can address specific, well-defined security issues when prompted. However, research shows the AI doesn't know what it's missed. In one experiment, an AI declared code "production-ready" after 20+ iterations while multiple OWASP vulnerabilities remained. Production security requires human judgment about what matters for your specific application, users, and data.

How much does it cost to make a vibe-coded app production-ready?

It depends on how much needs to change. If the architecture is sound and the issues are primarily security hardening and infrastructure, expect £5,000–£15,000. If the fundamental architecture needs rebuilding — client-side security logic, no proper database design, no API layer — it can cost as much as building from scratch. A Discovery Sprint (£5,000) gives you an honest assessment before you commit to the full investment.

Is vibe coding bad?

No. Vibe coding is extraordinary for prototyping, validation, and building the 90% of any product that is straightforward. The problems emerge when vibe-coded prototypes are launched as production products without the final 10% being addressed. AI tools are a force multiplier for experienced builders. They're a risk multiplier for people who don't know what production-ready means.

What AI coding tools are best for production-ready software?

Cursor, Claude Code, and Replit are all capable of producing production-quality code — when guided by someone who knows what production-ready looks like. The tool matters less than the judgment applied to its output. I use multiple AI tools across every build, but the production quality comes from 18 years of knowing what to check, what to harden, and what the AI got wrong.