Build a Full App with Claude Code: Real Examples
A builder with zero game-dev experience shipped a GTA clone on Google Earth in a weekend. A job search system evaluated 740+ listings with no application code. Here is what actually works when building full apps with Claude Code.
Stop configuring. Start building.
SaaS builder templates with AI orchestration.
A builder with zero game development background shipped a GTA-style game running on real Google Earth cities in a single weekend. Claude wrote roughly 80% of the code. The game pulls real police stations, airports, and ports from OpenStreetMap. In-car radio auto-tunes to local stations via Radio Garden API.
That is not a demo. It is live, with a waitlist at cw.naveen.to.
The pattern behind that build, and several others like it, is specific and repeatable. This post walks through what the successful ones did and what went wrong when they did not.
What you can actually build in a weekend
Start with three real builds from April 2026. The range matters.
Naveen (@naveenvkt on X) shipped Crimeworld: a browser-based GTA-style game on real Google Earth cities. Stack: Cesium, Google 3D Tiles, Three.js, Radio Garden API, and OpenStreetMap data. No game development background. Claude wrote roughly 80% of the code. The build happened over a single weekend. It is functional, it is glitchy in places, and it has a live waitlist.
Santiago built career-ops: an AI job search system with no traditional application code at all. The entire system is approximately 3,200 lines of markdown prompt files. Fourteen skill modes live in a modes/ directory. CLAUDE.md is the orchestration layer. The system evaluated 740+ job listings, generated 100+ ATS-optimized CVs, and landed a Head of Applied AI role. It uses parallel batch processing with claude -p workers for sub-agents. Human review before every submission is baked into the design. The system scores jobs A through F and recommends against applying below 4.0/5. Source code is MIT-licensed at santifer/career-ops on GitHub.
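The scoring gate in that design is simple enough to sketch. The snippet below is illustrative, not career-ops source: the letter-grade boundaries are invented, and only the 4.0/5 apply threshold comes from the system described above.

```python
def grade(score: float) -> str:
    """Map a 0-5 job-fit score to a letter grade.

    Band boundaries are invented for illustration; only the 4.0
    apply threshold is documented behavior of the system above.
    """
    bands = [(4.5, "A"), (4.0, "B"), (3.0, "C"), (2.0, "D"), (1.0, "E")]
    for cutoff, letter in bands:
        if score >= cutoff:
            return letter
    return "F"


def should_apply(score: float) -> bool:
    """The documented rule: recommend against applying below 4.0/5."""
    return score >= 4.0
```

A human still reviews every submission; the gate only filters what reaches that review.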
A third builder with no finance background tested a hedge fund parking lot counting strategy using satellite imagery. They described the hypothesis in plain English: optical satellite images of retailer parking lots should predict earnings. Claude built the entire analysis pipeline from that description. When the optical approach failed at roughly 33% accuracy, the builder described a new hypothesis in plain English: radar reflects off metal vehicles differently than asphalt. Claude rebuilt the pipeline from scratch. The final system pulls Sentinel-2 optical and Sentinel-1 radar data via Google Earth Engine, processes parking lot boundaries from OpenStreetMap, and runs permutation tests, binomial tests, and bootstrap resampling across 35+ Python scripts. Radar at three retailers hit 100% accuracy. Scaled to ten retailers it dropped to 50%. The builder's conclusion: "the moat is data quality, not the algorithm." That is the kind of honest result you get when you build to test instead of building to prove.
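The statistics are worth pausing on, because they explain the honest conclusion. A permutation test asks: how often would shuffled labels do as well as your predictions did? The sketch below is a small-n illustration of the idea, not code from the builder's 35+ scripts:

```python
import random


def permutation_pvalue(preds, actuals, trials=10_000, seed=0):
    """Estimate how often randomly shuffled labels match or beat the
    observed accuracy. Illustrative only; the real pipeline also ran
    binomial tests and bootstrap resampling.
    """
    rng = random.Random(seed)
    n = len(preds)
    observed = sum(p == a for p, a in zip(preds, actuals)) / n
    labels = list(actuals)
    hits = 0
    for _ in range(trials):
        rng.shuffle(labels)
        acc = sum(p == a for p, a in zip(preds, labels)) / n
        if acc >= observed:
            hits += 1
    return hits / trials


# Three retailers, all calls correct.
p = permutation_pvalue([1, 0, 1], [1, 0, 1])
```

On three retailers, random shuffles match a perfect score about a third of the time, which is one reason 100% accuracy on three retailers is weak evidence on its own.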
Three different domains. Three different technical backgrounds. One consistent approach. All of them described what they wanted in plain English, broke the work into pieces, and let Claude do the implementation.
Before you write a single prompt
The spec comes first. Always.
Official Anthropic guidance for Claude Code: let Claude interview you before writing any code. Ask Claude to ask clarifying questions about technical implementation, UI/UX, edge cases, constraints, and tradeoffs. Write the answers to a SPEC.md file. Start a fresh session to execute from that spec. The WorthIt builder (a junior dev in a bootcamp, primarily backend Java, no prior full-app experience) described it this way: "detailed prompts beat vague ones every time, full product specs, not 'build me a dashboard.'"
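In practice the interview step is one prompt. A possible phrasing (the wording here is mine, not Anthropic's):

```text
I want to build [one-sentence product description].
Before writing any code, interview me. Ask clarifying questions about
technical implementation, UI/UX, edge cases, constraints, and tradeoffs,
one topic at a time. When we are done, write my answers to SPEC.md.
Do not write any code in this session.
```

Then open a fresh session and execute from SPEC.md.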
CLAUDE.md is not project documentation. That distinction matters more than most guides let on.
The standard advice is to fill CLAUDE.md with tech stack descriptions, conventions, and notes about your folder structure. That approach has a real failure mode. Everything in CLAUDE.md rides at high priority in Claude's context. A bloated CLAUDE.md causes Claude to lose specific instructions. If Claude keeps doing something you don't want despite a rule you've written down, the file is probably too long and the rule is getting buried.
The more powerful use of CLAUDE.md is as an orchestration control file. Define operational workflows and delegation patterns. Move tech stack documentation and coding conventions into skills that load on demand. Keep CLAUDE.md focused on the things Claude cannot figure out from reading your code: bash commands, branch naming conventions, testing instructions, architectural decisions you made for non-obvious reasons, developer environment quirks.
The official test for every line in CLAUDE.md: would removing this line cause Claude to make a mistake? If not, cut it.
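A CLAUDE.md that passes that test reads like an ops runbook, not documentation. A hypothetical sketch, with every path, port, and command invented for illustration:

```markdown
## Commands
- Test: npm test -- --watch=false (the bare npm test hangs in CI)
- Dev server: npm run dev on port 3001 (3000 is taken by the local proxy)

## Rules
- Branch names: feat/<ticket-id>, fix/<ticket-id>
- Never edit files under src/generated/ (regenerated on every build)
- Billing logic stays in src/billing/ for audit reasons; do not move it

## Workflow
- Any change touching the database schema: Plan Mode first, list assumptions
```

Every line there would cause a mistake if removed, which is the test.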
Use /init to generate a starter CLAUDE.md from your existing codebase before writing anything from scratch.
The instruction priority hierarchy for structured projects:
- CLAUDE.md: high priority, always loaded
- rules directories: high, path-filtered
- skills: medium, loaded on demand
- file contents: standard, loaded when read
The sweet spot for a structured project is 200 to 400 lines of operational rules, not 60 lines of project trivia.
SESSION.md is the other piece most builders skip. Maintain a separate file tracking what was accomplished, current status, open items, and key files modified. Update it before ending every session. Start new sessions with "Resume project X." Record why you decided to implement or not implement a feature, with dates. SESSION.md is what makes context overflow a manageable event instead of a project-ending one.
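SESSION.md needs no special format; it only has to survive /clear. One possible template, with project name, dates, and files invented:

```markdown
# SESSION.md: project X

## Status (2026-04-12)
- Done: auth flow (login, logout, token refresh)
- In progress: mobile layout pass, blocked on device testing
- Key files modified: src/auth/, middleware.ts

## Decisions
- 2026-04-11: no OAuth in v1 (waitlist signups are email-only)
- 2026-04-10: kept SQLite over Postgres until the waitlist clears 1k
```

"Resume project X" at the start of a session points Claude at this file instead of at a stale context.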
The weekend build process
Every successful full-app build used the same rhythm: one feature per session, /clear between features.
Context bleed is where subtle bugs come from: when you finish one feature and start the next, Claude carries forward assumptions from the previous work, and those assumptions are often wrong for the new feature. /clear resets the context and gives you a clean starting state.
Plan Mode before every non-trivial feature. Shift+Tab twice or /plan. Claude researches your codebase and outlines what files it will create, what functions it will introduce, what edge cases exist. Review and refine the plan before execution. Always add "List any assumptions" to your plan prompt. Assumptions Claude makes silently are where bugs come from.
What goes in a good prompt:
- Scope: what this feature does and explicitly what it does not do
- Constraints: which files are off limits, which patterns to follow
- An example file: point to an existing file that follows the shape you want
- Success criteria: specific, testable, not "make it work"
The success criteria piece is worth emphasizing. Instead of "implement email validation," write "Write a validateEmail function. Test cases: user@example.com is true, invalid is false, user@.com is false. Run the tests after implementing." Without success criteria, you are the only feedback loop.
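Run through Claude, that prompt should come back as something like the sketch below. The regex is one reasonable reading of the criteria, not a canonical email validator:

```python
import re

# At least one dot-separated label after the @, and no label may be
# empty: this is what makes "user@.com" fail.
_EMAIL = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")


def validate_email(value: str) -> bool:
    """True only for strings shaped like local@domain.tld."""
    return bool(_EMAIL.fullmatch(value))


# The success criteria from the prompt, run as real tests:
assert validate_email("user@example.com") is True
assert validate_email("invalid") is False
assert validate_email("user@.com") is False
```

The asserts are the point: they turn "make it work" into a feedback loop Claude can run itself.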
Always review the diff before accepting any output. Check specifically for file deletions, public API renames, off-limits file changes, new entries in package.json, and schema migrations. Claude can rename public API endpoints across your codebase without warning. Every client that depended on those endpoints will break. The diff is where you catch it.
Context management: the skill that separates good builds from abandoned ones
Understanding what fills your context window is the difference between builds that finish and builds that drift into confusion.
The effective context window for Claude Code is around 200K tokens, but roughly 20K are reserved for compaction output. Auto-compact triggers when about 13,000 tokens remain. A warning threshold appears around 20,000 tokens. Manual compact is blocked below 3,000 tokens. These numbers come from analysis of the Claude Code source code.
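Those thresholds amount to a small state machine. The figures below copy the numbers above; treat them as estimates from source-code analysis, not a stable contract:

```python
CONTEXT_LIMIT = 200_000          # effective window; ~20K reserved for compaction output
WARN_REMAINING = 20_000          # warning threshold
AUTO_COMPACT_REMAINING = 13_000  # auto-compact trigger


def context_state(tokens_used: int) -> str:
    """Rough model of when Claude Code warns, then auto-compacts."""
    remaining = CONTEXT_LIMIT - tokens_used
    if remaining <= AUTO_COMPACT_REMAINING:
        return "auto-compact"
    if remaining <= WARN_REMAINING:
        return "warning"
    return "ok"
```

The practical takeaway: by the time the warning appears, you have roughly 7,000 tokens of working room before auto-compact fires.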
Three official commands:
- /clear: reset context between unrelated tasks. After two failed corrections on the same issue, stop correcting and use /clear with a better initial prompt instead.
- /compact <instructions>: summarize with custom focus. Example: /compact Focus on the API changes keeps the relevant context and drops the rest.
- Esc + Esc or /rewind: roll back to before things went wrong.
For investigation tasks, use subagents. Run them in separate context windows, have them report back summaries. Your main session stays clean. The Writer/Reviewer pattern works the same way: use a fresh session to review code your main session just wrote. Fresh context catches things a tired context misses.
Three anti-patterns from the official docs:
The kitchen sink session: one task, then something unrelated, then back to the first. Context fills with irrelevant data. Fix: /clear.
Correcting over and over: failed approach, you correct, it still fails, you correct again. After two corrections on the same issue: /clear and write a better initial prompt.
Infinite exploration: ask Claude to "investigate" without scope and it reads hundreds of files. Your context fills with code you do not need for this feature. Fix: scope the investigation narrowly, or use a subagent.
SESSION.md handles the longer-term memory problem. After roughly 30 messages, Claude can start misremembering the session: citing decisions that were never made or code that was changed earlier. SESSION.md as external state, combined with /clear between features, keeps this from derailing a build.
The failure modes nobody warns you about
These are documented, specific, and all sourced from real projects.
Ghost files. GitHub issue #26771 on the official Claude Code repository: Claude Code confidently reports creating files that were never written to disk. The tool output claims success. git status tells a different story. Root causes include permission errors, path resolution failures, and dropped tool calls. The fix is simple: always verify with git status after file operations. Do not assume the file exists because Claude said it does.
API method hallucination. Claude generated code using prisma.$upload(). That method does not exist in Prisma. Other documented examples: inventing configuration options (trustProxy as a rate-limiter option), inventing package names that do not exist on npm. The fix: run your tests. Do not assume generated code works because it looks reasonable.
The over-engineering trap. A builder with 100+ sessions of project history asked Claude to fix a bug where approval popups sometimes did not appear. Claude's proposed fix: save approval state to disk so it survives crashes. The problem: the agent cold-resumes from session log on crash, so the approval state would not be picked up anyway. A completely useless fix that looked like good engineering. The question to ask before accepting any complex proposal: does this problem actually need solving?
Silent API renames. Claude renamed public API endpoints across a codebase without warning. Every client that depended on those endpoints broke. Fix: always review the diff before accepting. Check specifically for public API renames and file deletions.
Mobile layouts. Claude generates solid desktop layouts. Mobile views routinely have overlapping elements, text overflow, and cramped spacing. Claude has no mechanism to test on real devices. Fix: check every layout on a real phone before marking it done.
Context false memories. After roughly 30 messages, Claude may reference decisions that were never made, files that were mentioned differently, or code that was modified since the conversation started. The session begins confusing what actually happened. Fix: SESSION.md as external state, and /clear between features.
The 80/20 problem
You can get 80% of a working app in a weekend. The remaining 20% is where most vibe-coded projects quietly fail.
What fails silently after the initial build: app store compliance, real-device performance, auth flow security, architecture decisions (no tests, no CI/CD), and business logic edge cases. This is not a theoretical observation. A company with millions of users built with AI acceleration had five engineers doing AI code review for months. Their conclusion: "Even Claude plus a competent developer would look at this code and say 'yeah, that's fine.' It wasn't fine."
The WorthIt builder ran a dedicated security hardening pass after launch and found input validation gaps, missing error handling, and no XSS protection. Security was not automatic. Columbia DAPLab research identifies error handling and business logic as the most serious and common silent failure patterns in AI-generated code.
The practical fix: treat security as part of every feature, not a post-launch step. Run a security pass after every major feature ships. Check mobile layouts on a real device before marking anything done. Add tests to your success criteria from the start, not as an afterthought.
This is not an argument against building with Claude Code. It is an argument for being honest about what the tool handles automatically and what it does not.
The structured alternative
The builds described above are all ad-hoc: one person, one Claude Code session, building feature by feature. That approach works, but it has real limits.
The limits show up predictably. Security is not automatic. Quality gates require discipline to run every time. Post-launch infrastructure (error monitoring, performance optimization, security scanning) has to be built or set up separately for each project. The over-engineering trap and the ghost file problem require manual vigilance.
Build This Now packages what the best vibe-coders end up building for themselves: the orchestration, quality gates, and post-launch infrastructure, as a ready-made system. The 14 post-launch commands (/security, /pentest, /performance, /audit) handle the work individual builders discover they need after launch. Row-level security on every database table is the default, not an afterthought. The Quality Gate and Build Fixer agents catch failures that look complete in Claude's output but fail in practice. The Planner agent separates architectural decisions from implementation, with three specialists analyzing each feature simultaneously before any code is written.
The ad-hoc approach is free and works for learning. The structured approach is what scales. buildthisnow.com has the details.
The weekend builds above are not exceptional cases. They are what happens when you spec before you prompt, clear context between features, review every diff, and treat security as part of the build. The range goes from a GTA clone on Google Earth to a job search system with no application code. The practices that made them work are the same.