The Real Cost of Coding with LLMs: What Works, What Drains You
LLMs are now standard developer tools, but the honeymoon phase is over. Here's what actually works after thousands of hours of real-world use.
You're probably using an LLM to write code. The question isn't whether to adopt these tools anymore—it's how to use them without burning out.
Developers who've spent months building production systems with LLMs are reporting something unexpected: the tools work brilliantly until they don't. And when they don't, the cognitive drain is real.
The Shift Nobody Talks About
Stavros Stavropoulos has built entire production systems with LLMs—a personal assistant that manages his calendar, a voice note pendant, even an art piece masquerading as a wall clock. His revelation? "I thought that I liked programming, but it turned out that what I like was making things, and programming was just one way to do that," he writes on his blog.
His engineering skills didn't become obsolete. They shifted. He no longer needs to know how to write code correctly line-by-line. Instead, system architecture and making the right technical choices matter massively more.
Simon Willison, who's been documenting these changes, calls this new practice "agentic engineering"—developing software with coding agents that can write and execute code in a loop until a goal is met. His definition cuts through the hype: "Agents run tools in a loop to achieve a goal."
When LLMs Become Exhausting
Tom Johnell describes the darker side: "Some days I get in bed after a tortuous 4-5 hour session working with Claude or Codex wondering what the heck happened."
He's identified the doom loop. You're tired, so your prompts degrade. Worse prompts produce worse code. You interrupt the LLM mid-stream to add missing context. The feedback cycle slows to a crawl. Context windows bloat. The AI gets dumber or starts hallucinating about recent experiments.
Johnell calls it "doom-loop psychosis." And if you've worked with LLMs for more than a few weeks, you recognize it immediately.
What Actually Works
Use Multiple Models
Stavros is adamant: your tooling needs to support multiple models from different companies. "Most first-party harnesses (Claude Code, Codex CLI, Gemini CLI) will fail this, as companies only want you to use their models, but this is necessary," he writes.
Different models excel at different tasks. Lock yourself into one provider's ecosystem and you're handicapping yourself.
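In practice, provider-agnostic tooling can be as simple as a routing table. A minimal sketch, where the task categories and model names are illustrative assumptions, not recommendations from the article:

```python
# Hypothetical task-to-model routing table. The task names and
# provider/model strings below are illustrative assumptions.
ROUTES = {
    "refactor": "anthropic/claude-sonnet",
    "codegen": "openai/gpt-codex",
    "research": "google/gemini-pro",
}

DEFAULT_MODEL = "anthropic/claude-sonnet"

def pick_model(task: str) -> str:
    """Return the provider/model string for a task, with a fallback default."""
    return ROUTES.get(task, DEFAULT_MODEL)

print(pick_model("research"))   # routed to the research model
print(pick_model("debugging"))  # unknown task falls back to the default
```

The point isn't the table itself but the seam it creates: swapping a model for one task is a one-line change instead of a tooling migration.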
Recognize Your Mental State
Johnell's rule: "If I reach the point where I am not getting joy out of writing a great prompt, then it's time to throw in the towel."
Watch for these signals:

- Your prompts are getting shorter and sloppier.
- You keep interrupting the model mid-stream to add context you forgot to include.
- Each round trip takes longer to verify than the last.
- The context window has bloated and the model is hallucinating about earlier experiments.

When you spot these, stop. The AI isn't broken. You are.
Fix Slow Feedback Loops First
Johnell had a parsing problem where each iteration took 15-20 minutes. Context bloated, results degraded, frustration mounted.
His solution: make the feedback loop itself the problem to solve. Start a new session specifically to reproduce the failure case in under five minutes. The AI will optimize the code path and create levers for faster iteration.
Sound familiar? It's test-driven development. Johnell admits he was always the scrappy engineer who skipped elaborate tests. With LLMs, that scrappiness kills productivity. "If you give an LLM clear success criteria," he notes, the AI will not only solve the problem but consume less context and stay smarter.
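That advice translates directly into a small, fast regression test. A sketch, assuming a hypothetical `parse_record` function standing in for whatever currently takes 15-20 minutes to verify; the goal is a failure that reproduces in milliseconds:

```python
# Hypothetical parser with a known failure case. The function and its
# inputs are stand-ins for the slow-to-verify code path in your project.
def parse_record(line: str) -> dict:
    key, _, value = line.partition("=")
    if not key or not value:
        raise ValueError(f"malformed record: {line!r}")
    return {key.strip(): value.strip()}

def test_known_failure_case():
    # Clear success criteria the LLM can iterate against in seconds.
    assert parse_record("name = Ada") == {"name": "Ada"}
    try:
        parse_record("no-equals-sign")
    except ValueError:
        pass  # malformed input should raise, not return garbage
    else:
        raise AssertionError("expected ValueError on malformed input")

test_known_failure_case()
print("repro suite passed")
```

With a test like this in the repo, each iteration costs seconds of context instead of minutes, which is exactly what keeps the session's context window from bloating.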
Understand Architecture Deeply
Stavros reports he's "never even read most" of the code in his projects, yet he's "intimately familiar with each project's architecture and inner workings."
On projects where he lacks domain knowledge (mobile apps), code quickly becomes a mess. On projects where he knows the technology well (backend apps), he maintains tens of thousands of lines with low defect rates.
The pattern is clear: LLMs don't replace technical understanding. They amplify it.
The Skills That Matter Now
Willison frames it well: "Writing code has never been the sole activity of a software engineer. The craft has always been figuring out what code to write."
Every software problem has dozens of solutions with different tradeoffs. Your job is navigating those options. The LLM executes. You architect, specify, verify, and iterate.
According to Willison, the new skillset includes:

- deciding what to build and which tradeoffs to accept;
- writing specifications clear enough for an agent to execute against;
- verifying results at the system level rather than reading every line;
- designing feedback loops so the agent gets better within a session.

LLMs don't learn from past mistakes. But your coding agent can, if you design it to.
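One way to give an agent that kind of memory, as a rough sketch: persist lessons to a notes file that gets prepended to every fresh session's context. The filename and format here are assumptions for illustration, not a documented convention:

```python
from pathlib import Path

# Hypothetical lessons file; loaded at the start of each new agent session.
NOTES = Path("agent_notes.md")

def record_lesson(lesson: str) -> None:
    """Append a one-line lesson so future sessions start with it in context."""
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"- {lesson}\n")

def session_preamble() -> str:
    """Build the context prefix for a fresh session from accumulated lessons."""
    if not NOTES.exists():
        return ""
    return "Lessons from past sessions:\n" + NOTES.read_text(encoding="utf-8")

record_lesson("Reproduce failures in under five minutes before debugging.")
record_lesson("Don't interrupt mid-stream; restart with fuller context instead.")
print(session_preamble())
```

The mechanism is deliberately dumb: the model forgets everything between sessions, so the harness has to carry the memory for it.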
The Emerging Reality
This isn't vibe coding—a term Andrej Karpathy coined in February 2025 to describe prompting LLMs while you "forget that the code even exists." That might work for prototypes, but production systems require something different.
Willison distinguishes carefully: "We need a term to describe unreviewed, prototype-quality LLM-generated code that distinguishes it from code that the author has brought up to a production ready standard."
The developers getting results aren't abandoning code review. They're reviewing at a different level—architecture instead of syntax, system design instead of function implementation.
What You Can Do Today
Start tracking your mental state. Before you submit a prompt, ask yourself if you're confident it will work. If not, you haven't thought through the problem.
Identify your slowest feedback loop. Whatever takes the longest to verify is killing your productivity. Make speeding it up your next project.
Pick one domain and go deep. Don't try to use LLMs across every technology. Choose an area you know well and let the AI amplify that expertise.
Set up multiple models. If you're locked into one provider's harness, you're leaving capabilities on the table. Use tools that support model switching.
Stop when you're tired. Seriously. The code you generate while exhausted will cost you more time tomorrow than you save today.
The honeymoon phase of LLM-assisted development is over. What's emerging is better: a mature understanding of when these tools work, when they drain you, and how to tell the difference. The developers who figure this out aren't just writing more code. They're building things they couldn't have built before.