The Architecture That Wouldn't Scale: What Segment Learned Going Back to Monolith
When three engineers spend their days just keeping the lights on, something has broken that metrics can't measure. Segment's reversal from microservices reveals what happens when architectural ideals meet operational reality.
There's a moment every engineering manager dreads: when you realize your team isn't building anymore. They're just keeping things alive.
At Twilio Segment, that moment arrived quietly. Three full-time engineers, skilled and capable, found themselves spending their days not on feature development, not on innovation, but on the care and feeding of a distributed system that had grown beyond anyone's ability to reason about it. The microservices architecture that was supposed to enable velocity had become the thing that made velocity impossible.
In their 2018 blog post "Goodbye Microservices," Segment engineer Alexandra Noonan documented something rarely seen in our industry: an honest accounting of an architectural decision that didn't work out. Not because the team lacked skill or because microservices are inherently bad, but because the fit between solution and problem had broken down in ways that only became visible over time.
The Logic That Led Here
Segment's customer data infrastructure processes hundreds of thousands of events per second, routing them to over one hundred different destination APIs—services like Google Analytics, Optimizely, and custom webhooks. In the early days, they faced a classic distributed systems problem: head-of-line blocking. When one destination API slowed or failed, retry attempts flooded the queue, delaying delivery across all destinations.
The microservices solution seemed obvious. Create a separate service and queue for each destination. Isolate the failures. If Google Analytics hiccups, why should that affect Optimizely?
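The two queueing shapes are easy to contrast in miniature. This is an illustrative sketch, not Segment's code — the destination names, event shapes, and failure logic are all invented — but it shows why one shared queue suffers head-of-line blocking while per-destination queues contain the damage:

```python
from collections import defaultdict, deque

# Invented events and destination health, purely for illustration.
events = [("ga", "page_view"), ("optimizely", "exp_seen"), ("ga", "click")]
down = {"ga"}  # "ga" is failing; its events keep getting retried

def drain_shared(queue):
    """One queue for everything: a failing event at the head blocks all delivery."""
    delivered = []
    while queue:
        dest, _ = queue[0]
        if dest in down:
            break  # stuck in a retry loop: nothing behind this event moves
        delivered.append(queue.popleft())
    return delivered

def drain_isolated(events):
    """One queue per destination: only the failing destination's queue stalls."""
    queues = defaultdict(deque)
    for dest, ev in events:
        queues[dest].append(ev)
    # Healthy destinations drain normally, regardless of "ga"
    return {d: list(q) for d, q in queues.items() if d not in down}

print(drain_shared(deque(events)))  # [] -- the failing "ga" event blocks everything
print(drain_isolated(events))       # {'optimizely': ['exp_seen']}
```

The per-destination version is exactly the property Segment was buying: Optimizely's events still go out while Google Analytics is down.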
According to Noonan's account, it worked. For a while. "This microservice-style architecture isolated the destinations from one another, which was crucial when one destination experienced issues as they often do," she wrote. The architecture delivered on its primary promise: fault isolation.
But architecture decisions have second-order effects.
The Complexity Tax
During a period of hypergrowth around 2016–2017, Segment added over fifty new destinations—roughly three per month. Each destination required custom transformation code. Some were simple mappings. Others involved "shoving values into hand-crafted XML payloads," as Noonan put it with the weary humor of someone who has actually done this work.
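A hypothetical pair of transforms, in the spirit of what Noonan describes — the field names and payload shapes here are invented, not Segment's actual mappings — shows the spread from trivial to tedious:

```python
def to_analytics_params(event):
    """The easy kind: rename a few fields for a parameter-style API."""
    # Hypothetical target field names, for illustration only.
    return {"ec": event["category"], "ea": event["action"]}

def to_legacy_xml(event):
    """The less pleasant kind: a hand-crafted XML payload."""
    return (f"<Event><Category>{event['category']}</Category>"
            f"<Action>{event['action']}</Action></Event>")

event = {"category": "checkout", "action": "purchase"}
print(to_analytics_params(event))  # {'ec': 'checkout', 'ea': 'purchase'}
print(to_legacy_xml(event))
```

Multiply code like this by more than a hundred destinations, each drifting independently, and the maintenance surface becomes the real product.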
Initially, all destination code lived in one repository. When a single broken test blocked all deployments, the team split each destination into its own repo. The isolation helped with test failures. It also meant they eventually managed over one hundred separate repositories, each with its own deployment pipeline, its own dependencies, its own version drift.
The team created shared libraries to reduce duplication. But updating a shared library required testing and deploying changes across all services—a week of developer effort, according to Noonan's QCon London presentation in 2020. Versioning the libraries made updates faster but defeated the purpose of sharing code in the first place.
The operational overhead became crushing. Auto-scaling rules were applied uniformly across all services, despite vastly different resource needs, because customizing each one would have required even more management overhead. The team recognized that proper fault isolation would have meant one microservice per queue per customer—over ten thousand services—a number that made the impossibility clear.
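The arithmetic behind that number is straightforward. The destination count comes from the post; the customer count below is purely illustrative, chosen only to show how quickly full isolation explodes:

```python
destinations = 100  # "over one hundred" destination APIs, per the post
customers = 100     # illustrative only; Segment's real customer base was larger

# Full fault isolation means one queue (and one service) per
# destination per customer.
queues_needed = destinations * customers
print(queues_needed)  # 10000
```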
What The Numbers Don't Show
In her talks, Noonan emphasized something that metrics struggle to capture: "If microservices are implemented incorrectly or used as a band-aid without addressing some of the root flaws in your system, you'll be unable to do new product development because you're drowning in the complexity."
This is where the human cost appears. Three engineers maintaining infrastructure instead of building products. A team that can't move fast not because they lack talent but because the system demands constant attention. The opportunity cost of features not built, experiments not run, problems not solved because the architecture itself has become the problem.
The 2017 decision to move back to a monolith—which they named Centrifuge—considered all these trade-offs explicitly. They would lose some modularity. Environmental isolation would decrease. The visibility that came "for free" with separate services would require intentional effort to build.
But they gained something more valuable: the ability to build again.
The Architecture That Fits
Centrifuge now handles billions of messages per day delivered to dozens of public APIs. There's a single code repository. All destination workers use the same version of shared libraries. Deployments take minutes, not hours. Most importantly, according to Noonan, "we were able to start building new products again."
The story doesn't end with "microservices bad, monoliths good." That would be too simple, and it would miss the actual lesson. Segment's experience—and similar reversals at companies like Amazon Prime Video for certain services—suggests that architectural decisions are contextual. They're bets based on assumptions about scale, team size, operational maturity, and the specific problems you're solving.
At QCon, Noonan noted that most architectural decisions are made with the best information available at the time. Only hindsight reveals which assumptions were wrong. But she also pointed out that spending a few days or weeks on deeper analysis could potentially avoid situations that take years to correct.
The Question to Ask
If you're facing the microservices-versus-monolith decision, the relevant question isn't "which architecture is better?" It's "what problem am I actually trying to solve, and what am I willing to pay to solve it?"
Operational overhead is a real cost, paid in engineering time, cognitive load, and opportunity. So are coupling and a lack of isolation. The question is which cost you can afford, which aligns with your team's strengths, and which keeps you focused on the problems that matter to your users.
Segment's story is valuable not because it proves monoliths are superior, but because it shows what happens when you're honest about trade-offs. When you measure not just uptime and latency, but whether your team can actually build the things they need to build.
Sometimes the best architecture is the one that gets out of the way.