Years ago, I worked on a team getting ready to do our first migrations from the ol’ data center into AWS. We had big dreams of everything being “hands-off”, fully automated — one click from code review to production deployment.
We also had reality, in which neither our dev nor our ops teams knew how to ship without lots of manual intervention.
So … we created a parallel production environment, informally known as “the boneyard”, where we migrated all the apps that couldn’t meet the architectural or automation standards of “real” production. This was supposed to be a temporary thing.
Needless to say, the boneyard quickly dwarfed the size of all our other environments. After the first six months or so, I stopped hearing talk about “hands-off” deployments. We never did get there.
All of that is just warmup for our topic of the day: this magisterial paper by Clare Liguori, laying out Amazon’s process for continuous deployments.
One of the absolute highlights of my engineering career was sitting across from AWS’s Deepak Singh at a dinner a few years ago and learning about this process for the first time. To us mortals, scratching away at our grubby little pipelines, it sounds like science fiction.
True hands-off deployments, at that scale, with that many interdependencies? Not only does it sound impossible, it sounds inadvisable. Just about every dev shop I’ve ever worked with would blow up their own fingers if they tried it.
But then you step through Clare’s paper and realize that the Amazon deployment machine, beneath all the steampunk awesomeness of its interlocking parts, is built on almost blandly straightforward primitives.
They got code review right. They got build automation right (standardized, yet flexible). Then they figured out how to automate code promotion between environments, and trust the results. None of these things are necessarily all that sexy on their own. Amazon can build their way up to the science fiction stuff — the multilayered integration tests between interdependent services, the rolling global deployments — because of their scrupulous, relentless mastery of the basics.
On the opposite end of the maturity spectrum, I can’t count how many times I’ve sat in a meeting with a team that wants to add “feature flags” or “blue/green deployments” to their “CI/CD strategy”. They say this like it’s as easy as adding the 75-cent cheese to a burger at the takeout window.
And yet they can’t reliably get code from a developer’s laptop into production without three manual interventions, scheduled downtime, and a political coup. That’s not CI/CD, it’s undocumented immigration.
There’s no point in adding cheese if you can’t grill the burger to a safe temperature. And if you can’t get your infrastructure defined in code, if you can’t automate basic smoke tests, if you can’t release to prod without convening a leadership summit, then you are in no place to dream about canary deployments and automated rollback.
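For a sense of scale, a basic smoke test can be tiny. Here is a minimal sketch in Python using only the standard library. The `/health` path, the function names, and the "anything other than 200 is a failure" rule are all assumptions for illustration, not a description of anyone's real pipeline.

```python
import urllib.error
import urllib.request

# Hypothetical paths; swap in whatever your service actually exposes.
SMOKE_PATHS = ["/health", "/"]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and so on


def run_smoke_test(base_url, paths=SMOKE_PATHS, get_status=http_status):
    """Check each path and return a list of (path, status) failures."""
    failures = []
    for path in paths:
        status = get_status(base_url.rstrip("/") + path)
        if status != 200:
            failures.append((path, status))
    return failures
```

Call `run_smoke_test` with the base URL of the environment you just deployed to, and fail the pipeline stage if the returned list is not empty. The `get_status` parameter exists only so the check can be exercised without a live server.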
“But feature flags are ‘best practice’.” Okay, but so is a reliable pipeline for shipping the config updates to your feature flags. And automation for building and updating that pipeline. And appropriate code review for config changes. Do you have the discipline to practice that?
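To make that concrete, here is roughly what the cheapest version of "review for config changes" could look like: a check your pipeline runs before any flag file ships. This is a sketch under stated assumptions. The schema (a JSON object where each flag has a boolean `enabled` and an integer `rollout_percent`) is made up for the example, and real feature flag tools will have their own formats.

```python
import json


def validate_flags(raw_json):
    """Return a list of human-readable problems; an empty list means the config passes."""
    try:
        flags = json.loads(raw_json)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(flags, dict):
        return ["top level must be an object mapping flag names to settings"]

    problems = []
    for name, settings in flags.items():
        if not isinstance(settings, dict):
            problems.append(f"{name}: settings must be an object")
            continue
        # Assumed schema: "enabled" is a strict boolean.
        if not isinstance(settings.get("enabled"), bool):
            problems.append(f"{name}: 'enabled' must be true or false")
        # Assumed schema: "rollout_percent" is an integer from 0 to 100.
        percent = settings.get("rollout_percent")
        # bool is a subclass of int in Python, so rule it out explicitly.
        if (
            isinstance(percent, bool)
            or not isinstance(percent, int)
            or not 0 <= percent <= 100
        ):
            problems.append(f"{name}: 'rollout_percent' must be an integer from 0 to 100")
    return problems
```

A check like this does not replace a human reviewer. It just keeps a typo in a flag file from being the thing that takes production down.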
If you’re consuming a steady diet of DevOps Days talks and the Google SRE book, it’s easy to forget that complex deployment orchestration schemes are rollercoasters you have to work your way up to. Feature flags, like Six Flags, require you to be at least this tall to ride. Otherwise you slip through the harness and plummet eighty feet to your death. Or worse, into corrupted data and indeterminate-state deployments.
The same goes for “serverless”, by the way. Someone reached out to me last week for help implementing a Lambda-backed architecture at their startup. They were frustrated at how difficult, slow, and counterintuitive the development process was: not at all what they’d been led to believe.
Didn’t take long to discover they were wrangling everything in the AWS console. I’d have been frustrated too. The value prop of serverless (and cloud in general) makes no sense unless you can automate your infrastructure deployments. Otherwise, you’re trying to herd cattle by the rules of a petting zoo. You’ll get trampled.
The wrong takeaway from Clare’s wonderful paper would be something like “what we are missing, here at Continuous Disintegration Inc, is a cutting-edge implementation of ‘bake time’.” That’s like trying to melt cheese on a raw burger.
The right lesson would be: Even Amazon obsesses over the basics. Source control, infrastructure as code, automated tests. These are the meat in your sandwich of CI/CD greatness. Get them right. The fancy stuff will follow.
Links and Events
What to say about the Cloud Resume Challenge at this point? It’s become one of the most fun, exhausting, rewarding things I’ve ever had the privilege of doing. And I’m especially grateful for the industry veterans who have stepped into the Discord server as mentors.
By request from a number of the participants (and, a little bit, to preserve my sanity) we are ending the “official” portion of the challenge, the part with code review and direct networking help, after a full #100DaysofCloud on July 31. After that point, everybody will be able to make their code public for portfolio purposes. And I’ll get working on the next version of the challenge, because this is clearly something worth continuing.
In the meantime, if you’re still on the fence, now’s the time to get started…
In conference news, I’m speaking (virtually) at OSCON on July 16th; you can attend my highly colorful “Cartoonist’s Guide to the Cloud” with an O’Reilly membership.
Still ticking along at A Cloud Guru with various posts:
And The Read Aloud Cloud, the most bizarre introduction to cloud ever written for engineers of all ages, is now available to preorder from all your favorite bookstores. (And also Walmart.) I’ll be doing some very fun giveaways soon for those who preorder … watch this space.
Just For Fun
In some previous draft of this post, I included an embed of my “S3 Ballad” song, though I’ve shared it with you before. Despite my many clicks, Substack is not letting me delete it. So, you are just going to have to put up with it again. I hope that’s okay.