March Update

Hello everyone! I hope 2025 is treating you well, or that you are at least surviving. But I have good news: the newsletter is here!
Read on for updates on the book front, some things you might want to read, and details about forthcoming events I'll be attending.
Book Progress
I had this update all ready to go earlier in February, but held off until I could share with you an exclusive look at the new cover for Building Resilient Distributed Systems.
The animal covers of O'Reilly books are iconic, and I'm really happy with the choice of a European badger for the cover of mine. I currently live in the countryside in the south east of England, and badgers are very common around here. So it's nice to have a slice of home on the front cover.
The chapter on Thundering Herds is now available for those of you reading the early access version. This chapter focuses on the different types of unexpected load, how they can bring systems down, and what you can do about them.
Since then I've been focusing on how to cover resiliency at scale. This has unearthed a bunch of smaller chapters around topics such as message brokers, progressive collapse (more on that later in this post) and stateful processing. Once I've got a better sense of how this changes the table of contents I'll share it here. The chapter on message brokers will be the next to get released, and I hope you'll get to see that in April, if not before.
Things Of Interest
A mix of things I turned up for book research and interesting news pieces:
- Progressive Collapse. Discovering this term during my research has caused me to re-examine some of how I frame the middle section of the book, in a good way. A concept from structural engineering, progressive collapse describes a situation where a small failure causes a much larger failure in the wider system. Examples I'm using in the book to illustrate this concept include the Ronan Point and Francis Scott Key Bridge collapses from 1968 and 2024 respectively.
- A post-mortem and commentary related to a recent outage at Canva. An interesting example of a misconfiguration causing a thundering herd. I may well revisit my thundering herd chapter, where I discuss cache stampedes, to reference this case study.
- A great writeup on Monzo's use of a dedicated stand-by system to ensure that core banking operations continue to operate even if the primary system fails. Rather than being a second copy of the primary, the stand-by is a clean-room implementation of the core banking functionality, sharing no code, and also runs on a different cloud vendor. This will definitely be getting written up in more detail for my book, as it's a fascinating case study in the importance of context and trade-offs. Props to Monzo for once again sharing some of their technical insights.
- Infrastructure As Code. The 3rd edition of Kief's book is now in production, and should be available soon. As technologies come and go, principles remain. So whilst the core ideas of the book haven't changed, the details of how they get implemented have shifted. So it's great to see Kief keeping this updated.
Upcoming Events
- QCon London 2025, April 7-10. Held at the conference venue I've spent the most time at, and one of my favourite venues at that, just across from the Palace of Westminster. I'm speaking on Daniel Bryant's API track, where I'll be talking about timeouts, retries and idempotency. The rest of the lineup looks great, and if you want to pick up a ticket you can use the discount code SamuelNQUK2560 for some money off.
- Trifork Masterclass Amsterdam, May 7-8. I'll be back at Trifork's offices once again this year, delivering a two-day class on all things microservices.
- Craft Conf 2025. I'm very fortunate to be heading back to Craft, which consistently has an excellent lineup and a very distinct vibe, due in large part to the venue being a railway museum! The schedule isn't confirmed yet, but I plan to do a talk around resiliency in some way, shape or form.
Ask Me Anything
Finally, I'm going to be putting out some short-form content, either for this newsletter, as videos, or as something else, where I give quick answers to questions you might have about microservices, cloud, continuous delivery or resiliency. So if you have a question you want me to answer, let me know!