Charity Majors (Co-Founder/CTO of Honeycomb) on reluctantly writing a database, Honeycomb’s strategy for hiring, and why you should deploy immediately after writing code.
Developer Den is a series of interviews with notable developers in our community to learn more about their journey into engineering. We sat down with Charity Majors, co-founder/CTO of Honeycomb.
How did you first get interested in computers?
I hung out in the computer lab in my college basement because it was the only place that had air conditioning and because I had a crush on a boy who was there. Soon the boy was no longer a thing, but I stayed in the lab all summer, playing a lot of XPilot. I'd hit `tab` in the terminal and see what all of the commands did. It was fascinating — I've always really liked words. I don't think I ever would've gotten into computers if I'd seen a graphical user interface. I really enjoyed learning the command line and shell scripting.
Then, people started to pay me money to do it. I was a music major, and I figured out pretty quickly that all the people I was going to college with were in their thirties, flat broke, and still hanging around the music department. I decided to switch keyboards.
Do you remember an early program that you worked on as you were learning?
Yes! Here's one bratty program I wrote. Everyone in the lab was playing XPilot and using XScreenSaver. I wrote a wrapper for XScreenSaver to capture people's passwords. It'd return a fake error the first time, and then pass the password off to the XScreenSaver binary the second time. I got in trouble for that, but I didn't do anything that the permissions didn't allow me to!
At Honeycomb, what was the hardest technical problem you encountered?
In the first couple of years, the hardest problem was the storage engine. My co-founder Christine and I did not want to write a database. I've spent my entire career telling people — never write a database! But we surveyed everything that was out there, and there wasn't anything that made the right trade‑offs. We needed a columnar store with some particular properties. It's core to the idea of observability that you're able to slice and dice your data and ask any question without having to predict in advance what kind of questions are allowed. It's also core to observability that you can add dimensions whenever you want, and no databases at the time worked with those constraints. Writing that storage engine was non-trivial. A few years later, we replaced that engine with AWS Lambda functions and S3 files, which is pretty dope!
The most challenging technical problems in the last year have been around user interactions. Developer tools in general are notorious for having miserable and patchy user interfaces. I'm too old to memorize 50,000 key bindings! When you're trying to understand a vast and complex system, you need the tool to empower you, to get out of your way, to help you be amazing. Kathy Sierra has this great book called Badass, which states it clearly: your job is to make your users badass. You want as much of your users' attention as possible to be focused on the actual problem. That's easy to say but very challenging to do.
Also, there's so much about the particular problem of observability that needs to be solved socially. When I'm developing a service, I know it intimately — every bit of it, all the trade-offs that I've made. But when I have to traverse the entire system, I don't know the other parts in the same way. I don't know anything about other services, or about work I did more than six months ago. I want to be able to ask the system how an expert would interact with it: what dimensions are important? What typically breaks over there? You need to be able to ask all those gnarly questions that no machine learning model, no AI, will ever answer as well as asking the person next to you.
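To make "add dimensions whenever you want" concrete, here's a minimal sketch of sending a wide event with Honeycomb's libhoney Python SDK (`pip install libhoney`). The write key, dataset, and field names below are placeholders, and the calls are written from memory, so treat the exact API usage as an assumption rather than gospel.

```python
import libhoney

# Placeholder credentials; substitute your own write key and dataset name.
libhoney.init(writekey="YOUR_WRITE_KEY", dataset="checkout-service")

ev = libhoney.new_event()
# Attach whatever dimensions matter for this request: no schema to declare
# up front, and high-cardinality fields like user_id are fair game.
ev.add_field("endpoint", "/cart/checkout")
ev.add_field("duration_ms", 243)
ev.add_field("user_id", "u_48213")
ev.add_field("cart_items", 7)
ev.add_field("feature_flag.new_checkout", True)
ev.send()

libhoney.close()  # flush pending events before the process exits
```

Because every field rides along on the same event, you can later slice by any of them (or by a dimension you only add next month) without having predicted the question in advance.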
Honeycomb has been very effective at coming up with new language to describe what you're doing. How did you decide to talk about observability separately from monitoring?
When we started the company, we had given no thought to what we were building or for whom. We just knew that we had had a problem, and we knew how to solve it. For the first six months, there was a lot of floundering. We started in January of 2016, and by early July, we needed a way to describe what we were doing.
I googled the term observability — it wasn't a term in the common parlance. The definition comes from mechanical engineering, from control systems theory. It's about how you can understand the inner workings of your system by looking at it from the outside. A lightbulb went off: that’s what we’re trying to do! We're trying to build systems tooling where you don't have to know in advance what numbers will be important, where you don't have to predict what's going to break — you can just interrogate it from the outside and understand any system state.
Systems used to be much more predictable. You only had the app server, and when it failed, you could attach a debugger and step through it. Usually, your app server failed because the load balancer had done something, or because MySQL ran out of connections. Your app would fail in predictable ways, so you could make a dashboard to represent the ways it could fail.
Nowadays, our distributed systems are something radically different. There's a huge gulf between the way you interrogate a distributed system versus what you'd do in the old days. You're still looking at graphs to understand your services, but the actual work is something completely different. We needed language to reflect that, so we started using "observability." It took years of work for that to catch on — but once it did, of course it only took about three weeks before everyone else was using it too.
It really has been a change of paradigm. Almost everywhere you go, the person best at debugging is the person who's been there the longest — they've got the most experience with the system and the most scar tissue. Now, I've been on three teams where that wasn't true. The best debugger was the person who had the most curiosity and the most tenacity. They might be the newest person on the team, but since they have observability tooling rather than monitoring tooling, they're able to explore and ask important questions.
Are there certain things that Honeycomb does that you think every company should adopt?
Absolutely! We haven't gotten everything right the first time, but we have been intentional about trying to fail in different ways.
Here's a thing we do that everyone should do: if there isn't a three-day weekend in a month, we make one. Everybody deserves a three-day weekend, and most people won't take that time for themselves unless the whole team does.
We've also put a lot of thought into our hiring and recruiting. We still don't have a recruiter, and we've hired about 150 people. Instead, we talk openly about our philosophy, and that attracts a lot of people to apply.
We think a lot about how 🤬ed most hiring practices are, and how to avoid that. Interviews are so often about trying to catch you in an error, or trying to make you make a mistake — they're very oppositional. That's stupid! It's not like we're interviewing you for a position where you'll be jumping out of airplanes. We don't care how you perform under pressure — we want to see people at their best, and we want people to feel they've shown us their best.
We ask some coding questions in the interview, but the important part for us is the code review. It's less important what someone writes down, and more important how they talk through what they considered and what the trade-offs were. That signals to us that we've found someone who will be consistently high-performing because they can communicate and learn and make mistakes with their team.
We're much more transparent than any company I've ever worked at — maybe transparent to a fault. After every board meeting, we show the board deck to the team. We're transparent with our team about our finances. I was so tired of working for places where I was taken by surprise, or where my input didn't matter, or where it was clear that I wasn't trusted with information. 🤬 that! People join a company of our size because they're interested in building something from the ground up. Our job as managers is to give people autonomy and to make sure they're able to master skills and have an impact. Because that will make them happy, and make them want to stay.
Recently you've been writing about CI and CD. What do you think is the first step for a company that has long deploy times?
Once you've written code, try to get it live in production in as short a time as possible, ideally under 15 minutes. It's important that the deploy process is both predictable and short, because the goal is to hook into your muscle memory. It's so much easier to maintain the focus you already have than it is to write up a bunch of checklists after the fact. When you've just written some code, you know why you made the trade-offs you chose; you know everything about it. As soon as you move your focus elsewhere, you lose that, and you can never get it back.
If your deploy time starts to stretch into hours, you're starting to batch people's changes together. If your changes are live in 10 minutes, you can look at the live behavior through the lens of the code you just wrote and tell whether it's working as intended. But if your changes will go live sometime in the next few days, bundled with the changes of one to ten other engineers, you're never going to look at that. You're not going to hold yourself personally responsible for any defects in that deploy — and that makes sense; I couldn't hold anyone responsible for that.
So it's very important that the changes in a deploy come from one engineer, and that they appear in production in as short a time as possible. There's more to it than that, but that's the start — everything flows down from that. If you mess that up, nothing downstream will ever get better.
You can spend your life chasing the pathologies and side effects that come from having a long lead time, or you can fix it at the source and everything will be easier.
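For a concrete feel of "one engineer's change, live in minutes," here's a minimal post-merge deploy sketch in Python. The `./deploy.sh` entry point, the commit-author lookup, and the 15-minute budget are illustrative stand-ins, not Honeycomb's actual pipeline.

```python
"""Sketch of a post-merge hook: deploy exactly one change, and keep it fast."""
import subprocess
import time

DEPLOY_BUDGET_SECONDS = 15 * 60  # the "predictable and short" target


def deploy_head() -> None:
    # Identify the single change (and the engineer) going out in this deploy.
    sha = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    author = subprocess.check_output(
        ["git", "log", "-1", "--format=%an", sha], text=True
    ).strip()
    print(f"Deploying {sha[:8]} by {author}")

    start = time.monotonic()
    # Hypothetical deploy entry point; replace with your real pipeline step.
    subprocess.run(["./deploy.sh", sha], check=True)
    elapsed = time.monotonic() - start

    if elapsed > DEPLOY_BUDGET_SECONDS:
        # A slow deploy is a defect in its own right, not background noise.
        print(f"WARNING: deploy took {elapsed / 60:.1f} min, over the 15-minute budget")


if __name__ == "__main__":
    deploy_head()
```

The point isn't the script itself; it's that each deploy carries a single author's change, and anything that pushes the wall-clock time past the budget gets treated as a problem to fix rather than a fact of life.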
Are there processes in CD that take too long, but have value?
Yes, absolutely. There are a lot of them, like regression testing — but they don't have to be in the critical path.
It's important to be shipping this code behind feature flags. I'm not saying to release to users constantly! This is about decoupling the release from the deploy. You should get your code into production as quickly as possible. Things like manual testing can be very valuable, but they don't need to be blocking deploy.
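As a sketch of what decoupling release from deploy can look like, here's a feature-flag gate in Python. The flag store is just an environment variable for brevity; in practice it would be a flag service or config system, and the flag and function names are purely illustrative.

```python
import os


def new_checkout_enabled(user_email: str) -> bool:
    """Decide at runtime whether this user sees the new code path."""
    # The deployed binary always contains the new path; this flag decides
    # who it is *released* to, and it can be flipped without another deploy.
    flag = os.environ.get("NEW_CHECKOUT_FLAG", "off")
    if flag == "on":
        return True
    if flag == "internal":
        # Release to employees first; everyone else keeps the old path.
        return user_email.endswith("@example.com")
    return False


def checkout(user_email: str) -> str:
    if new_checkout_enabled(user_email):
        return "new checkout flow"   # deployed today, released when ready
    return "old checkout flow"       # still the default for everyone else
```

Manual testing, regression runs, and gradual rollouts can all happen against the flagged-off code in production, without ever sitting in the critical path of the deploy.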
What do you think library authors like Oso should be doing to get the same effect?
There are a bunch of cases where people don't think this applies to them: IoT, mobile, medical devices, libraries, et cetera. It's true their situation is a little different. In those cases, understand the purpose behind quick deploys and try to get the same benefits through another process. Instead of thinking, "it's not my problem," realize the core thing is still there — you need to find bugs as early and with as much context as possible. It gets exponentially more expensive to catch bugs and fix them when it's been a long time since you wrote the code. For mobile, for instance, the best practice is to have an internal build and continually deploy it. Then the internal users will be using it immediately, it's live, it's in someone's hands.
I love the quote, "Software that hasn't been deployed ages like fine milk." It gets a smell really fast. Or think of it as a transplant — you need blood flowing through those veins as quickly as possible. Get your code out, and get that blood flowing.
What are you working on outside of Honeycomb?
I've been painting! I've been working on the walls of my office. In my house, the entryway is blue, the bathroom is purple, the kitchen is orange, the living room is yellow, the bedroom is pink, and the library is red. I had to have an office that was green, right?
Charity shows a picture of a rippling, swirling mural in all shades of green.
It goes all the way up to the corner and around. I'm living alone for the first time in years, so I'm going hog wild.
What have you been reading recently?
I've been reading a bunch of books about ADHD and emotional regulation, and a book about witches in Salem. I've also been reading some books about organizational health, like those by Patrick Lencioni, who wrote The Five Dysfunctions of a Team.
Here's a book that I strongly recommend. Don't judge me when I tell you the title. It's called Chaos: Charles Manson, the CIA, and the Secret History of the Sixties. The author's not a conspiracy theorist. He demonstrates pretty conclusively that the story of the Mansons you've heard, the helter-skelter story, did not happen. It's fascinating because he's been researching this for so long — this started out twenty years ago as a magazine article he was writing, and he kept researching it and going down one rabbit hole after another, and at each step there are a ton of unanswered questions. He's just so frustrated — he's a serious writer and this sounds like a conspiracy thing, but it keeps not adding up, and he can't stop researching it. I read it all in one sitting, five hours cover-to-cover. I cannot recommend it enough.
Do you have any advice you'd give to a developer early in their career?
Going from junior to senior is really about skills acquisition. After that, it's about your life quest! Learn to lean into pain, learn to not chase the feeling of being safe and comfortable. Learn to love the feeling of "I don't know this, but this is interesting!" because that will set you up for a long glorious career.
More broadly than career advice, is there any advice you'd give someone who's trying to get to where you are?
Oh god — this was a series of accidents. I have no idea how I got here. That's my advice.
That's a great answer. Thank you so much!
If you enjoyed this interview, we encourage you to share it and tag @mipsytipsy and @osohq! We'd also love to hear who you think we should interview next: tweet at us, or join us and hundreds of developers on Slack!