On a scale of 1 to 10 for catastrophes, the Heartbleed OpenSSL bug is an 11, says cryptography guru Bruce Schneier. It is the biggest security breach the Internet has ever faced, and I hope it’s also the biggest I’ll ever see in my life.
The exact financial, technological and human consequences of Heartbleed will probably never be known, but 2 days after the issue was revealed, there are already a few lessons we can learn.
There are at least 4 of them.
1. No last-minute feature push. Ever.
Quoting Netcraft (via Fabrice Bachella) in “Half a million widely trusted websites vulnerable to Heartbleed bug”:
Support for heartbeats was added to OpenSSL 1.0.1 (released in 2012) by Robin Seggelmann, who also coauthored the Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS) Heartbeat Extension RFC. The new code was committed to OpenSSL’s git repository just before midnight on new year’s eve 2011.
If you’ve ever wondered what the worst time to push a new, critical feature to a product is, New Year’s Eve is the answer. Actually, pushing or releasing a new feature at any point during the holiday season is a very bad idea.
No one will care about it. Period. No one but your peer reviewer (if you have one) will see your code. I don’t think pushing the TLS heartbeat extension on another date would have prevented the Heartbleed bug, but more people would have read the code, and one of them might have noticed the problem.
And if you break something, no one will be there to support your users, fix the bug, push a new release and so on.
Applied to a startup, it’s simple: if you’re about to release something, and someone has to push last-minute code, postpone the release. Otherwise, either your feature won’t be as good as it should be, or your code will break in some unexpected place, or you’ll have to roll back because it breaks. Trust me, I’ve seen hundreds of examples of last-minute code pushes, fortunately none with effects as big as Heartbleed.
2. Monoculture kills
66% of the servers connected to the Internet use OpenSSL to provide HTTPS. This, like every kind of monoculture, is a real problem for many reasons.
Monoculture is a danger for security. It means security researchers will all focus on the same target instead of working on many. This was true in the late 80’s and 90’s, when MS-DOS and Windows shared 95% of the desktop market: virus writers targeted only those platforms because they were easy to exploit and most people used them. It’s still true today with WordPress, which powers 12% of sites worldwide, often with poorly coded, if not deliberately infected, third-party themes and plugins.
Monoculture is a danger for innovation. Innovation comes from diversity and the will to explore new, undiscovered paths. Monoculture, by contrast, brings the “everyone does it this way, so we’re doing it too” syndrome. I remember when, 10 years ago, we were developing Web sites and applications for Internet Explorer 6 only. It was a real pain, as alternative browsers were supporting new features faster. For the record, Internet Explorer 6 was discontinued only 2 years ago.
Monoculture is not sustainable. OpenSSL is free software, but imagine what would happen if 66% of the HTTPS market were controlled by a single company. It would be a quasi-monopoly, with all the underlying problems, such as some countries not being able to access basic crypto features. Consider what happens with monoculture in farming: a whole region or a whole country starts producing the same crop because the soil is good and external demand is high. If for some reason prices fall or the weather is bad, the whole country can go bankrupt and become unable to buy food for its inhabitants.
As a startup, you’re experimenting and you need to move very fast. Moving fast means, among other things, being able to pick a solution and change it if it doesn’t work, doesn’t scale or doesn’t do the job. This is not compatible with monoculture, which can kill you faster than you imagine.
In this very case, Brad Fitzpatrick was using the Go language SSL/TLS library rather than OpenSSL. By bypassing the usual Linux + Apache / Nginx + OpenSSL stack, he avoided being vulnerable to Heartbleed.
Avoiding monoculture is not about reinventing the wheel because the existing one sucks. It’s about using the right tool in the right place, and possibly deciding to build something from scratch because nothing that exists fits your needs.
3. Peer reviews are not enough, but you shouldn’t live without them
Errare humanum est, and peer reviews help you make fewer mistakes, but they’re not enough. The OpenSSL heartbeat support code was reviewed by one of OpenSSL’s official maintainers, but that did not prevent the bug from being pushed anyway.
Peer review is a very powerful process that should be applied at every level of any company, for everything that goes into production: code, of course, but also sales material, marketing, communication… In a peer review, you get someone to review, comment on and validate (or invalidate) your work, on a peer-to-peer basis. This means peer review sets hierarchy aside to focus on self-improvement. Until the reviewer validates the work, it’s not considered ready to ship. As many peers as needed can be involved in a single peer review.
Peer reviews allow you to:
- avoid mistakes (but not all of them)
- improve the whole company’s knowledge of what’s going on (in small companies)
- help everyone improve.
4. Automated tests are not optional
What struck me when reading Robin Seggelmann’s commit is the total lack of automated tests.
One of the first things I ever heard from a startup founder, back in 1998, was that making mistakes is OK as long as you don’t make the same mistake twice. Some people, and some companies, still think automated tests are useless and peer reviews a waste of time, because manual testing ahead of a release should be enough.
Automated tests are not made to avoid making mistakes. Automated tests are made to avoid making the same mistake twice. Of course, every piece of code should be pushed with tests to validate it, but that’s not the most important part. Every bug that shows up during an application’s life should lead to an automated test ensuring that the issue never happens again.
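To make this concrete, here is a minimal sketch of what such a regression test could look like, written in Python. It is not OpenSSL code: the echo_payload function, the constants and the simplified message layout (1 type byte, 2 length bytes, the payload, then at least 16 bytes of padding) are assumptions made for illustration, loosely modelled on a heartbeat-style message. Python slicing would not leak memory the way C can, so the point here is the shape of the test, not a reproduction of the bug.

```python
HEADER_LEN = 3    # assumed layout: 1 byte message type + 2 bytes claimed payload length
MIN_PADDING = 16  # assumed minimum padding after the payload


def echo_payload(message: bytes):
    """Return the payload to echo back, or None if the message is malformed.

    Hypothetical helper for illustration only, not an OpenSSL function.
    """
    if len(message) < HEADER_LEN + MIN_PADDING:
        return None
    claimed_length = int.from_bytes(message[1:3], "big")
    # The fix under test: never trust the claimed length,
    # compare it with the number of bytes actually received.
    if HEADER_LEN + claimed_length + MIN_PADDING > len(message):
        return None
    return message[HEADER_LEN:HEADER_LEN + claimed_length]


def test_honest_length_is_echoed():
    message = b"\x01" + (5).to_bytes(2, "big") + b"hello" + bytes(MIN_PADDING)
    assert echo_payload(message) == b"hello"


def test_oversized_claimed_length_is_rejected():
    # Regression test: the sender claims 65535 bytes but only sends 5.
    message = b"\x01" + (0xFFFF).to_bytes(2, "big") + b"hello" + bytes(MIN_PADDING)
    assert echo_payload(message) is None
```

The second test is the one that matters: it encodes the exact mistake, a claimed length larger than what was received, so that any future change reintroducing it fails before it ships. A test runner such as pytest would pick up both functions automatically.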
If you don’t see the point, think of tests as a knowledge management tool. A knowledge management tool, or even an enterprise social network, lets you track every business issue and every complicated negotiation you face, so you don’t have to face them twice.
When planning a release, every company should include the time to write automated tests: tests for the upcoming features, and tests for the bugs that will come up as well.
There are probably more lessons to be learnt from the Heartbleed catastrophe, but these are the most obvious. I hope you enjoyed reading this article despite my not yet perfect English style. I’m continuously working on improving it, asking American friends to proofread my texts when they can, and writing down my most frequent mistakes so I won’t repeat them.