14:15 - 15:00
When production issues happen, two things make them hard to deal with: people have to context switch, and we have to diagnose before we can fix. I want to suggest a paradoxical way of improving response times and reducing downtime: increase the number of issues that we scramble to fix, and remove formal and informal restrictions on how much time should be spent on them.

TDD encourages us to write the minimal test that fails, and then write the minimal code that makes the test pass. In the same spirit, let’s observe the behavior of our production ‘machine’ until we discover the smallest error or warning, and then do something simple to mitigate it. Let’s replace support rotas with ‘support queues’, where the next person takes the first issue and works it to completion. Let’s continuously add observability until we can answer even obscure questions about the working system. And let’s get used to ‘quiet watching’ of the system so that we can discover anomalous behavior long before we experience failures.