Weeknotes: 9th December 2019
This week I took a look at what we achieved this year.
I’ve had a busy week. On Monday evening we took the plunge and switched a huge swathe of our users across to the new permissions model we’ve been working on. Wednesday was marked by an outage, Thursday was our Christmas celebration, and on Friday we finally traced some issues we’ve known about for months (years?).
It all felt like a lot was going on, particularly given the time of year. It was a scary, tense, stressful week. But I’ve been keen to look at the bright side: how far we’ve come.
In the last few weeks, Complete the deputy report has felt particularly unstable. We’ve had a couple of minor outages, and we’ve come across boatloads of errors. However, I’d argue that this is good news, because those errors were already around earlier in the year and we just weren’t aware of them.
My overriding theme for the last quarter of the year has been improving our monitoring and dealing with the problems that come out of it. These are issues our users have seen every day, but that we weren’t aware of. We now see errors when they happen, letting us be proactive rather than waiting for a bug report.
The same goes for our outage on Wednesday. Without wanting to go into much detail, it was one of those “these things happen” issues that can’t really be predicted. The good news though is that we were already fixing it before users noticed, and we resolved everything within 15 minutes.
This is all a bit of a Catch-22 though (or something similar): if we hadn’t made the improvements and weren’t aware of these problems, then we wouldn’t be as concerned with them. Instead, we know about everything that’s wrong and collectively have an uncomfortable feeling that things have gotten worse.
So, as we approach the end of the year, it’s good to look at what we’ve achieved. Monitoring is better; infrastructure is more reliable; we understand our system better; testing has better coverage; we’ve fixed tonnes of bugs, many of them 4+ years old; CI is effective.
Next week I’ll take an early shot at some 2020 goals, but I truly wouldn’t be able to without the great work of 2019. We’re beyond “fixing the basics” here and into the serious territory of building a great system. I think our main problem now is having too many ideas.
- Switched almost all non-lay deputies to the new permissions model
  - A few “unusual” organisations remain on the old system for now, whilst we ensure we understand their needs
  - This crosses off a ticket that’s been in development for months, and a sprint goal for the past 4 sprints
- Reviewed the cases we can’t parse and set out plans to resolve them
  - We’ve been aware of these for at least a year, but unable to do anything about them. Recent code changes mean we’re now close to 100% coverage. The last remaining big goal is supporting lay and non-lay deputies on the same case.
- Restructured our preproduction environment for more logical separation of concerns
- Resolved a migration issue which was causing unsuccessful deployments
- Resolved a migration issue which was causing tests to be unreliable
- Celebrated the festive season as a department with a treasure hunt and curling.