Weeknotes: 4th November 2019
This week I went to Ireland.
Not on a work trip, so it’s admittedly not that relevant, but it took two days out of my week so it was pretty key to me.
I came back having missed retro and planning on Tuesday, which is a bit of a weird position to be in: I’m doing all 13 days of the sprint but had no input into what went in.
I was broken back in quickly on Wednesday when we had a production-level outage at 10am. Investigation since has revealed that our Redis node fell offline because of a hardware failure. We were back up within ten seconds, but users were still impacted and that sucks.
A lot of the rest of the week was taken up with improving our error handling so we’re not at as much risk in the future. We’ll also be building in redundancy for Redis so we don’t have a single point of failure.
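As a rough illustration of the kind of error handling described above, here is a minimal sketch of a read-through cache lookup that degrades to the slower data source when the cache node is unreachable. It is written in Python for brevity rather than PHP, and every name in it (`CacheUnavailable`, `DownCache`, `MemoryCache`, `get_with_fallback`) is hypothetical: it shows the general pattern, not the real implementation.

```python
from typing import Callable, Dict, Optional


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache node cannot be reached."""


class MemoryCache:
    """In-memory stand-in for a healthy cache."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.items.get(key)

    def set(self, key: str, value: str) -> None:
        self.items[key] = value


class DownCache:
    """Stand-in for a cache whose node has fallen offline."""

    def get(self, key: str) -> Optional[str]:
        raise CacheUnavailable("node offline")

    def set(self, key: str, value: str) -> None:
        raise CacheUnavailable("node offline")


def get_with_fallback(cache, key: str, load: Callable[[], str]) -> str:
    """Return the cached value, or load it directly if the cache is down."""
    try:
        cached = cache.get(key)
    except CacheUnavailable:
        # Cache is unreachable: serve from the source instead of failing.
        return load()
    if cached is not None:
        return cached
    value = load()
    try:
        cache.set(key, value)
    except CacheUnavailable:
        # Failing to store the value should not fail the request.
        pass
    return value
```

With a healthy cache the loader runs once and later calls are served from the cache; with `DownCache` every call goes straight to the loader, so requests get slower but do not error.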
- Deployed the full review checklist (see last week’s weeknotes) for all report types
- Re-established role hierarchy in organisations
- Provided better error pages during critical failures
- Installed PHPStan to highlight coding errors
- Investigated our vulnerability to some recent Nginx/PHP CVEs (we’re fine!)