I just wanted to let you know about the outage we had this morning. It lasted from 7:23am to 7:48am AEDT.
As many customers would be aware, over the last few months we had been suffering from performance issues, with too many slowdowns and outages. These have been fixed completely. I will share more on this in a separate post.
This morning's issue was that we deployed an update to Cliniko with code that locked our database. This was obviously a mistake. We have a guide for our developers to follow on how to make changes to the database safely, but clearly the guide is not enough. This is my fault. I need to make sure better systems are in place for actions that can have such big ramifications.
We are now putting automation in place that will prevent us from doing this again. Our automated testing will catch this kind of code before it ever goes live.
To be clear, this is not a systemic issue like the server load and performance problems we faced previously. It was an isolated incident, and we now have measures in place to prevent it from recurring.
I am really sorry for the trouble this caused people this morning. I know how much you rely on Cliniko. I can only ensure we don’t make the same mistake twice.