chocobo.mxrouting.net IMAP Postmortem
Degraded · Reported May 1, 2026 at 5:21 AM UTC · Resolved May 1, 2026 at 5:22 AM UTC · Duration: 0m
Affected Systems
This status page is being posted only so that a postmortem can be attached to it. Its timestamps are not directly relevant to the incident: the issue was fixed immediately upon initial diagnosis, and every waking moment since then has been spent analyzing logs to explain what happened. For this reason, posting the status page was treated as a lower priority.
Postmortem
It started with users on the Chocobo server reporting that Dovecot was not loading new SNI certs for custom IMAP subdomains (mail.yourdomain.tld, etc.). I could not find a reason for it, but handling that is DirectAdmin's job on the backend, and while I'm working to replace DA, I'm not there yet, so my visibility into some of what it does is relatively limited. As a stopgap, I created a job to reload the Dovecot config every 60 minutes while I worked toward a solution specific to our systems that could perhaps sidestep any DirectAdmin issue in that area entirely.
Well, DirectAdmin triggered a Dovecot config reload at the exact same moment that my reload job kicked off. The two overlapping reloads triggered an E2BIG error and set off a domino effect that lasted about 75 minutes. I mitigated this by halting my Dovecot reload job. DA's own reload may be working properly again, and I'll test that next. If its reload runs but doesn't pull in new SNI certs, that'll be a fun thing to figure out (why one reload fails and a second succeeds). Either way, I'll work on a smarter method of mitigating both problems and have it on my own desk within the next 5-6 hours.
I'm deeply sorry to anyone who saw this today. It's an entirely unprecedented failure for us, and yet another lesson learned in the effort to get all of this perfect. Perfection may be out of reach, but surely we can get much closer to it than what was seen today.
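The collision above came from two uncoordinated reloads: a fixed 60-minute timer and DirectAdmin's own trigger. Purely as an illustration of one way a periodic job could reduce that overlap, and not the fix the postmortem promises, here is a minimal Python sketch under stated assumptions. It reloads only when a cert file in a directory has changed since the last reload it performed, holds a non-blocking flock so two runs of the job itself never overlap, and waits until the newest cert is at least `min_age` seconds old, a heuristic meant to let the panel's own reload finish first. The lock cannot see DirectAdmin's reload; only the delay heuristic addresses that. The function names, the directory layout, and the `min_age` default are hypothetical; `doveadm reload` is Dovecot's standard config-reload command.

```python
import fcntl
import subprocess
import time
from pathlib import Path
from typing import Callable


def doveadm_reload() -> None:
    """Ask Dovecot to re-read its configuration."""
    subprocess.run(["doveadm", "reload"], check=True)


def newest_cert_mtime(cert_dir: Path) -> float:
    """Newest modification time among files in cert_dir (0.0 if empty)."""
    mtimes = [p.stat().st_mtime for p in cert_dir.iterdir() if p.is_file()]
    return max(mtimes, default=0.0)


def last_reload_mtime(state_file: Path) -> float:
    """Cert mtime recorded at our last reload (0.0 if we never reloaded)."""
    try:
        return float(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0.0


def maybe_reload(
    cert_dir: Path,
    state_file: Path,
    lock_file: Path,
    reload: Callable[[], None] = doveadm_reload,
    min_age: float = 300.0,  # hypothetical settle delay, in seconds
) -> bool:
    """Reload at most once per cert change; return True if a reload ran."""
    lock_file.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_file, "w") as lock:
        try:
            # Only guards against overlapping runs of THIS job, not the panel's.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another run of this job is already in progress
        newest = newest_cert_mtime(cert_dir)
        if newest <= last_reload_mtime(state_file):
            return False  # no cert changed since our last reload
        if time.time() - newest < min_age:
            return False  # cert too fresh; leave room for the panel's reload
        reload()
        # Record the mtime seen *before* reloading, so a cert written
        # during the reload still triggers the next run.
        state_file.parent.mkdir(parents=True, exist_ok=True)
        state_file.write_text(repr(newest))
        return True
```

Run from a timer, this issues no reload at all when no certs have changed, rather than reloading every hour regardless, which shrinks the window in which it can coincide with another reload. Whether a delay like `min_age` is enough to stay clear of DirectAdmin's reload is an assumption that would need testing on the real system.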
Updates
Marking issue as resolved to allow the postmortem.