Billing Panel - Operational
DirectAdmin Panel - Operational
IMAP - Operational
SMTP - Operational
Webmail - Operational
Support - Degraded performance
Notice history
Jan 2026
- Resolved
Maintenance complete.
- Identified
Our billing portal (accounts.mxroute.com) is currently in maintenance mode while we prepare a major service upgrade. This shouldn't take long. No services are down.
Dec 2025
- Postmortem
Hey friends,
The sunfire.mxrouting.net server went down due to a disk failure. There was really only one new lesson learned from this incident: how to fix the backup software when it refuses to restore under the latest version of DirectAdmin. Every other lesson involved here had already been learned and already applied elsewhere across the fleet.
This server was built around an idea, and that idea failed to fully account for all of the relevant factors. None of this information is new, and this outage represents the final fallout from that design decision. While the issue is resolved and the lessons have already been applied, I want to share the full story for anyone who wants the context.
It has not been a secret that we grew too fast, largely by unintentionally appealing to users who were not system administrators. Regular end users came aboard in large numbers. That increased support overhead and development requirements at the same time, all against a business plan that was not designed to sustain that kind of growth.
I took that as a challenge. In many areas, I rose to it. In some areas, I did not. The Sunfire server falls into the latter category.
As support overhead increased, we needed to build new systems to push that overhead back down. A better user experience reduces support load, which allows the original business plan to function without raising prices for existing customers. The tradeoff was that resources were stretched very thin for a long time.
Every part of MXroute suffered to some extent. The only reason we did not pivot to a more traditional and less customer-friendly business model was that we chose to endure that stretch while working toward UX improvements. The goal was to reduce support overhead first, then redistribute resources back to where they were originally intended.
We knew Sunfire needed a hardware migration. That was not a surprise. It was delayed as we approached the end of major UX work. I played with fire and I got burned. The failure was not unexpected, and there was a plan in place. What you saw today was the execution of that plan, with a few extra unintended steps along the way.
Now that UX improvements are in place, with more still coming, support overhead is coming down. Resources can be delegated properly again. The next server at any meaningful risk is Shadow, but that risk is significantly lower. Shadow will be migrated in production with the intention of preventing downtime altogether.
New server builds, and even older ones, already account for these scenarios. They do not carry the same risks that Sunfire did. That is why I say there was no major architectural lesson learned today. The problem itself was already solved. What remained was cleanup, and I hoped to complete that cleanup before it turned into an outage. I did not.
Some might say, "Jarland, we see you online every day. How could you not have the resources to migrate Sunfire earlier?"
Look, I post on social media while I shower. I cannot migrate servers while I shower. If I give the impression that I am not stretched thin, that just means I hide it well. I have been stretched thin, and now we are fixing that.
The way things have been is not the way they will always be. The light at the end of the tunnel is no longer theoretical. The worst of the accumulated messes is behind us. On to the next thing.
- Resolved
We are considering this resolved. There are a few small items to deal with on a per-user basis, affecting only a handful of users. That isn't reason enough to keep a status open that implies users should expect a problem with this server.
- Update
We still need to verify a few things and complete a few tasks on this server, but we now consider users on the server to be fully online, with all basic services functional. Crossbox is still down temporarily, as the only way to quickly bring it up is to remove all customer data from it, and we believe extended downtime of one webmail option to be preferable to that. Until every little piece is done, we're moving this from "outage" to "degraded performance" on the status page.
- Update
Updates will be provided less often for a few hours. The hope is that in a few hours the next update is "Everyone has been online for a while now." At this point it's just watching that the backup software continues to function as expected.
- Update
Roughly 50 customers remain to be taken care of. They should be on the path to resolution, provided the backup software doesn't make up a new reason to fail.
- Update
Roughly 100 customers are left, and the reason they haven't been restored yet appears to be bugs in the backup software. The data is there, but I need the software's restore processes to cooperate.
- Update
While the majority of customers are back online, there are still a few left to restore. I get it; it wasn't supposed to take this long. That's why we haven't built a server like the one that failed in five years, and never will again. Over the years we've made variations in hardware configurations, and Sunfire was a bad decision. The only thing left to do is finish this and never experience it again. A postmortem will follow, of course.
- Update
More and more customers back online. Still work to do.
- Update
Work continues. More users are online than are not. But we're still going. A handful of giant accounts slowed down the restore just a bit in the last hour. Remember these things are not working yet for anyone:
Crossbox
Toggle Expert Spam Filtering
Retrieve Domain Verification Key
SpamAssassin (intentionally disabled for now, but it's not our only filtering layer)
- Update
A large portion of users are back online. We're continuing to restore the rest, and working to speed it up a bit more. These items do not currently work on the Sunfire server:
Crossbox
Toggle Expert Spam Filtering
Retrieve Domain Verification Key
- Update
We are now processing inbound and outbound mail for users that have been restored, as each domain appears on the list of restored accounts. Restored users should therefore have all functionality except for Crossbox. This continually reduces the number of impacted users as restores proceed, and restores are continuing at this time.
- Update
Backups are still restoring. Pacing varies; larger accounts slow it down, but we have 10 simultaneous restores going. It's worth noting that accounts are restored before emails, so your login may work a bit before your email reappears.
- Update
Backup restores continue moving forward. The pace is now reasonable. Over 100 users are back online. However, we are still not accepting inbound or outbound email on the server until the restores are completed. This is intentional and designed specifically to ensure that you do not miss email; better that it arrive late than never.
- Update
We've made some adjustments to speed up backup restores a bit.
- Update
DNS for sunfire.mxrouting.net has been restored. You may or may not have success in logging into the server as accounts are still being restored. Some configuration deployments need to wait until after the backup restores finish. The server is also not accepting ANY inbound email until this is over, and outbound may not quite work yet either. We are deferring it all so that it will be re-sent later. We can't let anyone miss any inbound email, so it's important that the exim configurations be properly populated before we accept inbound mail.
So to clarify where we're at:
You should see "sunfire.mxrouting.net" appear as up very soon.
Your account may not be restored to the server yet (as of this moment, it likely isn't).
Inbound mail is intentionally deferred right now.
Outbound mail may or may not work right now.
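For anyone wondering why deferred inbound mail is delayed rather than lost: an SMTP server that answers with a 4xx "temporary failure" code is telling the sending server to keep the message queued and retry later. Here is a minimal sketch of that sender-side behavior in Python; the hostnames, addresses, and classification helper are illustrative assumptions, not MXroute infrastructure or tooling.

```python
# Minimal illustration of SMTP temporary-failure ("deferral") semantics.
# Hostnames and addresses are placeholders, not MXroute infrastructure.
import smtplib

def attempt_delivery(host: str, sender: str, recipient: str, message: str) -> str:
    """Try one delivery and classify the outcome the way a sending MTA would."""
    try:
        with smtplib.SMTP(host, 25, timeout=30) as smtp:
            smtp.sendmail(sender, recipient, message)
        return "delivered"
    except smtplib.SMTPRecipientsRefused as exc:
        # exc.recipients maps each refused address to (code, reason).
        codes = [code for code, _ in exc.recipients.values()]
        if all(400 <= code < 500 for code in codes):
            return "deferred"   # "try again later": the sender keeps it queued
        return "bounced"        # 5xx: permanent rejection, returned to sender
    except smtplib.SMTPResponseException as exc:
        if 400 <= exc.smtp_code < 500:
            return "deferred"
        return "bounced"

# A receiving server that answers with a 4xx code is saying "not right now";
# compliant senders keep the message in their queue and retry on a schedule,
# which is why deferred mail arrives late instead of being lost.
```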
- Update
We're cooking with gas now. It's all downhill from here. A disconnect between two software vendors delayed us getting to this stage, but what's done is done.
- Update
We are continuing to work on restoring the backups so that the new server can slip right in and take the place of the old one. There should be no expectation of data loss, nor of lost email. Delayed email, of course.
- Update
Working out the final details for restoring backups to the new server. They are restored over the network, but the storage is local to that network, so it should be quite efficient.
- Identified
The sunfire.mxrouting.net server has experienced a hardware failure. We are in the process of replacing the server, and users on sunfire.mxrouting.net will be unable to access their email while that work is underway. This isn't a situation we often find ourselves in, and one we hoped to prevent, but it is a possibility, and there is no immediate resolution other than the one we are working on. If you are not on the sunfire.mxrouting.net server, this status message has no connection to your MXroute service; please continue to reach out to us about other servers on our platform if needed.
- Investigating
We are currently investigating this incident.
- Resolved
Today users experienced a problem connecting over IMAP on the fusion.mxrouting.net server. Upon initial observation, everything looked fine: there were no clear errors being written anywhere, and the services were all online and responsive. After receiving a few reports, we restarted Dovecot to mitigate the issue while starting the investigation. Users immediately reported clear skies, but the digging had to begin.
We found that the inotify max_user_instances limit is far too low on AlmaLinux 9 for this kind of server environment. Since that wasn't an issue we had faced on previous servers, it wasn't one we were looking for, and it wasn't going to reveal itself until the server had been in production for a bit. It also caused redis to fork bomb the server, leaving over 5600 redis instances open, almost all of them belonging to a single user. We've increased the limit and now expect zero issues.
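As a minimal sketch of checking that limit on a generic Linux host (this is not MXroute's actual tooling, and the threshold below is an assumed illustrative value, not the value MXroute raised it to):

```python
# Sketch: read the per-user inotify instance limit and flag a low value.
# ASSUMED_FLOOR is illustrative; the stock default is commonly 128.
from pathlib import Path

LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_instances")
ASSUMED_FLOOR = 1024

def check_inotify_limit() -> None:
    current = int(LIMIT_FILE.read_text().strip())
    if current < ASSUMED_FLOOR:
        # To raise it persistently, drop a file such as
        # /etc/sysctl.d/90-inotify.conf containing
        #   fs.inotify.max_user_instances = 1024
        # and apply it with `sysctl --system`.
        print(f"max_user_instances is {current}; likely too low for a busy Dovecot host")
    else:
        print(f"max_user_instances is {current}; looks sufficient")

if __name__ == "__main__":
    check_inotify_limit()
```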
Nov 2025
- Resolved
This incident has been resolved.
- Investigating
The blizzard.mxrouting.net server is being rebooted for emergency maintenance.