All systems operational

Completed
lucy.mxrouting.net

Status
Resolved after 5 days
Started
December 06, 2023 at 3:45 AM
Completed
December 10, 2023 at 10:13 PM
Affects
DirectAdmin Panel
IMAP
SMTP
Webmail
  • Completed
    December 10, 2023 at 10:12 PM

    Crossbox is back online on Lucy. Aside from copying over any remaining email data from the backup server, this is considered resolved. Any remaining issues should be handled via support ticket.

  • Update
    December 08, 2023 at 6:55 AM
    In progress

    Lucy is online. Any email users who can't log in should be able to in less than 1 hour. Please read the previous update for its numbered points; they are very relevant. We're beginning the sync of old emails. That part is going to take some time, and we'll provide fewer updates for it on this status page.

  • Update
    December 08, 2023 at 6:09 AM
    In progress

    We're about to set users loose on the Lucy server. Here's what you need to know:

    1. We can't turn on Crossbox tonight because the MySQL versions are incompatible. Use webmail.mxroute.com, lucy.mxrouting.net/webmail, or lucy.mxrouting.net/snappy if you need to use webmail.

    2. Your old emails aren't there. They're NOT GONE! Don't worry. The goal was to get you into your accounts so you can start sending/receiving mail. It's going to take longer to get your old emails back in place. It might take a few days, honestly. That's why we didn't want to wait for that before bringing the server online.

    3. There might be unexpected oddities. Hopefully they'll be few.

  • Update
    December 08, 2023 at 3:09 AM
    In progress

    As of right now there are 3 Lucy servers to reference:

    1. The dead one (Original Lucy)
    2. The one we're working to restore backups on (New Lucy)
    3. The one we're working on migrating Original Lucy to in an experimental way (Experimental Lucy)

    Either #2 or #3 is going to be the production Lucy server. If Experimental Lucy wins, it'll be online tonight and everyone will simply be missing all of the email they've previously received, sent, etc. However, that email will reappear slowly as we sync it over. We're rooting for Experimental Lucy, but we're prepared to fall back to New Lucy. Confusing? Surely.

    As of right now, Original Lucy is running xfs_repair again so we can mount the file system and prep it for a selective rsync to rebuild the server skeleton (accounts, email users, etc.) on Experimental Lucy. Currently, the IP for lucy.mxrouting.net points to Experimental Lucy, where we're installing a fresh copy of DirectAdmin in preparation for the rebuild of the skeleton. We have a script running in a loop to ensure a consistent block on port 25 so that when exim is installed, it doesn't start rejecting the inbound email that is waiting for you, and you can actually receive it when we finish the job.
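
    For readers curious what a "script running in a loop" to keep port 25 blocked might look like, here is a minimal sketch. It is not the script actually used. It assumes iptables is the firewall in place (the update does not say), and the function names and the 5-second interval are made up for illustration. The idea is that while inbound connections are dropped, sending servers see a connection failure and normally queue the message and retry later, rather than getting a rejection from a half-configured exim.

```python
import subprocess
import time
from typing import Callable, Sequence

# Assumption: iptables is the firewall in use; the status update does not say.
RULE = ["INPUT", "-p", "tcp", "--dport", "25", "-j", "DROP"]

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit code (output discarded)."""
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def ensure_port_25_blocked(run: Runner = run_command) -> bool:
    """Insert the DROP rule if it is missing. Returns True if a rule was added."""
    # `iptables -C` exits 0 when the rule already exists, non-zero otherwise.
    if run(["iptables", "-C", *RULE]) == 0:
        return False
    # `iptables -I` inserts the rule at the top of the chain.
    run(["iptables", "-I", *RULE])
    return True


def keep_blocked(interval_seconds: float = 5.0, run: Runner = run_command) -> None:
    """Re-check forever, so a firewall reset during installation cannot reopen the port."""
    while True:
        ensure_port_25_blocked(run)
        time.sleep(interval_seconds)
```

    Run as root, keep_blocked() would re-add the rule within a few seconds if an installer or firewall reload removed it. The runner is passed in as a parameter only so the check-then-insert logic can be exercised without touching a real firewall.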

    If Experimental Lucy wins the game, it may run without Crossbox for several more hours, but that's an acceptable temporary loss for having everyone back online tonight. So that's where we're at.

  • Update
    December 08, 2023 at 1:48 AM
    In progress

    An update on the two efforts for restore:

    1. Still working to restore backups to a new server in Germany (as that's where our backups are, same datacenter, fastest connection).

    2. We mount original Lucy's file system in a recovery ISO (as that hasn't yet failed in tests, though it might later), build a new server next to it, and first we rsync enough data to get everyone's accounts back online but NOT their emails. When everyone has their accounts back online, we begin syncing emails. It's a bold strategy, fingers crossed.
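
    The two-phase rsync in the second effort (accounts first, emails later) can be sketched roughly as below. This is an illustration under assumptions, not the commands actually run: the mail directory names "Maildir" and "imap", the example paths, and the function names are all hypothetical.

```python
from typing import List

# Assumption: mail lives in directories named "Maildir" and "imap" somewhere
# under each home directory. The status update does not list real paths.
MAIL_DIR_NAMES = ["Maildir", "imap"]


def skeleton_sync_command(source: str, dest: str) -> List[str]:
    """Phase 1: copy everything except mail directories, so accounts come back fast."""
    # A pattern like "Maildir/" matches any directory with that name in the tree.
    excludes = [f"--exclude={name}/" for name in MAIL_DIR_NAMES]
    return ["rsync", "-a", *excludes, source, dest]


def mail_sync_command(source: str, dest: str) -> List[str]:
    """Phase 2: copy the full tree; rsync skips files that are already up to date."""
    return ["rsync", "-a", source, dest]
```

    For example, skeleton_sync_command("/mnt/original/home/", "newhost:/home/") builds the phase 1 command, and the same arguments to mail_sync_command build phase 2. Neither command uses --delete, so files that exist only on the destination are left in place.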

  • Update
    December 08, 2023 at 1:31 AM
    In progress

    It's finally time to lay the original Lucy server to rest. It's not coming back. But we're still putting in two efforts to see which one can beat the other in a race:

    1. Restore backups to new server in Germany.
    2. Build a new server next to the original Lucy, mount its disk in a recovery ISO, and try to migrate its data to the new server.

    Whichever effort wins the race gets the prize of being the production server.

  • Update
    December 07, 2023 at 8:14 PM
    In progress

    No new updates since the previous one.

  • Update
    December 07, 2023 at 3:50 PM
    In progress

    More split effort today: the backup restore continues, and so does the effort to revive the existing server. The transfer of backups to the new server is much slower than we imagined, simply because of the number of files (each email is 1 file). But we're still working on it.

  • Update
    December 07, 2023 at 6:28 AM
    In progress

    Still working on restoring backups. It's not that it isn't working; it's that it's a painfully slow process. We have to talk about that problem later, though. The first priority is getting this online. While the backup effort is ongoing, we are still trying some radical things to bring the original server back online. It may be a long shot, but we're not leaving any option untried. Everything must be tested in order to bring the best and fastest resolution.

    Note that the schedule for this status update is not an ETA. At this point it's an arbitrary number chosen to keep the status open.

  • Update
    December 06, 2023 at 11:08 PM
    In progress

    Still working on restoring backups. Planning a full postmortem of this on blog.mxroute.com after the event.

  • Update
    December 06, 2023 at 8:01 PM
    In progress

    As we've continued working on the original Lucy server, we've had ups and downs. At one point we actually got it to ping, but the file system quickly went read-only and wouldn't mount again. We're still working on moving backups to a new server, independently of our attempts to repair the original server. An ETA would be irresponsible; there's simply no way to have any kind of estimate right now.

  • Update
    December 06, 2023 at 6:20 PM
    In progress

    Making progress. An ETA could only be narrowed to "between 1 and 48 hours," so it's really not worth giving one yet.

  • Update
    December 06, 2023 at 3:09 PM
    In progress

    We are mostly treating the Lucy server as failed and its file system as hosed. We are working on restoring backups. This isn't a fast process.

  • Update
    December 06, 2023 at 8:17 AM
    In progress

    Two efforts have spawned as a result of this maintenance failure.

    1. Working to bring Lucy online
    2. Working to restore backups to a new server

    If I could tell you which one I was more confident in completing first, I would be happy to do so. I cannot.

  • Update
    December 06, 2023 at 7:26 AM
    In progress

    We're still working on this.

  • Update
    December 06, 2023 at 5:51 AM
    In progress

    Still working on this.

  • Update
    December 06, 2023 at 5:05 AM
    In progress

    The server did not boot into its OS as intended, despite booting fine on the last reboot. We're working on it. Everything is fine and no inbound email will be missed; a chassis swap just sometimes comes with unexpected hurdles.

  • In progress
    December 06, 2023 at 3:45 AM

    Maintenance is now in progress

  • Planned
    December 06, 2023 at 3:45 AM

    We need to move the disks from this server into a new chassis, and we were unable to schedule this in advance due to dynamic availability. It should be roughly the same work as the previous reboot, and back online in no time. No inbound email will be missed; the whole thing shouldn't feel any worse than temporarily losing your cell phone signal as you drive down the road.

    While we have a 3-hour window set for this, it's expected to take about 10 to 15 minutes.