MXroute - Banshee Server Issue – Incident details


Banshee Server Issue

Resolved
Operational
Started over 1 year ago
Lasted about 22 hours

Affected

IMAP

Operational from 12:32 AM to 10:29 PM

SMTP

Operational from 12:32 AM to 10:29 PM

Webmail

Operational from 12:32 AM to 10:29 PM

Updates
  • Resolved

    This incident has been resolved. Postmortem and email to customers will be a bit delayed while the details are worked out.

  • Update

    We're still performing an rsync of the data, and we expect the next step to take much less time than this one. When this is over, we'll publish a postmortem report detailing what went wrong and how we'll prevent it from happening to such a degree in the future. We'll also address compensation in an email to the relevant customers.

  • Update

    The rsync of data to the new server is still ongoing. It's roughly 3/4 of the way done.

  • Update

    The fastest path to resolution right now is for us to migrate the Banshee server to another system. Unfortunately, this is going to take a significant number of hours, and any attempt to tell you how many would have to be a lie; it simply can't be known right now. It'll take no less than the time it takes, and we'll make every effort to make that number as small as possible.

    The Banshee server is a legacy cPanel server, and this wasn't how we wanted to retire its hardware. There is in fact no hardware issue, but for some reason the OS is hosed beyond reasonable repair. All data is intact, and all inbound emails should be held and then delivered once this is complete.

  • Identified

    Now it just keeps booting to BIOS config and won't do anything else. The BIOS config is strangely lacking the options it should contain, options which might help to overcome this. So we're sending datacenter techs back out to see just what in the hell is going on. Everything should be fine soon; there's no reason to suspect that this will be a long-term outage.

  • Update

    The server goes offline almost immediately after coming back online. This server lacks IPMI, so we are having a KVM attached. That said, all disks appear fine when booted into rescue mode, so this does not appear to be a hardware issue. We just can't get any strong insight into it without a KVM.

  • Investigating

    The problem reappeared. We are currently investigating this incident.

  • Resolved

    This incident has been resolved.