Support experiencing degraded performance

Outbound Email Delays

Resolved
Degraded performance
Started over 1 year ago
Lasted 2 days

Affected

SMTP
Updates
  • Resolved

    We are considering this resolved, but we'll continue to monitor it.

  • Identified

    A very small subset of outbound email has been delayed by up to 1 hour. We estimate that this affects around 400 outbound emails per day, across various servers. While this is a very small portion of the email that our customers send, any unnecessary delay is a problem.

    The issue is that Rspamd on the outbound filter stops logging multiple times per day. The behavior is unaffected by rsyslog or by replacing rsyslog. Only Rspamd is affected, and Rspamd reports no errors relating to it.

    To resume logging, we have to restart the service. While the service is restarting, there is a very short window (roughly 1 second or less) in which Postfix rejects connections from the primary servers (the ones you connect to and send mail from). When Postfix rejects a connection, Exim on the rejected primary server holds the email in its queue and makes no further attempts to connect to the filter server until the retry time is reached.

    The answer is that we need to fix Rspamd, since that is where all of this originates. We're working on building out a new deployment to replace it completely. This will be done in production with no downtime for any service. This status will be updated when we finish that new deployment.
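
    To illustrate why a restart window of about 1 second can turn into a delay of up to 1 hour, here is a minimal Python sketch of the behavior described above. It is a simplified model, not our actual configuration: the class names `FilterHost` and `PrimaryServer`, the 3600-second retry interval, and the restart timestamps are all assumptions chosen for illustration.

```python
class FilterHost:
    """Hypothetical model of the filter server: it rejects connections
    during each restart window, given as (start, end) in seconds."""

    def __init__(self, restart_windows):
        self.restart_windows = restart_windows

    def accepts(self, t):
        # A connection at time t succeeds unless t falls in a restart window.
        return not any(start <= t < end for start, end in self.restart_windows)


class PrimaryServer:
    """Hypothetical model of a primary server: after one rejected
    connection it makes no further attempts until the retry time."""

    def __init__(self, host, retry_interval):
        self.host = host
        self.retry_interval = retry_interval  # assumed value, in seconds
        self.retry_at = 0.0  # no connection attempts before this time

    def send(self, sent_at):
        """Return the delay in seconds before the message reaches the filter."""
        t = max(sent_at, self.retry_at)
        while not self.host.accepts(t):
            # Rejected: hold the message and wait for the retry time.
            self.retry_at = t + self.retry_interval
            t = self.retry_at
        return t - sent_at


# One assumed restart lasting 1 second, starting at t = 100 s.
server = PrimaryServer(FilterHost([(100.0, 101.0)]), retry_interval=3600.0)
for sent_at in (50.0, 100.5, 200.0):
    print(sent_at, server.send(sent_at))
```

    In this model, a message sent before the restart is not delayed, a message sent during the 1-second window waits the full retry interval, and a message sent shortly after the window also waits, because the server is still holding off until its retry time.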