Email checksums

For a while now, we’ve been aiming to ensure the robustness of all our stored email by keeping a checksum for every email delivered. We’ve now rolled this out for every email on every server, storing a reliable and secure 160-bit checksum for every message. As mentioned earlier, this was one of the features of Cyrus 2.3.10 (the IMAP server we use) that we helped contribute code for.

Most people don’t think corruption is an issue, but recent research by CERN has shown that with today’s large hard drives, this is a potentially serious problem, with an estimated corruption rate of 3 files in every TB of data. In most cases, corruption of data is a silent problem that people don’t realise has happened until they need the data.

To deal with this, we ensure that as soon as an email is delivered to a mailbox, a SHA-1 checksum of that email is generated and stored in the email index.
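
As a rough illustration of this step, here is a minimal Python sketch. It is not the Cyrus implementation: the `index` dictionary and the `deliver` function are hypothetical stand-ins for the real mailbox index and delivery path. It only shows that a SHA-1 digest is 160 bits and is recorded at delivery time.

```python
import hashlib

def sha1_checksum(message: bytes) -> str:
    """Return the SHA-1 digest of a raw message as 40 hex characters."""
    return hashlib.sha1(message).hexdigest()

# Hypothetical stand-in for the mailbox index: message id -> checksum.
index: dict[str, str] = {}

def deliver(message_id: str, message: bytes) -> None:
    """Record the checksum at delivery time, alongside the stored message."""
    index[message_id] = sha1_checksum(message)

deliver("msg-1", b"Subject: hello\r\n\r\nHi there\r\n")
print(len(index["msg-1"]) * 4)  # 40 hex characters * 4 bits each = 160
```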

When the email is replicated, the email content and the checksum are sent separately. We then generate the checksum on the replicated email content and ensure that it matches the original checksum to see that the email was replicated correctly.
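
The receiving side of that exchange might look something like the following sketch. The `receive_replica` function and `ReplicationError` exception are invented names for illustration, and the real replication protocol is not shown. The point is only that the replica recomputes the checksum from the content it actually received and refuses the copy on a mismatch.

```python
import hashlib

class ReplicationError(Exception):
    """Raised when replicated content does not match its checksum."""

def receive_replica(content: bytes, expected_checksum: str) -> bytes:
    """Accept replicated content only if its SHA-1 matches the one sent with it."""
    actual = hashlib.sha1(content).hexdigest()
    if actual != expected_checksum:
        raise ReplicationError(
            f"checksum mismatch: expected {expected_checksum}, got {actual}"
        )
    return content

original = b"Subject: hello\r\n\r\nHi there\r\n"
checksum = hashlib.sha1(original).hexdigest()

receive_replica(original, checksum)  # intact copy is accepted
try:
    receive_replica(original[:-1], checksum)  # copy truncated in transit
except ReplicationError as err:
    print("rejected:", err)
```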

We also repeat this procedure when the email is backed up, ensuring that the backup of the email is correct.

We also run a regular check process that takes blocks of emails and recomputes their checksums to confirm they match what is in the index. If there are any issues, we’re alerted and can work out which of the master, replica or backup copies is correct and fix the problem.
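
To show how the indexed checksum lets you tell a good copy from a damaged one, here is a small sketch. The `check_copies` function and the copy names are assumptions made for the example, not the actual check process.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def check_copies(indexed_checksum: str, copies: dict[str, bytes]) -> dict[str, bool]:
    """Map each copy's name to whether it still matches the indexed checksum."""
    return {name: sha1_hex(data) == indexed_checksum for name, data in copies.items()}

good = b"Subject: hello\r\n\r\nHi there\r\n"
status = check_copies(
    sha1_hex(good),
    {"master": good, "replica": good[:-2] + b"XX", "backup": good},
)
damaged = [name for name, ok in status.items() if not ok]
print(damaged)  # ['replica']
```

Any copy that still matches the index can then be used to repair the ones that do not.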

Posted in Technical. Comments Off

Sending email servers best practice

If you run an outbound SMTP email server, there are a number of things you should be doing to ensure smooth sending of your email. All of the recommendations below are fairly straightforward, and are either specified by RFCs or regarded as general best practice.

  1. Ensure your forward and reverse DNS match

    Also called Forward Confirmed Reverse DNS, having valid and matching forward and reverse DNS is one of the first recommendations in RFC 1912 (“Make sure your PTR and A records match”). It’s a sign that the system administrator understands at least the basic RFCs. It also helps to avoid spoofing of your systems by spammers.

    You need to ensure that the IP address you are testing is the “edge” one that your email server connects to other servers with. In most cases this is obvious, but you might have a machine with multiple IP addresses, or you might be behind some sort of NAT system, in which case the apparent IP address will be the NAT router IP address, so make sure you are testing the right IP address.

    There’s a tool to test that your forward and reverse DNS match here. Or you can do it easily via Linux command line tools. For instance, here’s the forward & reverse DNS for one of our outgoing hosts.

    $ dig +short out1.smtp.messagingengine.com
    66.111.4.25
    $ dig +short -x 66.111.4.25
    out1.smtp.messagingengine.com.

    Note how out1.smtp.messagingengine.com -> 66.111.4.25 and 66.111.4.25 -> out1.smtp.messagingengine.com. This shows that forward and reverse DNS match.

    If you’re using an ADSL connection or similar, then make sure you get a static IP (most ADSL providers have this option, though it may cost a little bit more) and make sure you can get the reverse DNS changed (also known as setting a PTR record – your ADSL provider will have to do this, and not all offer it, so check with your provider before signing up).

  2. Ensure your HELO string matches your reverse DNS

    When your SMTP server sends email, it has to announce its name in the HELO or EHLO command. Since you have your DNS set up correctly, you have a fully-qualified domain name (the reverse DNS name), so you can follow RFC 2821 and use it as your HELO/EHLO string:

    The argument field contains the fully-qualified domain name of the SMTP client if one is available

    Doing this provides another level of verification that your server is who it says it is.

  3. Don’t use Sender Address Verification

    At first glance, Sender Address Verification (SAV) seems like a good idea. Because SMTP doesn’t include any intrinsic way to authenticate the MAIL FROM address, you just connect to the appropriate return host and check if the site will accept email for that address.

    Unfortunately, SAV creates more problems than it solves. As noted by others, it’s easy to work around SAV: spammers just send with a valid MAIL FROM address. Given they’re already spamming lots of valid addresses, they have lots to choose from.

    For spammers that don’t use valid MAIL FROM addresses, the result will be that your system ends up looking like it’s attempting to attack other systems. For instance, say a spammer sends you 1000 emails with forged and invalid @fastmail.fm MAIL FROM addresses. To check them, your server contacts us 1000 times to see if it can deliver to each address. However, that’s exactly the pattern that anyone trying to do a dictionary harvest attack against us would produce! Without special precautions, your machine will now be treated as extremely suspicious because it just tried to send to lots of invalid addresses at our server.
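
Recommendations 1 and 2 lend themselves to an automated check. The following is a minimal Python sketch rather than a definitive tool: the function names and the keys of the returned dictionary are made up for illustration, and `socket.gethostbyname_ex` only covers IPv4. The resolvers are passed in as parameters so that the demonstration at the end can use canned answers copied from the dig output above instead of live DNS.

```python
import socket

def forward_lookup(hostname: str) -> set[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)

def reverse_lookup(ip: str) -> str:
    """Return the reverse DNS (PTR) name for an IP address."""
    name, _aliases, _addresses = socket.gethostbyaddr(ip)
    return name

def check_sending_ip(ip, helo_name, forward=forward_lookup, reverse=reverse_lookup):
    """Check forward-confirmed reverse DNS and that HELO equals the PTR name."""
    # Normalise case and the optional trailing dot before comparing names.
    ptr_name = reverse(ip).rstrip(".").lower()
    return {
        "fcrdns": ip in forward(ptr_name),
        "helo_matches_ptr": helo_name.rstrip(".").lower() == ptr_name,
    }

# Offline demonstration with canned answers taken from the dig output above.
canned_forward = lambda name: {"66.111.4.25"}
canned_reverse = lambda ip: "out1.smtp.messagingengine.com."
print(check_sending_ip("66.111.4.25", "out1.smtp.messagingengine.com",
                       canned_forward, canned_reverse))
# {'fcrdns': True, 'helo_matches_ptr': True}
```

Calling `check_sending_ip(ip, helo_name)` without the last two arguments uses the system resolver. Remember to pass the “edge” IP address that other servers actually see.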

If you run an email server and have any more suggestions for this list, let me know at robm@fastmail.fm

Posted in Technical. Comments Off

New Backup System finished

I’ve just finished the final component of the conversion of the backup system – a new restore utility.

There were a couple of problems with the old restore utility, stemming from its design of just copying the messages back into the folder and reconstructing to make them “appear” – it didn’t work safely with some IMAP clients, it didn’t work safely with replication and there was a possibility that you could delete messages that were already there if you forced the restore to overwrite what was there.

Oh – and a restore could easily push you over quota if you were near the edge as well.

So – the new restore system is much simpler and safer for all cases, though it involves a little more manual work:

- the admin at our end selects which folders to restore
- the backup system does its magic and a brand new folder tree appears at:

RESTORED.username.YYYYMMDDTHHMMSS

Where “username” is the localpart of your login name, and YYYYMMDDTHHMMSS is a timestamp in year, month, day, hour, minute, second format with a ‘T’ in the middle.
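
For anyone who wants to recognise or generate names in this format, here is a small sketch. The function name, the example login and the example date are all hypothetical, and which clock or timezone the backup system uses for the timestamp is not stated here.

```python
from datetime import datetime

def restore_folder_name(login: str, when: datetime) -> str:
    """Build RESTORED.username.YYYYMMDDTHHMMSS from a login name and a time."""
    localpart = login.split("@", 1)[0]  # the part before the @
    return f"RESTORED.{localpart}.{when.strftime('%Y%m%dT%H%M%S')}"

print(restore_folder_name("alice@example.com", datetime(2008, 3, 14, 9, 26, 53)))
# RESTORED.alice.20080314T092653
```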

This folder will be deleted automatically one week later. It doesn’t count towards your quota usage during that time, though any emails copied out to a normal folder will of course count as usual.

(I’ve started a forum thread as well for discussion:  http://www.emaildiscussions.com/showthread.php?t=51151)

Posted in Technical. Comments Off