Dec 13: FastMail DNS hosting

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 12th December was about our multi-master database replication system. The following post on 14th December is about our 24/7 oncall duty roster.

Technical level: low-medium

Part of running any website or email service is that you need to publish DNS records for your domains. DNS is what allows computers to convert names like “fastmail.com” into the IP addresses machines use to actually talk to each other.

In the very early days of FastMail, we used an external service called ZoneEdit to do this. That’s fine when your DNS requirements are simple and don’t change much, but over time our DNS complexity and requirements increased, so we eventually moved to running our own DNS servers.

For a long time, we used a program called TinyDNS to do our DNS serving. TinyDNS was written by Dan Bernstein (DJB). DJB’s software has a history of being very security-conscious and concise, but a bit esoteric in its configuration, setup and handling.

While TinyDNS worked very well for us (extremely reliable, with low resource usage), one issue with TinyDNS is that it reads its DNS data from a single constant database file that is built from a corresponding input text file. That means that to make any DNS change, you have to modify the input data file and then rebuild the entire database file, even for a single record.

That was fine when DNS hosting was just for us, but over time we found more and more people wanted to use their own domain for email, so we opened up DNS hosting to our users. To make it as easy as possible for users, when you add a domain to your FastMail account, we automatically publish some default DNS records, with the option to customise as much as you want.

Allowing people to host their DNS with us is particularly important for email, because there are actually a number of complex email-related standards that rely on DNS records. For websites, it’s mostly about creating the right A (or in some cases, CNAME) record. For email, though, there are MX records for routing email for your domain, wildcard records to support subdomain addressing, SPF records for stopping forged SMTP FROM addresses, and DKIM records for controlling signing of email from your domain. There are also SRV records to allow auto-discovery of our servers in email and CalDAV clients, and a CNAME record for mail.yourdomain.com to log in to your FastMail account. In the future, there are also DMARC records we want to allow people to easily publish. For more information on these records and standards, check out our previous post about email authentication.
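To make that concrete, here is roughly what a hosted domain’s records can end up looking like in zone-file form. These are illustrative values only: the hostnames, priorities and service names below are placeholders, not our exact published settings.

    ; Example records for a hosted domain (hypothetical values)
    yourdomain.com.        IN MX    10 in1.example-mx.com.     ; route mail for the domain
    *.yourdomain.com.      IN MX    10 in1.example-mx.com.     ; subdomain addressing
    yourdomain.com.        IN TXT   "v=spf1 include:spf.example-mx.com ?all"
    mail.yourdomain.com.   IN CNAME login.example-mx.com.      ; log in at mail.yourdomain.com
    _caldavs._tcp.yourdomain.com. IN SRV 0 1 443 caldav.example-mx.com.  ; CalDAV auto-discovery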

The problem with TinyDNS was that when people changed their DNS, or new domains were added at FastMail, we couldn’t just immediately rebuild the database file: it’s a single file for ALL domains, so it’s quite large. Instead, we’d only rebuild it every hour, and people had to be aware that any DNS changes they made might take up to an hour to propagate to our actual DNS servers. Not ideal.

So a few years back, we decided to tackle that problem. We looked around at the different DNS software available, and settled on PowerDNS. The things we particularly liked about PowerDNS were its pluggable backends and its support for DNSSEC. Using this, we were able to build a pipe-based backend that talks to our internal database structures. This means that DNS changes are nearly immediate (there’s still a small internal caching time).
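The pipe backend protocol itself is pleasingly simple: PowerDNS writes one tab-separated request per line to the backend’s stdin, and reads DATA/END responses back on stdout. Here’s a minimal sketch of such a backend (pipe ABI version 1) in JavaScript. To be clear, this is an illustration of the protocol, not our actual backend, and lookupRecords() is a hypothetical stand-in for our real database lookup:

    // Minimal PowerDNS pipe backend sketch (pipe ABI version 1).
    // PowerDNS sends: HELO\t1                                   on startup
    //                 Q\tqname\tqclass\tqtype\tid\tremote-ip    per query
    // We reply with:  zero or more DATA lines, then END.
    const readline = require('readline');

    function lookupRecords(qname, qtype) {
      // Hypothetical: a real backend would query the user database here.
      if (qname === 'example.com' && (qtype === 'A' || qtype === 'ANY')) {
        return [{ type: 'A', ttl: 3600, content: '203.0.113.10' }];
      }
      return [];
    }

    const rl = readline.createInterface({ input: process.stdin });
    rl.on('line', (line) => {
      const parts = line.split('\t');
      if (parts[0] === 'HELO') {
        console.log('OK\tsketch backend ready');
      } else if (parts[0] === 'Q') {
        const [, qname, qclass, qtype, id] = parts;
        for (const r of lookupRecords(qname, qtype)) {
          console.log(['DATA', qname, qclass, r.type, r.ttl, id, r.content].join('\t'));
        }
        console.log('END');    // end of the answer set for this query
      } else {
        console.log('FAIL');   // unrecognised request
      }
    });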

Because DNS is so important, we tested this change very carefully. One of the things we did was to take a snapshot of our database and capture all the DNS packets to/from our TinyDNS server for an hour. On a separate machine, we ran our PowerDNS-based implementation against the same database snapshot, replayed all the captured DNS packets to it, and checked that all the responses were the same.
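A much-simplified version of the same idea (not our actual capture-and-replay harness) is to ask the old and new servers identical questions and diff the answers. The server addresses and query below are hypothetical:

    // Compare answers from two DNS servers for the same queries.
    const { Resolver } = require('dns').promises;

    const oldServer = new Resolver();
    oldServer.setServers(['198.51.100.1']);   // TinyDNS (hypothetical address)
    const newServer = new Resolver();
    newServer.setServers(['198.51.100.2']);   // PowerDNS (hypothetical address)

    async function compare(name, type) {
      const [a, b] = await Promise.all([
        oldServer.resolve(name, type).catch((e) => e.code),
        newServer.resolve(name, type).catch((e) => e.code),
      ]);
      // A real harness would normalise record ordering before comparing.
      if (JSON.stringify(a) !== JSON.stringify(b)) {
        console.log('MISMATCH', name, type, a, b);
      }
    }

    compare('example.com', 'MX');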

With this confirmation, we were able to roll out the change from TinyDNS to PowerDNS. Unfortunately, even with that testing, we still experienced some problems, and had to roll back for a while. After some more fixing and testing, we finally rolled it out permanently in Feb 2013, and it’s been happily powering DNS for all of our domains (e.g. fastmail.com, fastmail.fm, messagingengine.com, etc.) and all user domains since.

Our future DNS plans include DNSSEC support (which then means we can also do DANE properly, which allows server-to-server email sending to be more secure), DMARC record support, and ideally one day Anycast support to make DNS lookups faster.

For users, if you don’t already have your own domain, we definitely recommend it as something to consider. By controlling your own domain, you’ll never be locked in to a particular email provider, and providers have to work harder to keep your business, something we always aim to do :)

With the new gTLDs that have been released and continue to be released, there’s now a massive number of new domains available. We use and recommend gandi.net and love their no bullshit policy. For around $15-$50/year, your own domain name is a fairly cheap investment in keeping control of your own email address forever into the future, and with FastMail (and an Enhanced or higher personal account, or any family/business account), we’ll do everything else for you: your email, DNS and even simple static website hosting.


Dec 12: FastMail’s MySQL Replication: Multi-Master, Fault Tolerance, Performance. Pick Any Three

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 11th December was from our support team. The following post on 13th December is about hosting your own domain with us.

Technical level: medium

For those who prefer watching videos over reading, here’s a talk I gave at Melbourne Perl Mongers in 2011 on FastMail’s open-sourced, custom MySQL replication system.

Most online services store their data within a database, and thanks to the culture of Open Source, these days there are plenty of robust RDBMSs to choose from. At FastMail, we use a Percona build of MySQL 5.1 because of their customised tooling and performance patches (if you haven’t heard of Percona, I recommend trying them out). However, even though MySQL 5.1 is a great platform to work with, we do something differently here – we don’t use its built-in replication system, and instead opted to roll our own.

First, what’s the problem with running an online service on a single database? The most important reason against it is the lack of redundancy. If your database catches fire or, as happens more often, the oom-killer chooses to zap your database server (as it’s usually the biggest memory hog on a machine), none of your applications can continue without the data they need, and your service is taken offline. By using multiple databases, when a single database server goes down, your applications still have others to choose from and connect to.

Another reason against using a single database for your online service is degraded performance – as more and more applications connect and perform work, your database server’s load increases. Once a server can’t take the requested load any longer, you’re left with query timeouts and even refused connections, which again takes your service offline. By using multiple database servers, you can tell your applications to spread their load across the database farm, reducing the work any single database server has to cope with while gaining a performance boost across the board.

Clearly the best practice is to have multiple databases, but why re-invent the wheel? Mention replication to a veteran database admin and then prepare yourself a nice cup of hot chocolate while they tell you horror stories from the past as if you’re sitting around a campfire. We needed to re-invent the wheel because there are a few fundamental issues with MySQL’s built-in replication system.

When you’re working with multiple databases and problems arise in your replication network, your service can grind to a halt, possibly taking you offline until every one of your database servers is back up and replicating happily again. We never wanted to be put in a situation like this, and so wanted the database replication network itself to be redundant. By design, MySQL’s built-in replication system couldn’t give us that.

What we wanted was a database replication network where every database server could be a “master”, all at the same time. In other words, all database servers could be read from and written to by all connecting applications. Each time an update occurred on any master, the query would then be replicated to all the other masters. MySQL’s built-in replication system allows for this, but it comes at a very high cost – it is a nightmare to manage if a single master goes down.

To achieve master-master replication with more than two masters, MySQL’s built-in replication system needs the servers to be configured in a ring network topology. Every time an update occurs on a master, it executes the query locally, then passes it off to the next server in the ring, which applies the query to its local database, and so on – much like participants playing pass-the-parcel. And this works nicely, and is in place at many companies. The nightmares begin, however, if a single database server goes down, breaking the ring. Since the path of communication is broken, queries stop travelling around the replication network, and data across every database server begins to go stale.

Instead, our MySQL replication system (MySQL::Replication) is based on a peer-to-peer design. Each database server runs its own MySQL::Replication daemon, which serves out its local database updates. It also runs a separate MySQL::Replication client for each master it wants a feed from (think of a mesh network topology). Each time a query is executed on a master, the connected MySQL::Replication clients take a copy and apply it locally. The advantage here is that when a database server goes down, only that single feed is broken. All other communication paths continue as normal, and queries flow across the database replication network as if nothing ever happened. And once the downed server comes back online, the MySQL::Replication clients notice and continue where they left off. Win-win.

Another issue with MySQL’s built-in replication system is that a slave’s position relative to its master is recorded in a plain text file called relay-log.info, which is not atomically synced to disk. Once a slave dies and comes back online, its files may be in an inconsistent state. If the InnoDB tablespace was flushed to disk before the crash but relay-log.info wasn’t, the slave will restart replication from an incorrect position and will replay queries, leaving your data in an invalid state.

MySQL::Replication clients store their position relative to their masters inside the InnoDB tablespace itself (sounds recursive, but it’s not, since there is no binlogging of MySQL::Replication queries). As position updates are done within the same transaction in which replicated queries are executed, writes are completely atomic. Once a slave dies and comes back online, we are still in a consistent state, since either the transaction was committed or it will be rolled back. It’s a nice place to be in.
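The pattern looks something like this sketch (hypothetical table and column names, not MySQL::Replication’s actual schema): the replicated statement and the position update commit together, so after a crash either both are applied or neither is.

    // Atomic checkpointing sketch using the mysql2 client library.
    const mysql = require('mysql2/promise');

    async function applyReplicatedQuery(conn, masterId, query, newPos) {
      await conn.beginTransaction();
      try {
        await conn.query(query);  // the statement replicated from the master
        await conn.query(
          'UPDATE replication_checkpoint SET pos = ? WHERE master_id = ?',
          [newPos, masterId],
        );
        await conn.commit();      // statement and position become durable together
      } catch (err) {
        await conn.rollback();    // neither applied; safe to reconnect and retry
        throw err;
      }
    }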

MySQL::Replication – multi-master, peer-to-peer, fault tolerant, performant and without the headaches. It can be found here.


SSL certificates updated to SHA-256, RC4 disabled

Today we’re rolling out SHA-256 certificates. We announced this last month, and you can read that post for more information about why this is necessary.

At the same time, we’ve disabled the RC4 cipher suite. RC4 has long been considered broken, and the browser security community has recently started actively discouraging its use. The SSL Labs test penalises it, and Chrome has started presenting a low-priority warning.

All this means that we now get an A+ grade on the SSL Labs test, which is a good indicator that, when it comes to our SSL/TLS configuration, we’re pretty much in step with current industry best practice.

If, like most of our users, you use the web client in a modern web browser, you won’t notice any difference. With older browsers and some IMAP/POP3/DAV/LDAP clients, you may start seeing disconnection problems if they don’t know how to handle SHA-256 certificates or rely on RC4. In these cases you’re encouraged to upgrade your client and, if necessary, contact the author of your client for an update. In the meantime, you can use insecure.fastmail.com (web) and insecure.messagingengine.com (IMAP/POP/SMTP), both of which support RC4 and have a SHA-1 certificate. As always, we highly discourage the use of these service names because they leave your data open to attack, and we may remove them in the future.
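If you want to check for yourself whether a server still accepts RC4, one approach is to attempt a handshake that offers only RC4 suites and see whether it is refused. Here’s a small sketch (the hostname is just an example; note that recent TLS stacks may refuse to offer RC4 at all, which proves the same point):

    // Try to handshake offering only RC4 cipher suites.
    const tls = require('tls');

    const socket = tls.connect(
      { host: 'www.fastmail.com', port: 443, ciphers: 'RC4-SHA:RC4-MD5' },
      () => {
        console.log('Handshake succeeded with', socket.getCipher().name);
        socket.end();
      },
    );
    socket.on('error', (err) => console.log('RC4 handshake rejected:', err.message));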


Dec 11: FastMail Support

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 10th December was the second security post, on availability. The following post on 12th December is about our multi-master database replication.

Technical level: low

Support system

FastMail has a comprehensive support system. We have a well-written and well-maintained online help system, where you can read in detail about every aspect of the service. This is complemented by our blog (which is likely where you are reading this), which you can subscribe to, and where you can read about what we are working on and where the service is generally heading, including any recent changes.

To see the live running status of various FastMail services, check our own status page. That page also gives you an idea of how the service has been operating in the past week. If you would like an uptime report for FastMail from an independent third party, there’s always the status report from Pingdom.

But in spite of all that, sometimes you will need to get in touch with a human being who knows the service really well. In that case, you can reach a member of our friendly support team using our ticket system.

Support team and support process

Back in early 2000-ish, I recognised IMAP as The True Path while the mainstream practice was POPanism, and searched hard and wide for a good provider. I found FastMail to be the best IMAP provider on the planet, and signed up for an account. And I would say the same thing today as well – as far as the ‘best IMAP provider’ bit goes. As for ‘The True Path’, I am a born-again IMAPian, and would say JMAP fits the bill these days as we transition from the Desktopian age to the AndroiPhonean age. Our new apps for iPhone and Android use JMAP to do their work, and you can see how well they work, especially on high-latency connections.

But I digress.

So, these were early days, and soon FastMail advertised a tech-support position. I had done a side project developing a basic email client, mainly to learn email protocols, so when the opportunity came along I was very keen, applied, and soon landed the job.

I was joined by Vinodh and Yassar around 2008, and they handled front-line requests from then until recently.

Nowadays, you will find a good chunk of support ticket requests handled by our new recruit Afsal (and soon by Anto as well), especially during US daytime.

New support tickets are handled by the techs in front-line support. Any escalated issues will usually be dealt with by Yassar, and further escalations are handled by myself. Yassar and I escalate issues to either an engineer on duty or, if we are sure the issue is related to somebody’s area of expertise, directly to that developer or admin.

Future plans

We plan to provide 24-hour support coverage in the future, and we are working hard towards that. We will extend support coverage to US daytime first, after which we will start extending coverage to other timezones as well. Recruiting and training new people takes time, but we’ll get there eventually.

Most frequently asked questions

As today’s post is about support, I’ll list here the most common questions that we see in support tickets, which you can get quick help for straight from our help documentation:

Remember: there is a wealth of information in our online support system, so that’s a good first place to go to learn about FastMail. But our support team is always on hand, should you have questions!

See you again!

It is very satisfying to engage with our users and help them make the most of their FastMail accounts. Sometimes it’s a 75-year-old grandma in the US asking how best to share her new recipes with a select set of contacts (think address book groups), and sometimes it’s an uber-geek from across the globe asking what scheme we use to hash our passwords (we use bcrypt)! Our customers are diverse, their questions interesting (if occasionally intimidating), and the experience satisfying!

FastMail has seen tremendous growth in the past few years, and we are working hard on scaling our support team to match. You should see the results of this work in the months to come.


Dec 10: Security – Availability

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 9th December was about email authentication. The following post is from our support team.

Technical level: medium

Availability is the ability for authorised users to gain access to their data in a timely manner.

While both Confidentiality and Integrity are important, they are not noticed unless something goes wrong. Availability on the other hand, is super visible. If you have an outage then it will be all through the media – like Yahoo, or Microsoft, or Google, or indeed FastMail.

Availability at FastMail

Our record really speaks for itself. Our public Pingdom page and status page show how reliably available we are. FastMail has great uptime.

We achieve this by reducing single points of failure, and having all data replicated in close to real time.

I was in New York earlier this year consolidating our machines, removing some really old ones, and also moving everything to new cabinets which have a better cooling system and more reliable power. We had out-grown the capacity in our existing cabinets, and didn’t have enough power to run completely on just half our circuits any more.

Our new cabinets have redundant power – a strip up each side of the rack – and every server is wired to both strips, and able to run from just one. Each strip has the capacity to run the entire rack by itself.

[Photos: the power feeds into the PDUs, and the rear cabling of the new cabinets]

The servers are laid out in such a way that we can shut down any one cabinet. In fact, we can shut down half the cabinets at a time without impacting production users. In 2014 it’s not such a big deal to be able to reinstall any one of your machines in just a few minutes – but in 2005 when we switched to fully automated installation of all our machines, only a few big sites were doing it. For the past few years, we’ve been at the point where we can shut down any machine with a couple of minutes’ notice to move service off it, and users don’t even notice that it’s gone. We can then fully reinstall the operating system.

We have learned some hard lessons about availability over the years. The 2011 incident took a week to recover from because it hit every server at exactly the same time. We couldn’t mitigate it by moving load to the replicas. We are careful not to upgrade everywhere at once any more, no matter how obvious and safe the change looks!

Availability and Jurisdiction

People often ask why we’re not running production out of our Iceland datacentre; we host only secondary MX and DNS there, plus an offsite replica of all data.

While we work hard on the reliability of our systems, a lot of the credit for our uptime has to go to our awesome hosting provider, NYI. They provide rock-solid power and network. To give you some examples:

  • During Hurricane Sandy, when other datacentres were bucketing fuel up the staircases and having outages, we lost power on ONE circuit for 30 seconds. It took out two units which hadn’t been cabled correctly, but they weren’t user-facing anyway.
  • We had a massive DDoS attempted against us using the NTP amplification flaw a while ago. They blocked just the NTP port to the one host being attacked, and informed us of the attack while they asked their upstream providers to push the block out onto the network to kill off the attack. Our customers didn’t even notice.
  • They provide 24/7 onsite technical staff. Once, when they were busy with another emergency, I had to wait 30 minutes for a response on an issue. The CEO apologised to me personally for having to wait. Normal response times are within 2 minutes.

The only outage we’ve had this year that can be attributed to NYI at all is a 5-minute outage when they switched the network uplink from copper to fibre, and managed to set the wrong routing information on the new link. Five minutes in a year is pretty good.

The sad truth is, we just don’t have the reliability from our Iceland datacentre to provide the uptime that our users expect of us.

  • Network stats to New York: you’ll see the only time it drops below 99.99% is July, when I moved all the servers, and there was the outage on the 26th (actually 5 minutes by my watch). As far as I can tell, the outages on the 31st were actually a Pingdom error rather than a problem at NYI.
  • Network stats to Iceland: ignore the 5-hour outage in August, because that was actually me in the datacentre. We don’t have dual-cabinet redundancy there, so I couldn’t keep services up while I replaced parts. Even so, there are multiple outages longer than 10 minutes. These would have been very visible if production users were hosted there. As it is, they just page the poor on-call engineer.

If we were to run production traffic to another datacentre, we would have to be convinced that they provide a similar level of quality to that provided by NYI. Availability is the lifeblood of our customers. They need email to be up, all the time.

Human error

Once you get the underlying hardware and infrastructure to the level of reliability we have, the normal cause of problems is human error.

We have put a lot of work this year into processes to help avoid human errors causing production outages. There will be more on the testing process and beta => qa => production rollout stages in a later blog post. We’ve also had to change our development style slightly to deal with the fact that we now have two fully separate instances of our platform running in production – we’ll also blog about that, since it’s been a major project this year.

General internet issues

Of course, the internet itself is never 100% reliable, as our Optus- and Vodafone-using customers in Australia saw recently. Optus was providing a route back from NYI which went through Singtel, and it wasn’t passing packets. There was nothing we could do; we had to wait for Optus to figure out what was wrong and fix it at their end.

We had a similar situation with Virgin Media in the UK back in 2013, but then we managed to route traffic via a proxy in our Iceland datacentre. This wouldn’t have worked for Australia, because traffic from Australia to Iceland travels through New York too.

We are looking at what is required to run up a proxy in Australia for Asia-Pacific region traffic if there are routing problems from this part of the world again. Of course, that depends on the traffic from our proxy being able to get through.

One of the nastiest network issues we’ve ever had was when traffic to/from Iceland was being sent through two different network switches in London, depending on the exact source/destination address pair – and one of the switches was faulty – so only half our traffic was getting through. That one took 6 hours to be resolved. Thankfully, there was no production traffic to Iceland, so users didn’t notice.


Dec 9: Email authentication

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 8th December was about our rich text email editor. The following post on 10th December is the second security post, on availability.

Technical level: medium

What is it?

Email authentication is a collection of techniques to verify that an email is legitimately from the sender it claims to be from.

History

Back when email was first designed, the world was a much more trusting place. The early internet consisted mainly of government agencies and educational institutions, and the notion that anyone would forge sender details in an email wasn’t considered. Although standards have moved on since then, compatibility is always a concern when implementing new ones; as a result, email standards still do not have sender authentication at their core.

On today’s internet, spam and phishing emails are of course a big problem, so email authentication has become something that needs to be addressed, and a number of techniques have been developed to help achieve it. Here is a brief summary of some of the popular ones, and how FastMail uses each one.

Real-time Blackhole List (RBL)

RBL, also known as a DNS-based Blackhole List (DNSBL) check, is a method where inbound IP addresses are checked against a list of known bad servers. This gives an early indication of which mail is more likely to be spam.

In addition to checking incoming mail against these lists, we also monitor our own IP addresses on the lists as an early warning system. When one of our addresses is listed, we take steps to find and remove the problem account, while sending mail via one of our other outgoing servers to minimise disruption to our legitimate users.
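Mechanically, a DNSBL check is just a DNS lookup: reverse the octets of the IP address and query it under the list’s zone. A minimal sketch, using Spamhaus’s public zone as the example list:

    // Check an IPv4 address against a DNSBL.
    const dns = require('dns').promises;

    async function isListed(ip, zone = 'zen.spamhaus.org') {
      const query = ip.split('.').reverse().join('.') + '.' + zone;
      try {
        await dns.resolve4(query);  // any A answer means the IP is listed
        return true;
      } catch (e) {
        return false;               // NXDOMAIN means the IP is not listed
      }
    }

    // 127.0.0.2 is the standard test entry most DNSBLs publish.
    isListed('127.0.0.2').then(console.log);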

Sender Policy Framework (SPF)

SPF is a domain-based authentication system. It allows a recipient to verify that the sending server is authorised to send email for a particular domain.

Domain owners can publish a list of valid addresses in their DNS records, and can suggest how to deal with email which does not come from an address on that list.

Unfortunately, the sender address verified by SPF is not necessarily the sender address that you see in your email client. The address checked by SPF is the return path sent in the SMTP transaction, which may differ from the address in the email’s From: header.

Microsoft attempted to fix this by introducing SenderID, which is similar to SPF but can verify the From address header. However, there were numerous problems with this as a standard and it isn’t widely implemented.

At FastMail we use SPF as one of many factors in spam filtering incoming mail. For outgoing mail, we specify our servers explicitly so that they get a positive score on a successful SPF check, but we also say “?all” to allow other systems to send from addresses on our domains. If you have your own domain with us, then of course you can set up your own SPF records to be as strict or as liberal as you need.
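As an illustration, a policy along those lines is published as a TXT record (the include hostname here is a hypothetical placeholder, not our actual record):

    example.com.  IN TXT  "v=spf1 include:spf.example-provider.com ?all"

The “?all” at the end marks mail from any other server as neutral rather than failing it; a stricter domain owner would use “~all” (softfail) or “-all” (fail) instead.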

DomainKeys Identified Mail (DKIM)

DKIM is a merging of two older standards: DomainKeys, and Identified Internet Mail. The intention of DKIM is to verify that an email associated with a particular domain was sent by an authorised agent of that domain, and has not been modified since being sent. Based on public key cryptography, a domain owner creates a public and private key pair, publishes the public part, and then uses the private part to sign the body and selected headers of an email. The receiver of an email is then able to check that signature against the public part of the key to verify that the sender of the email had access to the private part of the key, and is therefore authorised to send email on behalf of that domain.

Problems: again, there is no stipulation that the domain in the From: header is the domain that signs the email. It is possible for an email to be signed by any domain and still pass basic DKIM checks.

At FastMail we sign all outgoing emails with a key for our messagingengine.com domain, and also with a key for the domain of your email address (e.g. fastmail.com).

If you use your own domain with us, we will automatically sign your emails with a DKIM key if you host your DNS with us. We also make it super easy to set up both SPF and DKIM.
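The public half of the key lives in DNS under a selector, and each signed message carries a DKIM-Signature header naming that selector. Roughly like this (hypothetical selector, truncated key, abbreviated header):

    ; The published public key
    s1._domainkey.example.com. IN TXT "v=DKIM1; k=rsa; p=MIGfMA0GCSq...AQAB"

    ; The header a signed message carries (abbreviated)
    DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=s1;
            h=from:to:subject:date; bh=...; b=...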

For incoming mail, again, we use DKIM as a tool in spam filtering. DKIM is also used to validate official emails from FastMail: we use the DKIM signature combined with the headers to validate that the email was sent by an official FastMail staff account, and add a green tick next to legitimate emails.

Author Domain Signing Practices (ADSP)

ADSP is an extension to DKIM whereby a domain owner can publish a policy stating how email from their domain should be signed. A domain owner can state one of the following:

  1. Legitimate email from this domain may or may not be signed by the domain.
  2. Legitimate email from this domain will be signed by the domain.
  3. Legitimate email from this domain will be signed by the domain, and non signed email should be discarded.

The domain used for ADSP is the domain in the From header of the email, and is the one most likely seen by the recipient.
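An ADSP policy is itself just a TXT record under the _adsp label of the domain’s _domainkey namespace, with one of three values corresponding to the list above:

    _adsp._domainkey.example.com. IN TXT "dkim=unknown"      ; 1: mail may or may not be signed
    _adsp._domainkey.example.com. IN TXT "dkim=all"          ; 2: all legitimate mail is signed
    _adsp._domainkey.example.com. IN TXT "dkim=discardable"  ; 3: discard unsigned mail

(A real domain would publish just one of these, of course.)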

Domain-based Message Authentication, Reporting & Conformance (DMARC)

DMARC brings together the SPF and DKIM checks and ties them in with the sender shown in the From address of the email.

In order to be considered ‘DMARC aligned’ an email must pass SPF and DKIM checks, the SPF domain must match the domain of the From address, and at least one DKIM signature must also match that domain. This provides a good level of certainty that the email is not forged.

Domain owners who choose to publish DMARC records can suggest what should be done with messages which do not pass DMARC tests: reject outright, or quarantine (treat as spam).

The reporting part of DMARC is a tool for domain owners rather than end users. Email receivers who fully implement DMARC build reports on email received, and send them to domain owners who request them (via the published DMARC record). These reports show some basic aggregate information on the number of emails received, the servers they were received from, and their SPF and DKIM status. This allows domain owners to discover how their domain is being used, which can then inform decisions on the best SPF/DKIM/DMARC policies to publish.
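A DMARC policy is published as a TXT record at the _dmarc label; the rua tag is where aggregate reports are requested (hypothetical values):

    _dmarc.example.com. IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"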

We are investigating how we can use DMARC to benefit FastMail users. Given the wide range of ways in which our users use our services, it would be bad for us to publish reject or quarantine policies: the number of legitimate emails which would be blocked would far outweigh any possible benefit. Implementing overly restrictive policies in an environment where email is used in such diverse ways would result in pain for users.

Challenges

The biggest issue by far is mailing lists. An email sent to a mailing list will typically be re-sent from the list’s mail server, breaking SPF, and usually has some alterations made to it (such as subject changes or unsubscribe links added), which break DKIM signatures.

It is also fairly common for a third party to send email on your behalf; for example, a company might contract out their support and ticketing system to a third party, and emails from the ticketing system would be sent from the company’s domain. Care needs to be taken to ensure that these emails are also considered in SPF policies, and that the third party is able to properly DKIM-sign these messages.

Third-party senders have been one of the challenges we faced while implementing our green tick and phishing warnings system. This blog is hosted by WordPress, which sends email on behalf of FastMail (the blog notifications). We needed to make sure that these emails from WordPress could be identified and validated against the WordPress DKIM signature, check that they were sent from our WordPress blog, and make sure those emails were not marked with the phishing warning box. This needs to be done on a case-by-case basis, as what identifies emails on WordPress isn’t likely to be the same as what identifies emails on other services such as Twitter.

Another common source of SPF failures is forwarding. If, for example, a user has migrated to FastMail from another provider and has set up their old provider to forward email to their FastMail address, then we see the IP address of the forwarding server, not the originating server, and this can result in an SPF failure for an otherwise legitimate email. There are some standards which attempt to address this, such as the Sender Rewriting Scheme (SRS), which involves rewriting the envelope sender to one at the forwarding domain. This fixes the SPF problem, but if the originating domain uses DMARC, the email would no longer be aligned, as the From addresses no longer match.
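For the curious, an SRS rewrite encodes the original sender into a new envelope address at the forwarder, conventionally along these lines (the hash and timestamp fields here are illustrative):

    MAIL FROM: alice@example.org                        (original envelope sender)
    MAIL FROM: SRS0=r2d4=T7=example.org=alice@fwd.net   (as rewritten by the forwarder)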

The trouble of course is that phishing emails can be sent from entirely unrelated domains and are still successful. So sender authentication doesn’t always help.

Another problem with email authentication is a misunderstanding of what is being verified. We can take technical steps to verify that an email did come from a legitimate email account, but we can make no claim over how trustworthy the author of that email is. Anybody can purchase a domain, set it up properly with SPF, DKIM, and DMARC, and then use it to send spam or phishing emails. Also any service, including FastMail, faces the problem that accounts can be compromised and used to send bad content. Detecting and dealing with this is a whole other blog post.


Dec 8: Squire: FastMail’s rich text editor

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 7th December was about automated installation. The following post on 9th December is on email authentication.

Technical level: low-medium

We’re going to take a break from talking about our backend infrastructure in this post and switch over to discussing our webmail.

In the beginning, there was text. And really, it was pretty good. You could *emphasise* things, SHOUT AT PEOPLE, and generally convey the nuance of what you had to say. But then came HTML email. Now you could make big bold statements, or small interesting asides. Your paragraphs were no longer hard-wrapped, but instead flowed according to the size of your screen. Despite some grumblings from a dedicated band of luddites (including a few of the FastMail team :-)), most people decided that this was, in fact, better.

To support rich text editing in our previous interface, we used CKEditor. While not a bad choice, like most other editors out there it was designed for creating websites, not writing emails. As such, simply inserting an image by default presented a dialog with three tabs and more options than you could believe possible. Meanwhile, support for quoting, crucial in email, was severely limited. It also came with its own UI toolkit and framework, which we would have had to heavily customise to fit in with the rest of the new UI we were building: a pain to maintain.

With our focus on speed and performance, we were also concerned about the code size. The version of CKEditor we use for our previous (classic) UI, which only includes the plugins we need, is a 159 KB download (when gzipped; uncompressed it’s 441 KB). That’s just the code, excluding styles and images. To put this in perspective, in the current interface the combined code weight required to load the whole compose screen, including our awesome base library (more on that in a future post…), the mail/contacts model code and all the UI code to render the entire screen comes to only 149.4 KB (459.7 KB uncompressed).

After considering various options, we therefore decided to strike out on our own and wrote Squire.

Making a rich text editor is notoriously difficult because browsers are extremely inconsistent in this area. The APIs were all introduced by Microsoft back in the IE heyday, and were then copied by the other vendors in various incompatible ways. The result of applying document.execCommand to simply bold the selected text is likely to be different in every browser you try.

To deal with this, most rich text editors execute a command, then try to clean up the mess the browser created. With Squire, we neatly bypass this by simply not using the browser’s built-in commands. Instead, we manipulate the DOM directly, only using the selection and range APIs. This turns out to be easier and require less code than letting the browser do any of the work!

For example, to bold some text, we use the following simple algorithm, sketched in code below the list (actually, this more generally applies to any inline style, such as setting a font or colour too):

  1. Iterate through the text nodes in the DOM that are part of the current selection.
  2. For each text node, check if it’s already got a parent <b> tag. If it does, there’s nothing to do. If not, create a new <b> element and wrap the text node in it. If the text node was only partially in the selection, split it first so only the selected part gets wrapped.
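In code, a drastically simplified version of that algorithm might look like the following. To be clear, this is an illustrative sketch rather than Squire’s actual implementation, and it skips plenty of edge cases Squire handles:

    // Bold the current selection by wrapping its text nodes in <b>,
    // without using document.execCommand.
    function boldSelection() {
      const range = window.getSelection().getRangeAt(0);
      if (range.collapsed) return;

      // Split partially selected boundary text nodes so the range
      // covers whole text nodes only.
      if (range.startContainer.nodeType === Node.TEXT_NODE &&
          range.startOffset > 0) {
        const tail = range.startContainer.splitText(range.startOffset);
        range.setStart(tail, 0);
      }
      if (range.endContainer.nodeType === Node.TEXT_NODE &&
          range.endOffset < range.endContainer.length) {
        range.endContainer.splitText(range.endOffset);
      }

      // Collect the text nodes inside the selection.
      const root = range.commonAncestorContainer;
      const selected = [];
      if (root.nodeType === Node.TEXT_NODE) {
        selected.push(root);
      } else {
        const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
        while (walker.nextNode()) {
          if (range.intersectsNode(walker.currentNode)) {
            selected.push(walker.currentNode);
          }
        }
      }

      // Wrap each one in <b>, unless it is already inside one.
      for (const node of selected) {
        if (!node.parentNode.closest('b')) {
          const b = document.createElement('b');
          node.parentNode.insertBefore(b, node);
          b.appendChild(node);
        }
      }
    }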

I’d like to give a quick shout out to the under-appreciated TreeWalker API for iterating through the DOM. True story: when first developing Squire, I came across a bug in Opera’s TreeWalker implementation. The first comment on my report from the Presto developer team (this was pre-WebKit days at Opera) was, and I quote verbatim, “First TreeWalker bug ever. First TreeWalker usage ever? :)”. Sadly, due to the lack of common use, other browsers have also had the occasional bug with this API too, so to be on the safe side I just reimplemented the bits of the API I needed in JavaScript. The idea is sound though.

Squire also completely takes over certain keys that are handled badly by default, such as enter and delete. This lets us get a consistent result, and allows us to add the features we want, such as breaking nested quotes if you hit enter on a blank line. And of course we’ve added our own keyboard shortcuts too for actions like changing quote level or starting a bullet list.

At only 11.5 KB of JavaScript after minification and gzip (34.7 KB uncompressed) and with no dependencies, Squire is extremely lightweight. If you’re building your own webmail client, or something else that needs to be able to edit rich text, give it a go! Squire is MIT licensed and available on GitHub.
