HTML emails – from bad to worse

Rob Mueller – 25 August 2010

This is a technical post. Regular Fastmail users who subscribe to email updates from the Fastmail blog can safely ignore it.

Email was originally designed as a text-only medium (1982). Over time, various extensions were added to allow transporting attachments (1993) and different content formats, such as HTML (1997).
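To make the "different content formats" extension concrete, here is a minimal sketch, using Python's standard `email` package purely for illustration, of how one message can carry both a plain-text version and an HTML version as a `multipart/alternative` MIME structure. The addresses and body text are hypothetical placeholders.

```python
from email.message import EmailMessage

# Build a message that carries the same content twice: a plain-text part
# for text-only readers and an HTML part for clients that render HTML.
msg = EmailMessage()
msg["From"] = "alice@example.com"  # hypothetical addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "Hello"

# set_content() with a str creates a text/plain body.
msg.set_content("Hello Bob,\n\nThis is the plain-text version.\n")

# add_alternative() converts the message to multipart/alternative and
# appends an HTML part; clients pick the richest part they can display.
msg.add_alternative(
    "<html><body><p>Hello Bob,</p>"
    "<p>This is the <b>HTML</b> version.</p></body></html>",
    subtype="html",
)

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type())
```

The top-level type becomes `multipart/alternative`, with a `text/plain` part followed by a `text/html` part; by convention the last part is the sender's preferred (richest) representation.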

HTML has become a very popular way of delivering richer email content. It has a large ecosystem of tools and infrastructure, and it is easy to display in web browsers, because that’s exactly what they’re designed to do.

The problem is that HTML is a markup language, but users generally compose their messages with WYSIWYG tools, which then output HTML of variable quality. To make things even worse, most email reading software has limited HTML and CSS display capabilities, so senders can’t rely on the full range of HTML or CSS being available. As a result, most HTML email still uses the same kind of HTML we were writing in 1999: deeply nested tables, with explicit attributes on each tag to lay out the email content.

As a webmail provider, we have to deal with all this variable HTML content and try to display it correctly. I’ve seen numerous odd examples over the years, from emails that use absolute positioning and a fixed width and height on every single element to lay everything out in a neat grid (which breaks horribly if you change the font size at all), to the messy conditional-comment HTML that Microsoft Word generates.

Recently, however, I’ve seen a few extreme examples of badly generated HTML, and in each case it’s come from Mac Mail (specifically the header “X-Mailer: Apple Mail (2.936)”). I’ve removed the content, added some newlines between tags, and put an example here. It looks pretty ordinary, nothing funny. Now look at the HTML that generated it (I’ve put the HTML as text, with appropriate indenting to make it easier to follow, here). That’s 330 KB in size and 5,407 lines, almost entirely HTML tags. The first piece of text content sits inside 741 nested tags! Worse, much of that nesting alternates between inline and block tags at successive levels, which is technically invalid HTML and really annoys our HTML tidy code that tries to fix it up.
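To illustrate the kind of measurement described above, here is a minimal sketch (not Fastmail's actual tidy code) that uses Python's standard `html.parser` to track how deeply tags are nested, how deep the first piece of text sits, and how many block elements are opened inside an inline element. The `INLINE` and `BLOCK` sets are a small, hypothetical subset of the HTML 4 element categories, and because `HTMLParser` does not apply a browser's implied-end-tag rules, the depths it reports are an approximation.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not increase the nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}

# Illustrative (incomplete) subsets of HTML 4 inline and block elements.
INLINE = {"a", "b", "i", "em", "strong", "span", "font", "u"}
BLOCK = {"div", "p", "table", "ul", "ol", "blockquote", "h1", "h2", "h3"}


class NestingStats(HTMLParser):
    """Track tag nesting depth and count block elements opened inside inline ones."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open elements
        self.max_depth = 0               # deepest nesting seen anywhere
        self.depth_at_first_text = None  # nesting depth of the first non-blank text
        self.block_in_inline = 0         # block elements with an inline ancestor

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # HTML 4 does not allow a block element inside an inline one.
        if tag in BLOCK and any(t in INLINE for t in self.stack):
            self.block_in_inline += 1
        self.stack.append(tag)
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break

    def handle_data(self, data):
        if self.depth_at_first_text is None and data.strip():
            self.depth_at_first_text = len(self.stack)


# Usage: a tiny hypothetical sample that alternates block and inline tags.
sample = "<div><span><div><span><div><span>Hello</span></div></span></div></span></div>"
stats = NestingStats()
stats.feed(sample)
stats.close()
print(stats.max_depth, stats.depth_at_first_text, stats.block_in_inline)  # prints: 6 6 2
```

Scaled up to a real message, a count like this is what turns "looks pretty ordinary" into "741 tags deep", and the block-inside-inline counter flags exactly the alternating pattern that a tidy step has to untangle.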

I’ll be working to fix this, but at the moment, really bad emails like this can display extremely slowly in some browsers when viewed through the webmail interface.