Copyright © 1999–2016 FastMail Pty Ltd
Note: The SSD market is moving quickly, so by the time you read this, it's probably out of date :)
As mentioned in another post, we're looking for some new servers. These are to replace our existing database servers, which have had a long and productive life (over 6 years). They're still running fine, but they're starting to get a little more loaded than we'd like during backup runs. As an FYI, our database requirements are much smaller than you'd expect for our size. Our database holds information like the user list, domains, aliases, user address book, personalities, signatures, etc, but most of our storage space is used by emails and files, and those aren't stored in the database at all. Because of that, we only need about 20G of storage now.
I'd really like to get some machines that will last at least 3 years, and scale to at least 20x what the existing servers can handle. To do that, I think we need to look at using SSD technology for storage. Unfortunately the SSD storage market currently presents a whole range of choices, all with their own annoying pros and cons. For the 20x scale, I'm looking at about 200-400G of storage. Here are the options I've found.
Intel SSDs
- SATA interface, so doesn't need a separate driver
- Cheap (~$800 for 64G = $12.50/G). At that price, you can get a couple and RAID10 them fairly easily, and even with mirroring doubling the price/G, that still leaves it the cheapest
- Slow compared to other SSD solutions (1,000s of IOPS rather than 10,000s or 100,000s)
- Questions about performance when you disable write cache, put them in a RAID array, and do lots of writes over time. Some initial analysis here: http://www.bigdbahead.com/?p=518, http://www.bigdbahead.com/?p=532, http://www.bigdbahead.com/?p=555, http://www.bigdbahead.com/?p=557
Fusion-IO
- PCI-e interface, needs a Linux driver that was previously of questionable quality (http://archives.postgresql.org/pgsql-performance/2008-07/msg00010.php). I haven't seen any recent user reports about use and stability, so I don't know where it's at for production use
- Reasonably expensive (~$3000 for 80G = $37.50/G)
- In theory fast. Data sheet claims "IOPS 89,549 (75/25 r/w mix 4k packet size)" for the 80G version. But when used in an actual database system, the increase over SAS RAID systems is only about 400% maximum (http://www.kennygorman.com/wordpress/?p=398, http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/)
- There are also questions about the durability of storage, or the performance if you enable durable mode (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/, http://www.mysqlperformanceblog.com/2009/06/15/testing-fusionio-strict_sync-is-too-strict/)
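The durability questions above mostly come down to what happens when a database calls fsync. As a rough way to poke at that on any candidate device, here is a minimal Python sketch that times small synchronous writes. The function name `fsync_rate` and its parameters are made up for illustration, and this is no substitute for a real database benchmark or a pull-the-plug test.

```python
import os
import tempfile
import time


def fsync_rate(directory, writes=200, block_size=4096):
    """Return fsync'd writes per second for small blocks written in `directory`.

    `directory` should live on the device under test (an assumption the
    caller has to get right; this function cannot check it).
    """
    payload = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this write down to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    # Guard against a zero interval on very fast (e.g. RAM-backed) filesystems.
    return writes / max(elapsed, 1e-9)


if __name__ == "__main__":
    rate = fsync_rate(tempfile.gettempdir())
    print(f"{rate:.0f} fsync'd 4k writes/sec")
```

A rate far above what the hardware could plausibly sustain may hint that a write cache somewhere is absorbing the flushes, which is exactly the situation the linked posts worry about. It doesn't prove anything about durability either way.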
TMS RamSan-20
- PCI-e interface. There's only a binary-only driver for RH/SUSE at the moment, but apparently they're working on an open source driver
- Reasonably expensive (~$20,000 for 450G = $44/G, I'm guessing the 225G is about half that)
- In theory fast. Claims "120,000 sustained IOPS", but I haven't seen anyone use them in a database benchmark
- The RamSan and Fusion IO superficially appear very similar.
According to TMS, here are some claimed benefits over Fusion-IO:
- CPU and memory resources - Fusion-IO uses system CPU + RAM for flash management (up to 30-40% CPU + 5.5G RAM per 80G of flash), RamSan has onboard CPU + DRAM. "The RamSan-20 has an onboard processor and four FPGA controllers while the cards from Fusion use the server’s processor and memory to run and manage the card. The write management software that we have programmed into the RamSan-20 is far superior to the write management software on the cards from Fusion I/O. This is why our sustained performance is dramatically better"
- All SLC. Higher capacity Fusion-IO devices use MLC, which is slower and has lower lifetime
- TMS claim Fusion-IO's IOPS figures are "burst" numbers, while theirs are "sustained"
- Super capacitors - these act as battery back-ups on the RamSan-20 and provide enough power to shut down the card gracefully in the event of an unmanaged server shutdown. Fusion's card does not have supercaps. Data is not lost, but the recovery time is incredibly long. After the server is rebooted, the Fusion card needs to rebuild its index tables showing where the data is located. The RamSan-20 data is immediately available after the server reboot; the Fusion card will take about 10 minutes to reload
- There are some benchmarks of a RamSan-500 (external flash based unit) here: http://www.bigdbahead.com/?p=139, http://www.bigdbahead.com/?p=140, http://www.bigdbahead.com/?p=141
Sun
- Multiple SAS interfaces, so doesn't need a separate driver
- Separate 1U rack mount box
- Very expensive (~$45,000 for 480G = $93.75/G)
- Fast, and the massive 2T system claims >1.6M IOPS
- Seems a nice idea, but with a minimum price of $45k for one box, you need to be able to justify the amount and cost of that storage. For larger database users, this might be really interesting to try out
Violin Memory
- Multiple interfaces, it seems: "Fibre Channel and Ethernet network attachments are supported via a network head and direct attachment through a low latency PCI Express (PCIe) connection. Operating systems supported via an open source PCIe driver include: Major Linux releases and distributions, Windows 32 and 64-bit Operating Systems, OpenSolaris"
- I can't find price information anywhere. I'm guessing it's from the "if you have to ask, you can't afford it" school of products? With up to 4TB support, it seems it's aiming for a particularly high end market compared to what we're looking for (100+G of storage)
- Very fast. Interesting that you can choose between DRAM or Flash memory to vary performance/durability (http://violin-memory.com/Memory_Flexibility). But if you go DRAM, you need to be extra careful about power, because if power goes, you lose it all.
- There are some benchmarks here: http://www.bigdbahead.com/?p=334
There doesn't appear to be a clear "winning" solution; as usual, it depends on your storage, IO, and cost requirements. When I first heard about Fusion IO, I was very excited about what it seemed to offer, but over time, I've become a bit more circumspect given the concerns over durability, system overhead (RAM + CPU), and recovery time requirements. The Sun and Violin options seem aimed at considerably higher-end systems (in both storage space and cost) than what we're looking for. I really wish the RamSan-10/20 had an open source Linux driver. Given TMS have a long history with NVRAM systems, I have a gut feeling of "they know what they're doing", but for us, an open source driver is mandatory. That really just leaves Intel drives in a RAID array at the moment, which is probably what we'll aim for now. By starting with just 2 drives now (very cheap), we can at least replace them in the future with either more, or newer and faster drives, or another solution altogether if it comes along.
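To put the quoted prices side by side, here's a small Python sketch of the price/G arithmetic used above, plus a rough cost to reach the 200-400G target. The prices and capacities are the approximate figures from this post. The vendor labels are my reading of which figures belong to which product, and the "mirrored" flag assumes a RAID10-style setup needs twice the raw capacity. The helper names are made up for illustration.

```python
# Approximate (price in USD, capacity in G) figures quoted in this post.
# The labels are an assumption about which product each figure belongs to.
OPTIONS = {
    "Intel SATA SSD": (800, 64),
    "Fusion-IO": (3000, 80),
    "RamSan-20": (20000, 450),
    "Sun": (45000, 480),
}


def price_per_gb(price, capacity_gb):
    """Raw price per G for a single unit."""
    return price / capacity_gb


def units_needed(target_gb, capacity_gb, mirrored=False):
    """Units required to reach target_gb of usable space.

    mirrored=True assumes every unit is paired with a mirror (RAID10-style),
    so the raw unit count doubles.
    """
    units = -(-target_gb // capacity_gb)  # ceiling division on integers
    return units * 2 if mirrored else units


def cost_for(target_gb, price, capacity_gb, mirrored=False):
    """Total purchase cost to reach target_gb of usable space."""
    return units_needed(target_gb, capacity_gb, mirrored) * price


if __name__ == "__main__":
    for name, (price, cap) in OPTIONS.items():
        print(f"{name}: ${price_per_gb(price, cap):.2f}/G, "
              f"200G costs ${cost_for(200, price, cap)}, "
              f"400G costs ${cost_for(400, price, cap)}")
    intel_price, intel_cap = OPTIONS["Intel SATA SSD"]
    print("Intel mirrored, 200G:",
          cost_for(200, intel_price, intel_cap, mirrored=True))
```

This ignores controllers, chassis, and spares, and it only counts whole units, so treat the totals as a ballpark rather than a quote.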