Updating Supermicro BIOS firmware

Bron Gondwana – 9 May 2016

For once we're not writing about anything user visible at all. This is a behind the scenes look at the ongoing hardware refreshing that we periodically do to keep our servers running well.

Creeping CPU load

As we add new features to our service, the profile of server usage changes. Sometimes there are also outside forces involved. For example:

As for clients, we recently discovered that Apple's clients use messages to create synthetic events in their calendar, and then poll the server to make sure that the messages still exist. Instead of checking for FolderName/UIDVALIDITY/UID, which is very cheap, they search by Message-ID. In Cyrus this is quite expensive, and we have seen processes run for hundreds of seconds processing searches for a couple of hundred messages at once. Optimising this is a goal for another day.

Unbalanced machines

For years the limiting factor on our servers has been IO performance. IO is still vitally important, but the increasing CPU usage meant that for the first time some of our machines had insufficient CPU to saturate their IO capacity. We had purchased 6 machines with E5-2609 CPUs. These CPUs only have 4 cores each and no hyperthreading, giving a total of 8 threads of execution. Compared to our newer hosts with E5-2630 v3 CPUs, which have 8 cores and hyperthreading for 32 threads, the difference in performance was striking.

So we researched CPU upgrade options, and settled on the E5-2630 v2 - a slightly less powerful CPU than the v3 with only 6 cores (so 24 threads total), but still a good improvement and compatible with the older X9DRD-7LN4F motherboards.
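The thread counts quoted here are simple arithmetic. As a rough sketch, assuming every machine has two CPU sockets (which is what the quoted totals imply) and using a made-up helper name:

```python
def hardware_threads(sockets, cores_per_cpu, hyperthreading):
    """Total hardware threads a machine exposes to the OS."""
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_cpu * threads_per_core


# Two sockets per machine is an assumption inferred from the totals in the text.
machines = {
    "E5-2609": hardware_threads(2, 4, hyperthreading=False),
    "E5-2630 v2": hardware_threads(2, 6, hyperthreading=True),
    "E5-2630 v3": hardware_threads(2, 8, hyperthreading=True),
}
```

So the upgrade triples the number of hardware threads on the old machines, even though it doesn't quite match the newest hosts.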

First success

We started by purchasing two CPUs and upgrading a single machine to see how it went. It was a success, so we went ahead and ordered the remaining 10.

Unfortunately, we hit a snag. The two oldest machines were running BIOS 2.0, and to support E5-26xx v2 CPUs, we needed at least 3.0. The latest is 3.2, so that's fine. So we entered the world of pain that is upgrading BIOS firmware.

Abandon hope ye

The internet made it pretty clear that it would not be fun to make this work.

The first thing you see on looking for upgraded BIOS firmware is a scary warning. The link from the CPU info page says:

WARNING!

Please do not download / upgrade the BIOS/Firmware UNLESS your system has a BIOS/firmware-related issue. Flashing the wrong BIOS/firmware can cause irreparable damage to the system.

In no event shall Supermicro be liable for direct, indirect, special, incidental, or consequential damages arising from a BIOS/firmware update.

Lovely. Click through a massive page of legalese EULA and you get a .zip file containing a DOS flash utility and a BIOS ROM, along with a batch script that renames the flash utility before use. Seriously.

$ unzip -l X9DRD75_116.zip
Archive:  X9DRD75_116.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   201056  2011-09-14 14:48   AFUDOSU.smc
      127  2013-08-26 10:56   ami.bat
     3260  2012-12-31 15:22   Readme for AMI BIOS.txt
 16777216  2015-01-16 19:01   X9DRD75.116
---------                     -------
 16981659                     4 files

$ cat ami.bat
@echo off

REN AFUDOSU.SMC AFUDOSU.EXE

AFUDOSU.EXE  %1  /P  /B  /N  /K  /R /FDT /MER /OPR

REN AFUDOSU.EXE AFUDOSU.SMC

OK, so I can create a FreeDOS boot disk, run this command, and upgrade the BIOS. The only problem: I'm in Australia and the servers are in New York. So I turned to the remote management.

Remote management web interfaces

There's a special place in hell for the designers of remote management tooling. Almost all of them have self-signed Java applets that may or may not even work on modern Java. Modern Java is a horror show of save-you-from-yourself security anyway. You have to run a very obviously named tool called ControlPanel (caps and all) on Linux to add the IP address of the machine to a list that's allowed to run in a lower security zone. It doesn't support wildcards, which means every machine you administer needs a separate entry in the file.

Even then you click through screens of warnings until the applet runs. If you're super lucky, keystrokes are correctly translated and you can actually interact with the console. Of course you're still interacting with a framebuffer to a VGA console which jumps around to different sizes, moving around the screen, sometimes failing to resize correctly and showing a tiny corner of the remote screen with scrollbars. But the window isn't resizeable. It's like jumping back from 2016 to 1996 again.

Browsers with their require-CA-signed-SSL certs don't help either. All this stuff is great and necessary on the open internet, but I'm connecting over a trusted VPN link directly into our management network, on private-range IPs. It would make this sort of work a ton easier if I could tell both Java and the browser that all IPs in the 10.x.y.0/24 range were trusted and to skip key verification, rather than adding each one individually.
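Without wildcard support, one workaround is to generate the per-host entries instead of typing them. This is only a sketch: the subnet below is a placeholder (not our real management range), the `https://` scheme is an assumption, and where the resulting list gets pasted or saved depends on your Java version.

```python
import ipaddress


def exception_site_entries(cidr, scheme="https"):
    """One URL per usable host address in the given network."""
    network = ipaddress.ip_network(cidr)
    return [f"{scheme}://{host}" for host in network.hosts()]


# Hypothetical management subnet, purely for illustration.
entries = exception_site_entries("10.0.0.0/24")
```

That gives 254 lines for a /24, which is tedious but at least not manual.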

Anyway, I needed a way to load a DOS image into the machine. The RMM gives two options: upload a floppy image (yay) or mount an image from a Windows file share. Yep, Windows only.

Sadly, the floppy image is limited to 1.44MB and the BIOS ROM alone is 16MB, so no luck there. I was about to hold my nose and install Samba on a machine in our production datacentre hanging off the management network when I remembered a better option.
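To put numbers on why the floppy option was a non-starter, here is a quick check using the file sizes from the zip listing above. It ignores filesystem overhead and the space FreeDOS itself needs, so the real situation is slightly worse.

```python
import math

FLOPPY_BYTES = 1440 * 1024  # a "1.44MB" floppy holds 1,474,560 bytes
ROM_BYTES = 16777216        # X9DRD75.116, from the zip listing
AFU_BYTES = 201056          # AFUDOSU.smc, from the zip listing


def fits_on_floppy(*file_sizes):
    """True if the files fit on one floppy (ignoring filesystem overhead)."""
    return sum(file_sizes) <= FLOPPY_BYTES


# How many floppy-sized chunks the ROM alone would span.
floppies_needed = math.ceil(ROM_BYTES / FLOPPY_BYTES)
```

The flash utility would fit on its own, but the ROM is more than eleven floppies' worth of data.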

Bootloaders

We can chainload things via the GRUB bootloader. I'm not changing the drives in these machines, so there's already a big disk available to hold a boot image. I created a GRUB entry to boot the disk image, using memdisk from the excellent syslinux tools.

$ cat 51_bios
#!/bin/bash
#
# Set up BIOS upgrader

prefix=/usr
exec_prefix=${prefix}
bindir=${exec_prefix}/bin
libdir=${exec_prefix}/lib
. ${libdir}/grub/grub-mkconfig_lib

echo "Found BIOSUpgrade image" >&2
cat <<EOF
menuentry 'BIOSUpgrade' --class dos {
EOF
  prepare_grub_to_access_device ${GRUB_DEVICE_BOOT} | sed -e "s/^/\t/"
cat <<EOF
  linux16 /boot/dos/memdisk
  initrd16 /boot/dos/boot.img
}
EOF

With a FreeDOS disk image containing the BIOS file, I figured I was good to go. Maybe I used the wrong images, because this didn't actually work. I booted into DOS and typed the correct invocation, but it froze. Maybe I didn't have the right memory drivers or something. I don't really know DOS that well.

Back to the drawing board, and someone had suggested another option...

When all you have is a command line, everything looks like a file

These Supermicro boards have American Megatrends BIOSes, and AFUDOSU.EXE smelled remarkably like a standard binary. Sure enough, AFU is available for Linux, so I downloaded a copy and tried to run it.

Not so easy...

$ ./afulnx_64
/root/afu/.temp/amifldrv.c:42:1: warning: data definition has no type or storage class [enabled by default]
/root/afu/.temp/amifldrv.c:42:1: error: type defaults to ‘int’ in declaration of ‘module_init’ [-Werror=implicit-int]
/root/afu/.temp/amifldrv.c:42:1: warning: parameter names (without types) in function declaration [enabled by default]
/root/afu/.temp/amifldrv.c:43:1: warning: data definition has no type or storage class [enabled by default]
/root/afu/.temp/amifldrv.c:43:1: error: type defaults to ‘int’ in declaration of ‘module_exit’ [-Werror=implicit-int]
/root/afu/.temp/amifldrv.c:43:1: warning: parameter names (without types) in function declaration [enabled by default]
/root/afu/.temp/amifldrv.c:14:12: warning: ‘amifldrv_init_module’ defined but not used [-Wunused-function]
/root/afu/.temp/amifldrv.c:22:13: warning: ‘amifldrv_cleanup_module’ defined but not used [-Wunused-function]
cc1: some warnings being treated as errors
make[2]: *** [/root/afu/.temp/amifldrv.o] Error 1
make[1]: *** [_module_/root/afu/.temp] Error 2
make: *** [default] Error 2

The program seems to bundle a kernel driver inside itself, and running it extracts source code and builds a kernel module first. That build failed, presumably because it's targeting a different kernel version to the one we're using. Supposedly you can extract the files, but:

$ ./afulnx_64 /GENDRV
 - Program initializing .. 
 - Generate AFULNX driver .... /root/afu/.temp/amifldrv.c:42:1: warning: data definition has no type or storage class [enabled by default]
/root/afu/.temp/amifldrv.c:42:1: error: type defaults to ‘int’ in declaration of ‘module_init’ [-Werror=implicit-int]
/root/afu/.temp/amifldrv.c:42:1: warning: parameter names (without types) in function declaration [enabled by default]
/root/afu/.temp/amifldrv.c:43:1: warning: data definition has no type or storage class [enabled by default]
/root/afu/.temp/amifldrv.c:43:1: error: type defaults to ‘int’ in declaration of ‘module_exit’ [-Werror=implicit-int]
/root/afu/.temp/amifldrv.c:43:1: warning: parameter names (without types) in function declaration [enabled by default]
/root/afu/.temp/amifldrv.c:14:12: warning: ‘amifldrv_init_module’ defined but not used [-Wunused-function]
/root/afu/.temp/amifldrv.c:22:13: warning: ‘amifldrv_cleanup_module’ defined but not used [-Wunused-function]
cc1: some warnings being treated as errors
make[2]: *** [/root/afu/.temp/amifldrv.o] Error 1
make[1]: *** [_module_/root/afu/.temp] Error 2
make: *** [default] Error 2
cp: cannot stat `./.temp/amifldrv.o_shipped': No such file or directory
ok
 - Program ended normally.
$ ls
afulnx_64  amiwrap.c  amiwrap.h  Makefile

It only produced a few of the files. Not enough to actually build the module; I have no idea why.

At this point I realised I needed another way to get at the source files the program was trying to build. Thankfully there are tools...

$ strace -o out.txt -s 5000 ./afulnx_64

And then it was a simple matter of looking for the file writes:

open(".temp/amifldrv.c", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6898765000
write(4, "#include <linux/mm.h>\n#include <asm/io.h>\n#include <linux/interrupt.h>\n#include \"amifldrv.h\"\n#include \"amiwrap.h\"\nint amifldrv_ioctl(void);\nint amifldrv_mmap(void);\nstatic int *kmalloc_area = NULL;\nstatic int *kmalloc_ptr = NULL;\nstatic unsigned long kmalloc_len = 0L;\nstatic int major;\nstatic AMIFLDRV_ALLOC kmalloc_drv[128];\nstatic int kcount = 0;\nstatic int amifldrv_init_module(void)\n{\nulArg
[...]

I extracted each file from the trace in out.txt, replacing the escaped newlines, tabs, etc. with their real values, and saved them into the current directory.
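I did this by hand, but it could be scripted. The following is a sketch rather than what I actually ran: it assumes the trace looks like the excerpt above (`open(...) = fd` followed by `write(fd, "...", n) = n`), also accepts the newer `openat(AT_FDCWD, ...)` form, and all of the function names are invented for illustration.

```python
import re

# Lines we care about in the strace output.
OPEN_RE = re.compile(
    r'^open(?:at\(AT_FDCWD, |\()"((?:[^"\\]|\\.)*)", [^)]*\)\s*=\s*(\d+)')
WRITE_RE = re.compile(
    r'^write\((\d+), "((?:[^"\\]|\\.)*)"(\.\.\.)?, \d+\)\s*=\s*\d+')
CLOSE_RE = re.compile(r'^close\((\d+)\)\s*=\s*0')

# strace prints strings with C-style escapes, using octal for other bytes.
ESCAPE_RE = re.compile(r'\\([0-7]{1,3}|x[0-9a-fA-F]{2}|.)')
SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "f": "\f", "v": "\v",
          '"': '"', "\\": "\\"}


def unescape(text):
    """Turn an strace-escaped string back into the raw bytes."""
    def repl(match):
        token = match.group(1)
        if token[0] in "01234567":
            return chr(int(token, 8))
        if token[0] == "x" and len(token) == 3:
            return chr(int(token[1:], 16))
        return SIMPLE.get(token, token)
    return ESCAPE_RE.sub(repl, text).encode("latin-1")


def extract_writes(trace_lines):
    """Map each written path to the concatenated data written to it."""
    fd_to_path = {}
    files = {}
    for line in trace_lines:
        m = OPEN_RE.match(line)
        if m:
            fd_to_path[m.group(2)] = unescape(m.group(1)).decode("latin-1")
            continue
        m = WRITE_RE.match(line)
        if m and m.group(1) in fd_to_path:
            if m.group(3):
                # strace cut the string short; rerun with a larger -s value.
                raise ValueError("truncated write to " + fd_to_path[m.group(1)])
            path = fd_to_path[m.group(1)]
            files[path] = files.get(path, b"") + unescape(m.group(2))
            continue
        m = CLOSE_RE.match(line)
        if m:
            fd_to_path.pop(m.group(1), None)
    return files
```

Feeding it the lines of out.txt gives a dictionary of path to file contents, which can then be written out to disk. Writes to descriptors that were never opened in the trace (stdout, for example) are ignored.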

To make amifldrv.c compile I just had to add two more lines at the top:

#include <linux/module.h> /* Needed by all modules */
#include <linux/kernel.h> /* Needed for KERN_INFO */
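If you have several machines to do, the same fix can be applied mechanically. A small sketch (the helper name is made up) that prepends the two includes only when they're missing, so running it twice is harmless:

```python
REQUIRED_INCLUDES = [
    "#include <linux/module.h>",
    "#include <linux/kernel.h>",
]


def add_missing_includes(source):
    """Prepend any required include that the source doesn't already have."""
    missing = [inc for inc in REQUIRED_INCLUDES if inc not in source]
    if not missing:
        return source
    return "\n".join(missing) + "\n" + source
```

Whether these two headers are enough will depend on the kernel version you're building against; they were enough for ours.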

And then the module built against our 4.4.6 kernel and away it went:

$ make
make -C /lib/modules/4.4.6-fm64/build SUBDIRS=/root/afu modules
make[1]: Entering directory `/usr/src/linux-headers-4.4.6-fm64'
  CC [M]  /root/afu/amifldrv.o
  CC [M]  /root/afu/amiwrap.o
  LD [M]  /root/afu/amifldrv_mod.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /root/afu/amifldrv_mod.mod.o
  LD [M]  /root/afu/amifldrv_mod.ko
make[1]: Leaving directory `/usr/src/linux-headers-4.4.6-fm64'
rm -f amifldrv_mod.o
mv amifldrv_mod.ko amifldrv_mod.o

$ ./afulnx_64
+---------------------------------------------------------------------------+
|                 AMI Firmware Update Utility  v5.05.04                     |
|      Copyright (C)2013 American Megatrends Inc. All Rights Reserved.      |
+---------------------------------------------------------------------------+
| Usage: afulnx_64 <ROM File Name> [Option 1] [Option 2]...                 |
|           or                                                              |
|        afulnx_64 <Input or Output File Name> <Command>                    |
|           or                                                              |
|        afulnx_64 <Command>                                                |
| ------------------------------------------------------------------------- |
| Commands:                                                                 |
|         /O - Save current ROM image to file                               |
|         /U - Display ROM File's ROMID                                     |
|         /S - Refer to Options: /S                                         |
|         /D - Verification test of given ROM File without flashing BIOS.   |
|         /A - Refer to Options: /A                                         |
|       /OAD - Refer to Options: /OAD                                       |
| /CLNEVNLOG - Refer to Options: /CLNEVNLOG                                 |
| Options:                                                                  |
|         /Q - Silent execution                                             |
|         /X - Don't Check ROM ID                                           |
|       /CAF - Compare ROM file's data with Systems is different or         |
|              not, if not then cancel related update.                      |
|         /S - Display current system's ROMID                               |
|  /HOLEOUT: - Save specific ROM Hole according to RomHole GUID.            |
|              NewRomHole1.BIN /HOLEOUT:GUID                                |
|        /SP - Preserve Setup setting.                                      |
|         /R - Preserve ALL SMBIOS structure during programming             |
|        /Rn - Preserve SMBIOS type N during programming(n=0-255)           |
|         /B - Program Boot Block                                           |
|         /P - Program Main BIOS                                            |
|         /N - Program NVRAM                                                |
|         /K - Program all non-critical blocks.                             |
|        /Kn - Program n'th non-critical block(n=0-15).                     |
|     /HOLE: - Update specific ROM Hole according to RomHole GUID.          |
|              NewRomHole1.BIN /HOLE:GUID                                   |
|         /L - Program all ROM Holes.                                       |
|        /Ln - Program n'th ROM Hole only(n=0-15).                          |
|      /ECUF - Update EC BIOS when newer version is detected.               |
|         /E - Program Embedded Controller Block                            |
|        /ME - Program ME Entire Firmware Block.                            |
|       /FDR - Flash Flash-Descriptor Region.                               |
|       /PDR - Flash PDR Region.                                            |
|       /MER - Flash Entire ME Region.                                      |
|       /OPR - Flash Operation Region of SPS.                               |
|      /MEUF - Program ME Ignition Firmware Block.                          |
|         /A - Oem Activation file                                          |
|       /OAD - Delete Oem Activation key                                    |
| /CLNEVNLOG - Clear Event Log.                                             |
|   /CAPSULE - Override Secure Flash policy to Capsule                      |
|  /RECOVERY - Override Secure Flash policy to Recovery                     |
|        /EC - Program Embedded Controller Block. (Flash Type)              |
|    /REBOOT - Reboot after programming.                                    |
|  /SHUTDOWN - Shutdown after programming.                                  |
+---------------------------------------------------------------------------+

Looking back at the .bat file shipped with the BIOS, I ran with /D to verify the new ROM and with /O to take a copy of the existing ROM. The Linux tool doesn't list a /FDT option, so I decided that /FDR looked similar enough, substituted it in, crossed my fingers, and flashed the BIOS.
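The flag comparison can be written down explicitly. This sketch only builds a list of options; it doesn't run anything. The DOS flags come from ami.bat, the supported set is a subset copied from the usage output above, and the /FDT to /FDR mapping is my guess, not something AMI or Supermicro document.

```python
# Flags passed to AFUDOSU.EXE in ami.bat.
DOS_FLAGS = ["/P", "/B", "/N", "/K", "/R", "/FDT", "/MER", "/OPR"]

# Subset of the options listed in the afulnx_64 usage output.
LINUX_OPTIONS = {"/P", "/B", "/N", "/K", "/R", "/L", "/E", "/ME",
                 "/FDR", "/PDR", "/MER", "/OPR"}

# A guess based on the names looking similar; not a documented equivalence.
SUBSTITUTIONS = {"/FDT": "/FDR"}


def translate_flags(dos_flags, supported, substitutions):
    """Return flags the Linux tool lists, applying substitutions where needed."""
    translated = []
    for flag in dos_flags:
        candidate = flag if flag in supported else substitutions.get(flag)
        if candidate is None or candidate not in supported:
            raise ValueError("no known equivalent for " + flag)
        translated.append(candidate)
    return translated


unsupported = [f for f in DOS_FLAGS if f not in LINUX_OPTIONS]
linux_flags = translate_flags(DOS_FLAGS, LINUX_OPTIONS, SUBSTITUTIONS)
```

Only /FDT has no direct match, which is why that one substitution was the nervous part.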

(Sadly, I didn't save a copy of the output from that.)

End result:

$ dmidecode | head
# dmidecode 2.11
SMBIOS 2.7 present.
138 structures occupying 6792 bytes.
Table at 0x000E9C60.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
  Vendor: American Megatrends Inc.
  Version: 3.2
  Release Date: 01/16/2015
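Checking this across a handful of machines is easy to automate. A minimal sketch that parses dmidecode output like the above and compares it to the 3.0 minimum needed for the v2 CPUs; it assumes the version is a plain dotted number, as it is on these boards, and the function name is invented:

```python
MINIMUM_BIOS = (3, 0)  # needed for E5-26xx v2 CPUs, per the text above


def bios_version(dmidecode_output):
    """Return the BIOS version as a tuple of ints, or None if not found."""
    in_bios_section = False
    for line in dmidecode_output.splitlines():
        stripped = line.strip()
        if stripped == "BIOS Information":
            in_bios_section = True
        elif in_bios_section and stripped.startswith("Version:"):
            value = stripped.split(":", 1)[1].strip()
            return tuple(int(part) for part in value.split("."))
    return None


sample = """\
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
  Vendor: American Megatrends Inc.
  Version: 3.2
  Release Date: 01/16/2015
"""
ready_for_v2_cpus = bios_version(sample) >= MINIMUM_BIOS
```

Comparing tuples rather than strings avoids surprises if a version like 3.10 ever shows up.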

Then I asked the remote hands staff at the datacentre to replace the CPUs, and now we're getting full use of the IO capacity of the excellent Intel DC-3700 SSDs in these machines.

A worthwhile investment

There are so many ways this process could have been easier, but even so, it's been well worth the effort. Things are now running faster for the users that live on those 6 machines, and we've extended the life of perfectly good hardware by a few years. It's a worthwhile investment.