The Intel Legacy Horror Show

Most people would agree that the x86 design is full of legacy junk. But to truly understand this, I think you have to dive in and see for yourself. I’d like to talk about my little journey of discovery, in which I learnt the horrors of i8086 legacy.

Roughly three weeks ago, I decided it would be a nice experiment to pick GRUB 2 and make an i386 firmware out of it. GRUB can already run as a standalone bootloader and be part of your firmware when you combine it with coreboot (which initializes the motherboard), but I wanted to have an easy way to test this standalone mode in QEMU. The result (which, btw, is packaged in Debian as grub-firmware-qemu) behaves in exactly the same way a coreboot/GRUB would (except, of course, that it will only work in QEMU).

Initially I thought this would be a piece of cake. In QEMU there’s no motherboard to initialize, so basically the steps would be:

– Process the VGA rom with a far call.
– Switch to protected (i386) mode.
– Done! Jump to grub_main() and start as usual.

Hah! So far from reality. First of all, we start with code segment 0xf000, offset 0xfff0, which in real-mode addressing corresponds to address 0xffff0. Our ROM is memory-mapped in the 0xf0000-0x100000 range, so we’re exactly 16 bytes before the end of our code. With no room for anything else, all we can do is jump.
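
To make the arithmetic concrete, here is a minimal Python sketch (not part of GRUB; the function name is made up for illustration) of how real mode turns segment:offset into an address, and how little room that leaves before the end of the window:

```python
def real_mode_address(segment, offset):
    """Address an 8086 forms from segment:offset (segment * 16 + offset)."""
    return ((segment << 4) + offset) & 0xFFFFF  # 20 address lines on the 8086

reset_vector = real_mode_address(0xF000, 0xFFF0)
window_end = 0x100000  # first byte past the 0xf0000-0x100000 range
bytes_left = window_end - reset_vector

print(hex(reset_vector), bytes_left)  # 0xffff0 16
```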

Not so bad, right? Let’s jump to the beginning of our whole ROM image, and put the initialization code there?

No way. The 0xf0000-0x100000 range in which we’re mapped is just 64 KiB in size, and our image might be bigger (we generate it dynamically with grub-mkimage, and can even include an embedded filesystem). Only the high 64 KiB are mapped there. The rest of our code is near the top of the 32-bit address space, which we can’t access yet because we’re still in real (i8086) mode (and 640 KiB are enough for everybody, remember?).
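
A rough way to picture the truncation, sketched in Python under the assumption of a hypothetical 256 KiB image (the real size depends on what grub-mkimage is asked to embed, and the helper names are invented for this example):

```python
WINDOW_BASE = 0xF0000
WINDOW_SIZE = 0x10000  # the 64 KiB visible in real mode

def visible_image_range(rom_size):
    """Image offsets that show up in the low window: the last 64 KiB only.

    Assumes rom_size is at least WINDOW_SIZE.
    """
    return rom_size - WINDOW_SIZE, rom_size

def window_addr_to_image_offset(addr, rom_size):
    """Translate an address inside the low window to an offset in the image."""
    if not WINDOW_BASE <= addr < WINDOW_BASE + WINDOW_SIZE:
        raise ValueError("address is outside the 0xf0000-0x100000 window")
    return rom_size - WINDOW_SIZE + (addr - WINDOW_BASE)

rom_size = 0x40000  # hypothetical 256 KiB image
first, end = visible_image_range(rom_size)
hidden = first      # leading bytes of the image that real mode cannot reach
reset_offset = window_addr_to_image_offset(0xFFFF0, rom_size)

print(hex(first), hex(end), hex(hidden), hex(reset_offset))
```

With these numbers the first 0x30000 bytes of the image are simply not there as far as real-mode code is concerned, and the reset vector sits 16 bytes before the end of the image.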

I opted for creating a small image with the entry code, boot.img, using a hardcoded size (512 bytes). This image is later picked up by grub-mkimage and placed at the end of our ROM. So we do a relative jump to the beginning of this image:

. = GRUB_BOOT_MACHINE_SIZE - 16
jmp _start
. = GRUB_BOOT_MACHINE_SIZE

and proceed with (finally!) processing the VGA rom:

/* Process VGA rom. */
call $0xc000, $0x3

and switching to 32-bit i386 mode:

/* Transition to protected mode. We use pushl to force generation
of a flat return address. */
pushl $1f
DATA32 jmp real_to_prot
.code32
1:
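
Going back to that first jump at the reset vector, here is a small Python sketch of the displacement involved. It assumes the assembler picks the 3-byte near form (opcode 0xE9 with a 16-bit displacement), since a short jump cannot reach back almost 500 bytes. The constant names are made up; only the 512-byte size and the 16-byte offset come from the text above.

```python
BOOT_IMG_SIZE = 512                # hardcoded boot.img size
RESET_OFFSET = BOOT_IMG_SIZE - 16  # where the jmp is placed inside boot.img
START_OFFSET = 0                   # _start is the beginning of boot.img
JMP_LEN = 3                        # assumed encoding: 0xE9 + 16-bit displacement

def near_jump_displacement(src, dst, length=JMP_LEN):
    """The displacement is counted from the end of the jump instruction."""
    return dst - (src + length)

disp = near_jump_displacement(RESET_OFFSET, START_OFFSET)
encoded = bytes([0xE9]) + (disp & 0xFFFF).to_bytes(2, "little")

# At reset the instruction pointer is 0xfff0; the result wraps within 16 bits.
ip_after = (0xFFF0 + JMP_LEN + disp) & 0xFFFF

print(disp, encoded.hex(), hex(ip_after))  # -499 e90dfe 0xfe00
```

The resulting offset, 0xfe00, is exactly 512 bytes below the top of the 64 kiB segment, which is where boot.img begins.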

But before we leave boot.img, we need to figure out where the rest of our code is. It’s not relative to our current location because, ugh, the beginning of our ROM was truncated.

We know it’s mapped at the top of memory, and for the sake of simplicity (which was greatly missed in this experience), its 32-bit entry point is at its very beginning. So we only need to subtract the ROM size from the 4 GiB boundary. But all this was already known to grub-mkimage when it generated our ROM, and it was kind enough to embed this address in a variable:

movl grub_core_entry_addr, %edx
jmp *%edx
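
For illustration, the subtraction looks like this in Python. This is not grub-mkimage's actual code, only the arithmetic it implies, and the ROM size is a hypothetical value:

```python
FOUR_GIB = 1 << 32

def core_entry_addr(rom_size):
    """Address of the first byte of a ROM that ends exactly at 4 GiB."""
    return FOUR_GIB - rom_size

rom_size = 0x40000                 # hypothetical 256 KiB image
entry = core_entry_addr(rom_size)
boot_img = core_entry_addr(512)    # boot.img occupies the last 512 bytes

print(hex(entry), hex(boot_img))   # 0xfffc0000 0xfffffe00
```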

Problem is, our toolchain puts the BSS right after our code, which ends really close to the 4 GiB limit. It might not even fit in memory! There’s a chance that it does, depending on the size of our module selection (GRUB modules are placed right after the main body of code), but there’s no guarantee! Isn’t the top of memory a practical location?

So let’s relocate elsewhere. Recipe for relocation: current location, destination address, size. Our destination address is somewhat arbitrary; we just pick whatever we used at link time. grub-mkimage knew our size when it generated this ROM, so we arranged to have it embedded in a variable, like we did for boot.img:

VARIABLE(grub_kernel_image_size)
.long 0

Whoops, too bad, we can’t even read it, because… i386 data accesses use absolute addresses, and we don’t know the variable’s absolute location yet, so we need to make this position-independent in some way. Fortunately, we know that the ROM size is a multiple of 64 KiB, so we obtain %eip and round it down:

/* Relocate to low memory. First we figure out our location.
We will derive the rom start address from it. */
call 1f
1: popl %esi

/* Rom size is a multiple of 64 KiB. With this we get the
value of `grub_core_entry_addr' in %esi. */
xorw %si, %si
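
The trick can be modelled in a few lines of Python. This is only a sketch: the addresses are invented, and it assumes (as the snippet above implies) that the call/pop pair executes within the first 64 KiB of the ROM, so that clearing the low 16 bits lands on the ROM start. Adding a symbol's link-time distance from _start then gives its address before relocation.

```python
def rom_start_from_eip(eip):
    """Clear the low 16 bits, like `xorw %si, %si` does to %esi."""
    return eip & 0xFFFF0000

def address_before_relocation(rom_start, symbol_link_addr, start_link_addr):
    """Where a symbol currently lives: ROM start plus its offset from _start."""
    return rom_start + (symbol_link_addr - start_link_addr)

# Invented numbers, for illustration only.
eip = 0xFFFC0123        # value popped into %esi after `call 1f`
link_start = 0x8200     # hypothetical link-time address of _start
link_var = 0x8210       # hypothetical link-time address of a variable

rom_start = rom_start_from_eip(eip)
var_addr = address_before_relocation(rom_start, link_var, link_start)

print(hex(rom_start), hex(var_addr))  # 0xfffc0000 0xfffc0010
```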

At last! We can read grub_kernel_image_size:

/* ... which allows us to access `grub_kernel_image_size'
before relocation. */
movl (grub_kernel_image_size - _start)(%esi), %ecx

and then proceed to relocate,

movl $_start, %edi
cld
rep
movsb
ljmp $GRUB_MEMORY_MACHINE_PROT_MODE_CSEG, $1f
1:

zero the BSS, and jump to grub_main():

/*
* Call the start of main body of C code.
*/
call EXT_C(grub_main)

the rest is business as usual.
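
For readers who don’t speak x86 string instructions, the copy in the relocation step can be modelled in Python. This is a toy sketch on a small bytearray, not GRUB code: %esi is the source (the ROM start), %edi the destination (the link-time address of _start), %ecx the byte count, and cld makes the copy run upwards. The ljmp that follows targets the link-time address of its label, which is presumably why execution continues in the relocated copy.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Copy ecx bytes from memory[esi] to memory[edi], ascending (DF clear)."""
    while ecx:
        memory[edi] = memory[esi]
        esi += 1
        edi += 1
        ecx -= 1
    return esi, edi, ecx

memory = bytearray(64)
memory[48:56] = b"GRUBCORE"   # stand-in for the image sitting in ROM
esi, edi, ecx = rep_movsb(memory, esi=48, edi=8, ecx=8)

print(bytes(memory[8:16]), esi, edi, ecx)  # b'GRUBCORE' 56 16 0
```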

So, was it so hard to just map the damn thing at a fixed address, say, 0xf0000, without truncating it or using weird memory locations, and use this same address as entry point?

I think I learnt my lesson: never underestimate what 30 years of legacy constraints can do to your sanity. Well, for what it’s worth, it was a nice learning experience, with a byproduct you might find useful and/or interesting yourself.

4 Responses to “The Intel Legacy Horror Show”

  1. Greg-Matienzo Says:

    Great idea, but will this work over the long run?

    • robertmh Says:

      My code is merged in official GRUB, and maintained. I guess it’ll continue to work for many years to come. Is that what you meant?

  2. robertmh Says:

    These %cs:%rip values correspond to the firmware entry point, the very first code location that is executed. I don’t think my code had the chance to screw up yet :-)

    I suggest you look at what “reason 65535” means. You might also want to try if bochsbios works in this setup.

  3. Jim Says:

    grub-firmware-qemu sounds cool, but it unfortunately doesn’t work with kvm (qemu-kvm-0.10.5 + 2.6.30-1-amd64 modules)

    kvm_run: failed entry, reason 65535
    rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000623
    rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
    r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000
    r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
    rip 000000000000fff0 rflags 00000002
    cs f000 (ffff0000/0000ffff p 1 dpl 0 db 0 s 1 type a l 0 g 0 avl 0)
    ds 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 1 type 3 l 0 g 0 avl 0)
    es 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 1 type 3 l 0 g 0 avl 0)
    ss 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 1 type 2 l 0 g 0 avl 0)
    fs 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 1 type 3 l 0 g 0 avl 0)
    gs 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 1 type 3 l 0 g 0 avl 0)
    tr 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
    ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
    gdt 0/ffff
    idt 0/ffff
    cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
    kvm_run returned -8
