Finally, amd32 is taking shape

When AMD launched its now widely used 64-bit architecture in the year 2000, it marketed it as a significant step forward because of its longer word and pointer size. One just has to look at the advertising material and notice the big “64” being touted as the main improvement.

But it is commonly accepted that most applications don’t need a 64-bit address space for anything. Building them with the LP64 data model just wastes memory because of the larger pointers. Even though this slowed most applications down, the new architecture was still an improvement in speed thanks to AMD’s revised ISA, which featured changes such as:

  • The program counter register (%rip) can be accessed directly. This means that PIC (position-independent code) can be implemented sanely, without strange and inefficient gimmicks (on i386, one had to perform a dummy “call” and retrieve the top of the stack immediately afterwards).
  • Eight new general-purpose registers are added to the original eight, most of which were either used as implicit operands by some instructions or reserved by the ABI, and as such were not really “general purpose”.

To summarize, most of the merit of the AMD64 architecture lay in fixing some of the insanity of the Intel 386 ISA (instruction set architecture), rather than in the increased pointer size, which was a source of inefficiency most of the time.
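
To put a rough number on that inefficiency, the sketch below models C struct layout under the ILP32 and LP64 data models. It assumes natural alignment (each field aligned to its own size), which matches the i386 and x86-64 System V ABIs for the types listed; the type table and the example node are illustrative, not taken from any particular program.

```python
# Illustrative sizes in bytes of a few C types under two data models.
# ILP32: int, long and pointers are 32-bit (i386, or AMD64 ISA with 32-bit pointers).
# LP64: long and pointers are 64-bit (the usual x86-64 model).
DATA_MODELS = {
    "ILP32": {"char": 1, "short": 2, "int": 4, "long": 4, "pointer": 4},
    "LP64": {"char": 1, "short": 2, "int": 4, "long": 8, "pointer": 8},
}


def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def struct_size(fields, model):
    """Size of a C struct with the given field types, laid out in declaration order.

    Assumes natural alignment: each field is aligned to its own size.
    """
    sizes = DATA_MODELS[model]
    offset = 0
    max_align = 1
    for field in fields:
        size = sizes[field]
        offset = round_up(offset, size)  # padding before the field
        offset += size
        max_align = max(max_align, size)
    # Trailing padding keeps every element of an array of these structs aligned.
    return round_up(offset, max_align)


# Hypothetical binary-tree node: left and right child pointers plus an int key.
tree_node = ["pointer", "pointer", "int"]
for model in ("ILP32", "LP64"):
    print(model, struct_size(tree_node, model))
# ILP32 12
# LP64 24
```

A pointer-heavy node like this one doubles in size, while a struct made only of ints does not change at all; real programs fall somewhere in between.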

Perhaps AMD didn’t evaluate this correctly, or perhaps its marketing side won out over technical merit. However, although the new hardware is biased towards the LP64 data model, that model is not actually enforced. It was only a matter of time until an independent project took this on and attempted to fix it by combining the AMD64 ISA with the ILP32 data model.

I’ve been reading with much interest on the binutils and GCC mailing lists that such a project is beginning to take shape. Ports of binutils, GCC, GDB and Linux are already available. Future plans include porting glibc, which will make it possible to build a standard GNU system out of this.

Sadly, I don’t have the time to devote to this project myself, but I’ll continue following its progress. I’m looking forward to bringing this speed boost to my machines.

23 Responses to “Finally, amd32 is taking shape”

  2. Leandro Guimarães Faria Corcete DUTRA Says:

    I am simply appalled by the ignorance shown here. For most applications there is no issue of targeting 32 or 64 bits; it is, generally, simply a choice made at compile time. No amount of ‘pointer compression’ will fit a 64-bit pointer in the same space, with the same efficiency, as native 32-bit code.

    Code efficiency matters more than development time, because software is developed once and then runs on millions of systems, often in situations much more stringent than those faced by developers.

    For small applications, AMD64 code will never be as efficient as AMD32, because AMD32 uses all the goodies of AMD64 but with leaner memory addresses. No IA32 here.

  3. Rob Says:

    I think the point about moving to 64-bit is that it’s just going to be a whole lot easier to code to a single target, rather than coding some apps for 32-bit (and then adding plumbing code to enable interoperability with any 64-bit apps they need to interact with), others for 64-bit (and then possibly adding plumbing code to interoperate with the remaining 32-bit apps), and still others for both (meaning two rounds of testing).
    The solutions are:
    32-bit only environment – but then no app can use more than 4GB without memory extensions etc. This will only suit certain mobile and embedded platforms.
    OR
    64-bit only environment. Obviously phasing out 32-bit apps will take a long time, but within 5 years it will be possible to assume all apps are 64-bit and ignore anything 32-bit related. This will make things a lot easier for developers, since only one round of testing will be required. Yes, it seems silly to have a 64-bit version of /bin/ls, but developer efficiency tends to trump performance every time, and hybrid platforms such as x64/x86 will tend to converge on one or the other (think of how Windows 9x quickly took over from DOS, even though writing apps for DOS would give faster performance in some cases and compatibility with a larger audience). Why would you want to compile some parts of an OS for 64-bit (e.g. the kernel) and other parts for 32-bit (/bin/ls) when you could simply compile the whole lot for 64-bit and avoid potential complications? Unless you’re building a supercomputer, any performance loss from doing this is outweighed by the convenience of a single target platform and not having to choose the platform for each and every app.

    However, another point regarding performance:
    Yes, 64-bit mode may slow things down, but CPU vendors are fully committed to optimising X64 and are unlikely to backport all improvements to the 32-bit mode of their processors. Over time, X64 will therefore get faster while 32-bit mode on X64 will stagnate.

  4. Damjan Jovanovic Says:

    This seems bad – hacking around the issue by mixing data and instruction models.

    The Java virtual machine is planning to do pointer compression on AMD64 internally:
    http://wikis.sun.com/display/HotSpotInternals/CompressedOops
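
The compressed-pointer approach linked above can be sketched in a few lines. This is a simplified illustration of the general idea, not HotSpot’s code: it assumes 8-byte object alignment and a single contiguous heap at a known base address, ignores null references, and all names and addresses are hypothetical. It also shows why compression is not free: every dereference pays for a shift and an add.

```python
# Sketch of compressed 32-bit references on a 64-bit machine.
# Assumptions (not HotSpot's actual code): all objects live in one heap that
# starts at heap_base, every object is 8-byte aligned, and null is not modeled.
ALIGNMENT_SHIFT = 3  # log2(8): the low 3 bits of an aligned address are always 0
NARROW_LIMIT = 1 << 32  # number of distinct values a 32-bit slot can hold

# 2**32 slots of 8 bytes each: the largest heap these references can cover.
MAX_HEAP_BYTES = NARROW_LIMIT << ALIGNMENT_SHIFT  # 32 GiB


def compress(address, heap_base):
    """Encode a 64-bit address as a 32-bit offset from heap_base, in 8-byte units."""
    offset = address - heap_base
    if offset < 0 or offset & ((1 << ALIGNMENT_SHIFT) - 1):
        raise ValueError("address is below the heap base or not 8-byte aligned")
    narrow = offset >> ALIGNMENT_SHIFT
    if narrow >= NARROW_LIMIT:
        raise ValueError("address lies beyond what 32 bits can encode")
    return narrow


def decompress(narrow, heap_base):
    """Recover the full address: one shift and one add on every dereference."""
    return heap_base + (narrow << ALIGNMENT_SHIFT)


heap_base = 0x7F0000000000  # hypothetical heap start address
obj = heap_base + 0x1234568  # hypothetical 8-byte-aligned object address
narrow = compress(obj, heap_base)
assert decompress(narrow, heap_base) == obj
print(hex(narrow), MAX_HEAP_BYTES // 2**30, "GiB")  # prints: 0x2468ad 32 GiB
```

The alignment shift is what lets a 32-bit field reach 32 GiB instead of 4 GiB; the price is the extra decode work that native 32-bit pointers never pay.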

  5. Ciprian Mustiata Says:

    If I remember correctly, the calling convention uses general-purpose registers, so it is faster (by a small margin, but it helps a lot when the compiler cannot inline all methods). Also, the basic types (int, float, double and so on) use better constructs but keep the same size.
    So, apart from the pain of a recompile and a bit more QA (which some applications have been going through since Win95 times), you get for free, as you pointed out, a saner mapping from your C/C++/whatever code to your CPU.
    In the end you argue that it uses a bit more memory; so what? By how much? Most programmers lost the skill of packing structures into unions ten years ago or so, so memory will be hard to reclaim from the programs we mostly use. Want to change something? Contribute patches with better data structures to the OSS programs you use. That way everyone wins…
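
As an illustration of the structure-packing point, here is a rough sketch of how much padding the wider pointers introduce and how reordering fields by decreasing alignment recovers some of it. It assumes x86-64 natural alignment, and the field list is hypothetical. C compilers must keep members in declaration order, so this reordering has to be done by hand in the source.

```python
# x86-64 (LP64) sizes in bytes; natural alignment (align == size) is assumed.
LP64_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8, "pointer": 8}


def layout_size(fields):
    """Size of a struct whose fields are laid out in the given order."""
    offset, max_align = 0, 1
    for field in fields:
        size = LP64_SIZES[field]
        offset = -(-offset // size) * size  # round up to the field's alignment
        offset += size
        max_align = max(max_align, size)
    return -(-offset // max_align) * max_align  # trailing padding


def reorder_for_padding(fields):
    """Order fields by decreasing alignment.

    With power-of-two alignments, every offset is then already a multiple of
    the next field's alignment, so no padding is needed between fields.
    """
    return sorted(fields, key=lambda field: LP64_SIZES[field], reverse=True)


# Hypothetical struct: a flag, a pointer, another flag, a pointer, a counter.
fields = ["char", "pointer", "char", "pointer", "int"]
print(layout_size(fields))                       # 40 bytes as declared
print(layout_size(reorder_for_padding(fields)))  # 24 bytes after reordering
```

Reordering only removes padding; the pointers themselves stay 8 bytes each, which is the part only a 32-bit pointer model can shrink.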

  6. Adam Williamson Says:

    console-kit-daemon has a virt size of 2GB on my Fedora 14 boot. I’m wondering if that’s a bug. =)

    Below that I hit Evo at 1.9GB, then Mutter at 1.4GB…

  7. Leandro Guimarães Faria Corcete DUTRA Says:

    I see people are misunderstanding this. To clarify, most of a system does not need more than 4 GiB per process, even if any number of specific applications do. For example, PostgreSQL may need it, but even so it relies on many other processes that will consume less memory, and thus leave more memory available to PostgreSQL, if they run as 32-bit applications. Thus, even if a system has more than 4 GiB of RAM and runs some applications as 64-bit processes, running the rest as 32-bit will probably give a reasonable performance gain. Incidentally, this is one of the reasons why Un*x on RISC can still compete with GNU/Linux.

    • robertmh Says:

      Which Unix and which RISC?

      • Leandro Guimarães Faria Corcete DUTRA Says:

        Solaris on SPARC, HP-UX on PA-RISC (or Itanic) and AIX on POWER, not to mention IRIX on MIPS (which I believe is dead), all run 32-bit applications on 64-bit operating systems. The only situation where I saw real performance gains from running 64-bit applications was HP-UX on PA-RISC, when gzip needed to process several files of several dozen gigabytes during an ever-shrinking maintenance window. Otherwise we would not have cared; only Oracle (and HP-UX itself) needed to be 64-bit.

  8. Rogério Brito Says:

    Professor Knuth has for some time expressed his concerns about this with regard to cache usage, in his “A Flame About 64-bit Pointers”, from: .

    If I am not mistaken (please do correct me if I am wrong), another goodie that came with AMD64 is that every processor can be assumed to support at least some of the higher levels of SSE (perhaps SSE2, if memory serves me), and that it is potentially better to use those rather than the i387 (single precision) floating point instructions.

    This is a very good thing for those who have 64-bit capable processors but can’t or don’t plan on having more than 4GB of memory. I will indeed watch this closely (and I may even try to help with getting this into Debian).

    • Ben Hutchings Says:

      “…it is potentially better to use [SSE2] rather than the i387 (single precision) floating point instructions.”

      In fact, SSE2 is part of the baseline every x86-64 processor supports, and the x86-64 ABI already passes float and double values in SSE registers, so the decision is very easy.

  9. chrysn Says:

    This seems like a whole lot of effort to boost a system that will one day find out it needs 64-bit pointers. Would it be possible to build binaries that decide which mode to use at startup (without building “universal binaries” that include the program code twice)?
    (I assume switching to 64-bit mode while an application is running is quite a no-go; sizeof(something) changing at runtime would at least uncover tons of bugs.)

    • robertmh Says:

      I don’t think we’ll see things like /bin/ls requiring 4 GiB of memory in our lifetime. I have my doubts even about most desktop applications, given the trend towards splitting work into separate processes (dbus, hal and such).

      As for universal binaries, probably possible but I don’t see the point. We’re in the free world! We can rebuild things painlessly.

  10. Leandro Guimarães Faria Corcete DUTRA Says:

    Will we hopefully see a Debian userland based on this? Or can we get roughly the same effect with IA32 libraries?

    • robertmh Says:

      It’s not the same effect. My point is that the improvement in AMD64 is in the new ISA (instruction set architecture) designed by AMD, not in its pointer size (64-bit). So the pointer size can be set back to 32-bit without losing the extra registers or other powerful features.

  11. Np237 Says:

    All of that is nice and all, but the applications that require more horsepower are also the ones that will require more than 4 GiB per process.

    Aren’t you afraid this will be a niche solution?

    • robertmh Says:

      Hi Joss,

      I expect all programs will benefit from a performance boost. Of course, if they require more than 4 GiB of virtual memory this will not apply, but this seems like an exception. The vast majority of programs are very far from reaching this limit.

      • Np237 Says:

        Today, several processes can exceed 1 GiB of virtual memory by mmapping large memory areas (that’s not real memory, of course): for example Xorg, Evolution, browsers, OpenOffice… You can expect that in a few years (the time needed for a new architecture to emerge) some of them will require 4 GiB. What can you do then?

        But really, most applications are not CPU-bound today; they are IO-bound. The applications that really are CPU-bound are high-performance computing applications and maybe some graphics applications. Both categories already require more than 4 GiB per process, because they manipulate very large data sets.

        So in the end, which users and which applications are you targeting? It looks to me like 32-bit address registers will soon be a thing of the past, just like 32-bit IP addresses.

      • Rogério Brito Says:

        Well, the prospect of (almost) halving the amount of data that needs to be transferred between many layers of memory (effectively almost doubling the bandwidth, then), and the better utilization of the caches is attractive (and also of the main memory)…

      • Np237 Says:

        I think you are mistaken here. The amd32 architecture does not double the available bandwidth; it only does so when you transfer addresses. And while addresses and pointers are a noticeable portion of the used bandwidth, they do not represent the majority of the transferred data.
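
This point can be made concrete with a little arithmetic. If a fraction p of the bytes moved under LP64 are pointers, halving only those bytes leaves 1 - p/2 of the original traffic, so the effective bandwidth gain is 1 / (1 - p/2). The sketch below ignores padding changes, and the pointer shares it tries are hypothetical, not measurements.

```python
def ilp32_traffic_ratio(pointer_fraction):
    """Bytes moved with 32-bit pointers, relative to LP64.

    pointer_fraction is the share of LP64 traffic that consists of pointers;
    only that share is halved, everything else stays the same size.
    Padding changes are ignored.
    """
    if not 0.0 <= pointer_fraction <= 1.0:
        raise ValueError("pointer_fraction must be between 0 and 1")
    return 1.0 - pointer_fraction / 2.0


def effective_bandwidth_gain(pointer_fraction):
    """How much more useful data fits through the same memory bus."""
    return 1.0 / ilp32_traffic_ratio(pointer_fraction)


for share in (0.0, 0.2, 0.5, 1.0):  # hypothetical pointer shares
    print(f"{share:.0%} pointers -> {effective_bandwidth_gain(share):.2f}x")
```

With a 20% pointer share, traffic drops to 90% of the LP64 figure, an effective gain of about 11%; the full doubling only happens if every byte moved is a pointer. Padding that disappears along with the wider pointers can push the real figure somewhat higher.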

      • robertmh Says:

        Saying that in just a few years desktop apps will begin to require 64-bit mode, leaving all IA32 users out in the cold… well, I think it’s a bold statement.

        In any case, even if it turns out to be true, things are different in the server space. Consider a web server with thousands of Apache processes running simultaneously. The penalty can be enormous. I once worked at a company that preferred IA32 over AMD64 on a modern processor, just because Apache was eating too much memory.

        You could also consider the embedded space, but in truth there are so many workload models other than the “desktop with bloated apps” one. Think about light desktops, and about development workstations. Why would GCC need 4 GiB if all your C code is appropriately split into many small files? Maybe your linker does, but not the compiler.
