textfiles

Various textfiles on technical topics
git clone https://git.bracken.jp/textfiles.git
Log | Files | Refs

teensy_elf_executables_for_linux.md (39927B)


      1 A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
      2 ========================================================================
      3 
      4 (or, "Size _Is_ Everything")
      5 
      6 Original: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
      7 
      8 Author: Brian Raiter <breadbox@muppetlabs.com>
      9 
     10 > She studied it carefully for about 15 minutes. Finally, she spoke.
     11 > "There's something written on here," she said, frowning, "but it's
     12 > really teensy."
     13 >
     14 > -- [Dave Barry, "The Columnist's Caper"]
     15 
     16 If you're a programmer who's become fed up with software bloat, then may
     17 you find herein the perfect antidote.
     18 
     19 This document explores methods for squeezing excess bytes out of simple
     20 programs. (Of course, the more practical purpose of this document is to
     21 describe a few of the inner workings of the ELF file format and the
     22 Linux operating system. But hopefully you can also learn something about
     23 how to make really teensy ELF executables in the process.)
     24 
     25 Please note that the information and examples given here are, for the
     26 most part, specific to ELF executables on a Linux platform running under
     27 an Intel-386 architecture. I imagine that a good bit of the information
     28 is applicable to other ELF-based Unices, but my experiences with such
     29 are too limited for me to say with certainty.
     30 
     31 Please also note that if you aren't a little bit familiar with assembly
     32 code, you may find parts of this document sort of hard to follow. (The
     33 assembly code that appears in this document is written using Nasm; see
     34 http://www.nasm.us/.
     35 
     36 ------------------------------------------------------------------------
     37 
     38 In order to start, we need a program. Almost any program will do, but
     39 the simpler the program the better, since we're more interested in how
     40 small we can make the executable than what the program does.
     41 
     42 Let's take an incredibly simple program, one that does nothing but
     43 return a number back to the operating system. Why not? After all, Unix
     44 already comes with no less than two such programs: true and false. Since
     45 0 and 1 are already taken, we'll use the number 42.
     46 
     47 So, here is our first version:
     48 
     49     /* tiny.c */
     50     int main(void) { return 42; }
     51 
     52 which we can compile and test like so:
     53 
     54     $ gcc -Wall tiny.c
     55     $ ./a.out ; echo $?
     56     42
     57 
     58 So. How big is it? Well, on my machine, I get:
     59 
     60     $ wc -c a.out
     61        3998 a.out
     62 
     63 (Yours will probably differ some.) Admittedly, that's pretty small by
     64 today's standards, but it's almost certainly bigger than it needs to be.
     65 
     66 The obvious first step is to strip the executable:
     67 
     68     $ gcc -Wall -s tiny.c
     69     $ ./a.out ; echo $?
     70     42
     71     $ wc -c a.out
     72        2632 a.out
     73 
     74 That's certainly an improvement. For the next step, how about
     75 optimizing?
     76 
     77     $ gcc -Wall -s -O3 tiny.c
     78     $ wc -c a.out
     79        2616 a.out
     80 
     81 That also helped, but only just. Which makes sense: there's hardly
     82 anything there to optimize.
     83 
     84 It seems unlikely that there's much else we can do to shrink a
     85 one-statement C program. We're going to have to leave C behind, and use
     86 assembler instead.  Hopefully, this will cut out all the extra overhead
     87 that C programs automatically incur.
     88 
     89 So, on to our second version. All we need to do is return 42 from
     90 main(). In assembly language, this means that the function should set
     91 the accumulator, eax, to 42, and then return:
     92 
     93     ; tiny.asm
     94     BITS 32
     95     GLOBAL main
     96     SECTION .text
     97     main:
     98                   mov     eax, 42
     99                   ret
    100 
    101 We can then build and test like so:
    102 
    103     $ nasm -f elf tiny.asm
    104     $ gcc -Wall -s tiny.o
    105     $ ./a.out ; echo $?
    106     42
    107 
    108 (Hey, who says assembly code is difficult?) And now how big is it?
    109 
    110     $ wc -c a.out
    111        2604 a.out
    112 
    113 Looks like we shaved off a measly twelve bytes. So much for all the
    114 extra overhead that C automatically incurs, eh?
    115 
    116 Well, the problem is that we are still incurring a lot of overhead by
    117 using the main() interface. The linker is still adding an interface to
    118 the OS for us, and it is that interface that actually calls main(). So
    119 how do we get around that if we don't need it?
    120 
    121 The actual entry point that the linker uses by default is the symbol
    122 with the name `_start`. When we link with gcc, it automatically includes
    123 a `_start` routine, one that sets up argc and argv, among other things,
    124 and then calls `main()`.
    125 
    126 So, let's see if we can bypass this, and define our own `_start`
    127 routine:
    128 
    129     ; tiny.asm
    130     BITS 32
    131     GLOBAL _start
    132     SECTION .text
    133     _start:
    134                   mov     eax, 42
    135                   ret
    136 
    137 Will gcc do what we want?
    138 
    139     $ nasm -f elf tiny.asm
    140     $ gcc -Wall -s tiny.o
    141     tiny.o(.text+0x0): multiple definition of `_start'
    142     /usr/lib/crt1.o(.text+0x0): first defined here
    143     /usr/lib/crt1.o(.text+0x36): undefined reference to `main'
    144 
    145 No. Well, actually, yes it will, but first we need to learn how to ask
    146 for what we want.
    147 
    148 It so happens that gcc recognizes an option called `-nostartfiles`.
    149 From the gcc info pages:
    150 
    151     -nostartfiles
    152               Do not use the standard system startup files when
    153               linking. The standard libraries are used normally.
    154 
    155 Aha! Now let's see what we can do:
    156 
    157     $ nasm -f elf tiny.asm
    158     $ gcc -Wall -s -nostartfiles tiny.o
    159     $ ./a.out ; echo $?
    160     Segmentation fault
    161     139
    162 
    163 Well, gcc didn't complain, but the program doesn't work. What went
    164 wrong?
    165 
    166 What went wrong is that we treated `_start` as if it were a C function,
    167 and tried to return from it. In reality, it's not a function at all.
    168 It's just a symbol in the object file which the linker uses to locate
    169 the program's entry point. When our program is invoked, it's invoked
    170 directly. If we were to look, we would see that the value on the top of
    171 the stack was the number 1, which is certainly very un-address-like. In
    172 fact, what is on the stack is our program's argc value. After this comes
    173 the elements of the argv array, including the terminating NULL element,
    174 followed by the elements of envp. And that's all.  There is no return
    175 address on the stack.
    176 
    177 So, how does `_start` ever exit? Well, it calls the `exit()` function!
    178 That's what it's there for, after all.
    179 
    180 Actually, I lied. What it really does is call the `_exit()` function.
    181 (Notice the leading underscore.) exit() is required to finish up some
    182 tasks on behalf of the process, but those tasks will never have been
    183 started, because we're bypassing the library's startup code. So we need
    184 to bypass the library's shutdown code as well, and go directly to the
    185 operating system's shutdown processing.
    186 
    187 So, let's try this again. We're going to call `_exit()`, which is a
    188 function that takes a single integer argument. So all we need to do is
    189 push the number onto the stack and call the function. (We also need to
    190 declare `_exit()` as external.) Here's our assembly:
    191 
    192     ; tiny.asm
    193     BITS 32
    194     EXTERN _exit
    195     GLOBAL _start
    196     SECTION .text
    197     _start:
    198                   push    dword 42
    199                   call    _exit
    200 
    201 And we build and test as before:
    202 
    203     $ nasm -f elf tiny.asm
    204     $ gcc -Wall -s -nostartfiles tiny.o
    205     $ ./a.out ; echo $?
    206     42
    207 
    208 Success at last! And now how big is it?
    209 
    210     $ wc -c a.out
    211        1340 a.out
    212 
    213 Almost half the size! Not bad. Not bad at all. Hmmm ... so what other
    214 interesting obscure options does gcc have?
    215 
    216 Well, this one, appearing immediately after -nostartfiles in the
    217 documentation, is certainly eye-catching:
    218 
    219     -nostdlib
    220              Don't use the standard system libraries and startup
    221              files when linking. Only the files you specify will
    222              be passed to the linker.
    223 
    224 That's gotta be worth investigating:
    225 
    226     $ gcc -Wall -s -nostdlib tiny.o
    227     tiny.o(.text+0x6): undefined reference to `_exit'
    228 
    229 Oops. That's right ... `_exit()` is, after all, a library function.  It
    230 has to be filled in from somewhere.
    231 
    232 Okay. But surely, we don't need libc's help just to end a program, do
    233 we?
    234 
    235 No, we don't. If we're willing to leave behind all pretenses of
    236 portability, we can make our program exit without having to link with
    237 anything else. First, though, we need to know how to make a system call
    238 under Linux.
    239 
    240 ------------------------------------------------------------------------
    241 
    242 Linux, like most operating systems, provides basic necessities to the
    243 programs it hosts via system calls. This includes things like opening a
    244 file, reading and writing to file handles -- and, of course, shutting
    245 down a process.
    246 
    247 The Linux system call interface is a single instruction: `int 0x80`.
    248 All system calls are done via this interrupt. To make a system call, eax
    249 should contain a number that indicates which system call is being
    250 invoked, and other registers are used to hold the arguments, if any.  If
    251 the system call takes one argument, it will be in ebx; a system call
    252 with two arguments will use ebx and ecx. Likewise, edx, esi, and edi are
    253 used if a third, fourth, or fifth argument is required, respectively.
    254 Upon return from a system call, eax will contain the return value. If an
    255 error occurs, eax will contain a negative value, with the absolute value
    256 indicating the error.
    257 
    258 The numbers for the different system calls are listed in
    259 `/usr/include/asm/unistd.h`. A quick peek will tell us that the exit
    260 system call is assigned the number 1. Like the C function, it takes one
    261 argument, the value to return to the parent process, and so this will go
    262 into ebx.
    263 
    264 We now know all we need to know to create the next version of our
    265 program, one that won't need assistance from any external functions to
    266 work:
    267 
    268     ; tiny.asm
    269     BITS 32
    270     GLOBAL _start
    271     SECTION .text
    272     _start:
    273                   mov     eax, 1
    274                   mov     ebx, 42  
    275                   int     0x80
    276 
    277 Here we go:
    278 
    279     $ nasm -f elf tiny.asm
    280     $ gcc -Wall -s -nostdlib tiny.o
    281     $ ./a.out ; echo $?
    282     42
    283 
    284 Ta-da! And the size?
    285 
    286     $ wc -c a.out
    287         372 a.out
    288 
    289 Now _that's_ tiny! Almost a fourth the size of the previous version!
    290 
    291 So ... can we do anything else to make it even smaller?
    292 
    293 How about using shorter instructions?
    294 
    295 If we generate a list file for the assembly code, we'll find the
    296 following:
    297 
    298     00000000 B801000000        mov        eax, 1
    299     00000005 BB2A000000        mov        ebx, 42
    300     0000000A CD80              int        0x80
    301 
    302 Well, gee, we don't need to initialize all of ebx, since the operating
    303 system is only going to use the lowest byte. Setting bl alone will be
    304 sufficient, and will take two bytes instead of five.
    305 
    306 We can also set eax to one by xor'ing it to zero and then using a
    307 one-byte increment instruction; this will save two more bytes.
    308 
    309     00000000 31C0              xor        eax, eax
    310     00000002 40                inc        eax
    311     00000003 B32A              mov        bl, 42
    312     00000005 CD80              int        0x80
    313 
    314 I think it's pretty safe to say that we're not going to make this
    315 program any smaller than that.
    316 
    317 As an aside, we might as well stop using gcc to link our executable,
    318 seeing as we're not using any of its added functionality, and just call
    319 the linker, ld, ourselves:
    320 
    321     $ nasm -f elf tiny.asm
    322     $ ld -s tiny.o
    323     $ ./a.out ; echo $?
    324     42
    325     $ wc -c a.out
    326         368 a.out
    327 
    328 Four bytes smaller. (Hey! Didn't we shave five bytes off? Well, we did,
    329 but alignment considerations within the ELF file caused it to require an
    330 extra byte of padding.)
    331 
    332 So ... have we reached the end? Is this as small as we can go?
    333 
    334 Well, hm. Our program is now seven bytes long. Do ELF files really
    335 require 361 bytes of overhead? What's in this file, anyway?
    336 
    337 We can peek into the contents of the file using objdump:
    338 
    339     $ objdump -x a.out | less
    340 
    341 The output may look like gibberish, but right now let's just focus on
    342 the list of sections:
    343 
    344     Sections:
    345     Idx Name          Size      VMA       LMA       File off  Algn
    346       0 .text         00000007  08048080  08048080  00000080  2**4
    347                       CONTENTS, ALLOC, LOAD, READONLY, CODE
    348       1 .comment      0000001c  00000000  00000000  00000087  2**0
    349                       CONTENTS, READONLY
    350 
    351 The complete .text section is listed as being seven bytes long, just as
    352 we specified. So it seems safe to conclude that we now have complete
    353 control of the machine-language content of our program.
    354 
    355 But then there's this other section named ".comment". Who ordered
    356 _that_? And it's 28 bytes long, even! We may not be sure what this
    357 .comment section is, but it seems a good bet that it isn't a necessary
    358 feature....
    359 
    360 The .comment section is listed as being located at file offset 00000087
    361 (hexadecimal). If we use a hexdump program to look at that area of the
    362 file, we will see:
    363 
    364     00000080: 31C0 40B3 2ACD 8000 5468 6520 4E65 7477  1.@.*...The Netw
    365     00000090: 6964 6520 4173 7365 6D62 6C65 7220 302E  ide Assembler 0.
    366     000000A0: 3938 0000 2E73 796D 7461 6200 2E73 7472  98...symtab..str
    367 
    368 Well, well, well. Who'd've thought that Nasm would undermine our quest
    369 like this? Maybe we should switch to using gas, AT&T syntax
    370 notwithstanding....
    371 
    372 Alas, if we do:
    373 
    374     ; tiny.s
    375     .globl _start
    376     .text
    377     _start:
    378                   xorl    %eax, %eax
    379                   incl    %eax
    380                   movb    $42, %bl
    381                   int     $0x80
    382 
    383 ... we will find:
    384 
    385     $ gcc -s -nostdlib tiny.s
    386     $ ./a.out ; echo $?
    387     42
    388     $ wc -c a.out
    389         368 a.out
    390 
    391 ... no difference!
    392 
    393 Well, actually there is some difference. Turning once again to objdump,
    394 we see:
    395 
    396     Sections:
    397     Idx Name          Size      VMA       LMA       File off  Algn
    398       0 .text         00000007  08048074  08048074  00000074  2**2
    399                       CONTENTS, ALLOC, LOAD, READONLY, CODE
    400       1 .data         00000000  0804907c  0804907c  0000007c  2**2
    401                       CONTENTS, ALLOC, LOAD, DATA
    402       2 .bss          00000000  0804907c  0804907c  0000007c  2**2
    403                       ALLOC
    404 
    405 No comment section, but now we have two useless sections for storing our
    406 nonexistent data. And even though these sections are zero bytes long,
    407 they incur overhead, bringing our file size up for no good reason.
    408 
    409 Okay, so just what is all this overhead, and how do we get rid of it?
    410 
    411 Well, to answer these questions, we must begin diving into some real
    412 wizardry.  We need to understand the ELF format.
    413 
    414 ------------------------------------------------------------------------
    415 
    416 The canonical document describing the ELF format for Intel-386
    417 architectures can be found at http://refspecs.linuxbase.org/elf/elf.pdf.
    418 (You can also find a flat-text version of version 1.0 of the standard at
    419 http://www.muppetlabs.com/~breadbox/software/ELF.txt.) This
    420 specification covers a lot of territory, so if you'd prefer to not read
    421 the whole thing yourself, I'll understand. Basically, here's what we
    422 need to know:
    423 
    424 Every ELF file begins with a structure called the ELF header. This
    425 structure is 52 bytes long, and contains several pieces of information
    426 that describe the contents of the file. For example, the first sixteen
    427 bytes contain an "identifier", which includes the file's magic-number
    428 signature (7F 45 4C 46), and some one-byte flags indicating that the
    429 contents are 32-bit or 64-bit, little-endian or big-endian, etc. Other
    430 fields in the ELF header contain information such as: the target
    431 architecture; whether the ELF file is an executable, an object file, or
    432 a shared-object library; the program's starting address; and the
    433 locations within the file of the program header table and the section
    434 header table.
    435 
    436 These two tables can appear anywhere in the file, but typically the
    437 former appears immediately following the ELF header, and the latter
    438 appears at or near the end of the file. The two tables serve similar
    439 purposes, in that they identify the component parts of the file.
    440 However, the section header table focuses more on identifying where the
    441 various parts of the program are within the file, while the program
    442 header table describes where and how these parts are to be loaded into
    443 memory. In brief, the section header table is for use by the compiler
    444 and linker, while the program header table is for use by the program
    445 loader. The program header table is optional for object files, and in
    446 practice is never present. Likewise, the section header table is
    447 optional for executables -- but is almost _always_ present!
    448 
    449 So, this is the answer to our first question. A fair piece of the
    450 overhead in our program is a completely unnecessary section header
    451 table, and maybe some equally useless sections that don't contribute to
    452 our program's memory image.
    453 
    454 So, we turn to our second question: how do we go about getting rid of
    455 all that?
    456 
    457 Alas, we're on our own here. None of the standard tools will deign to
    458 make an executable without a section header table of some kind. If we
    459 want such a thing, we'll have to do it ourselves.
    460 
    461 This doesn't quite mean that we have to pull out a binary editor and
    462 code the hexadecimal values by hand, though. Good old Nasm has a flat
    463 binary output format, which will serve us well. All we need now is the
    464 image of an empty ELF executable, which we can fill in with our program.
    465 Our program, and nothing else.
    466 
    467 We can look at the ELF specification, and /usr/include/linux/elf.h, and
    468 executables created by the standard tools, to figure out what our empty
    469 ELF executable should look like. But, if you're the impatient type, you
    470 can just use the one I've supplied here:
    471 
    472     BITS 32
    473     
    474                   org     0x08048000
    475     
    476     ehdr:                                               ; Elf32_Ehdr
    477                   db      0x7F, "ELF", 1, 1, 1, 0       ;   e_ident
    478           times 8 db      0
    479                   dw      2                             ;   e_type
    480                   dw      3                             ;   e_machine
    481                   dd      1                             ;   e_version
    482                   dd      _start                        ;   e_entry
    483                   dd      phdr - $$                     ;   e_phoff
    484                   dd      0                             ;   e_shoff
    485                   dd      0                             ;   e_flags
    486                   dw      ehdrsize                      ;   e_ehsize
    487                   dw      phdrsize                      ;   e_phentsize
    488                   dw      1                             ;   e_phnum
    489                   dw      0                             ;   e_shentsize
    490                   dw      0                             ;   e_shnum
    491                   dw      0                             ;   e_shstrndx
    492     
    493     ehdrsize      equ     $ - ehdr
    494     
    495     phdr:                                               ; Elf32_Phdr
    496                   dd      1                             ;   p_type
    497                   dd      0                             ;   p_offset
    498                   dd      $$                            ;   p_vaddr
    499                   dd      $$                            ;   p_paddr
    500                   dd      filesize                      ;   p_filesz
    501                   dd      filesize                      ;   p_memsz
    502                   dd      5                             ;   p_flags
    503                   dd      0x1000                        ;   p_align
    504     
    505     phdrsize      equ     $ - phdr
    506     
    507     _start:
    508     
    509     ; your program here
    510     
    511     filesize      equ     $ - $$
    512 
    513 This image contains an ELF header, identifying the file as an Intel 386
    514 executable, with no section header table and a program header table
    515 containing one entry. Said entry instructs the program loader to load
    516 the entire file into memory (it's normal behavior for a program to
    517 include its ELF header and program header table in its memory image)
    518 starting at memory address 0x08048000 (which is the default address for
    519 executables to load), and to begin executing the code at `_start`, which
    520 appears immediately after the program header table. No .data segment, no
    521 .bss segment, no commentary -- nothing but the bare necessities.
    522 
    523 So, let's add in our little program:
    524 
    525     ; tiny.asm
    526                   org     0x08048000
    527     
    528     ;
    529     ; (as above)
    530     ;
    531   
    532     _start:
    533                   mov     bl, 42
    534                   xor     eax, eax
    535                   inc     eax
    536                   int     0x80
    537     
    538     filesize      equ     $ - $$
    539 
    540 and try it out:
    541 
    542     $ nasm -f bin -o a.out tiny.asm
    543     $ chmod +x a.out
    544     $ ./a.out ; echo $?
    545     42
    546 
    547 We have just created an executable completely from scratch. How about
    548 that? And now, take a look at its size:
    549 
    550     $ wc -c a.out
    551          91 a.out
    552 
    553 _Ninety-one bytes_. Less than one-fourth the size of our previous
    554 attempt, and less than one-_fortieth_ the size of our first!
    555 
    556 What's more, this time we can account for every last byte. We know
    557 exactly what's in the executable, and why it needs to be there. This is,
    558 finally, the limit. We can't get any smaller than this.
    559 
    560 Or can we?
    561 
    562 ------------------------------------------------------------------------
    563 
    564 Well, if you actually stopped to read the ELF specification, you might
    565 have noticed a couple of facts. 1) The different parts of an ELF file
    566 are permitted to be located anywhere (except the ELF header, which must
    567 be at the top of the file), and they can even overlap each other.  2)
    568 Some of the fields in the headers aren't actually used.
    569 
    570 In particular, I'm thinking of that string of zeros at the end of the
    571 16-byte identification field. They are pure padding, to make room for
    572 future expansion of the ELF standard. So the OS shouldn't care at all
    573 what's in there. And we're already loading everything into memory
    574 anyway, and our program is only seven bytes long....
    575 
    576 Can we put our code inside the ELF header itself?
    577 
    578 Why not?
    579 
    580     ; tiny.asm
    581     
    582     BITS 32
    583     
    584                   org     0x08048000
    585     
    586     ehdr:                                               ; Elf32_Ehdr
    587                   db      0x7F, "ELF"                   ;   e_ident
    588                   db      1, 1, 1, 0, 0
    589     _start:       mov     bl, 42
    590                   xor     eax, eax
    591                   inc     eax
    592                   int     0x80
    593                   dw      2                             ;   e_type
    594                   dw      3                             ;   e_machine
    595                   dd      1                             ;   e_version
    596                   dd      _start                        ;   e_entry
    597                   dd      phdr - $$                     ;   e_phoff
    598                   dd      0                             ;   e_shoff
    599                   dd      0                             ;   e_flags
    600                   dw      ehdrsize                      ;   e_ehsize
    601                   dw      phdrsize                      ;   e_phentsize
    602                   dw      1                             ;   e_phnum
    603                   dw      0                             ;   e_shentsize
    604                   dw      0                             ;   e_shnum
    605                   dw      0                             ;   e_shstrndx
    606     
    607     ehdrsize      equ     $ - ehdr
    608     
    609     phdr:                                               ; Elf32_Phdr
    610                   dd      1                             ;   p_type
    611                   dd      0                             ;   p_offset
    612                   dd      $$                            ;   p_vaddr
    613                   dd      $$                            ;   p_paddr
    614                   dd      filesize                      ;   p_filesz
    615                   dd      filesize                      ;   p_memsz
    616                   dd      5                             ;   p_flags
    617                   dd      0x1000                        ;   p_align
    618     
    619     phdrsize      equ     $ - phdr
    620     
    621     filesize      equ     $ - $$
    622 
    623 After all, bytes are bytes!
    624 
    625     $ nasm -f bin -o a.out tiny.asm
    626     $ chmod +x a.out
    627     $ ./a.out ; echo $?
    628     42
    629     $ wc -c a.out
    630          84 a.out
    631 
    632 Not bad, eh?
    633 
    634 Now we've really gone as low as we can go. Our file is exactly as long
    635 as one ELF header and one program header table entry, both of which we
    636 absolutely require in order to get loaded into memory and run. So
    637 there's nothing left to reduce now!
    638 
    639 Except ...
    640 
    641 Well, what if we could do the same thing to the program header table
    642 that we just did to the program? Have it overlap with the ELF header,
    643 that is. Is it possible?
    644 
    645 It is indeed. Take a look at our program. Note that the last eight bytes
    646 in the ELF header bear a certain kind of resemblence to the first eight
    647 bytes in the program header table. A certain kind of resemblence that
    648 might be described as "identical".
    649 
    650 So ...
    651 
    652     ; tiny.asm
    653     
    654     BITS 32
    655     
    656                   org     0x08048000
    657     
    658     ehdr:
    659                   db      0x7F, "ELF"         ; e_ident
    660                   db      1, 1, 1, 0, 0
    661     _start:       mov     bl, 42
    662                   xor     eax, eax
    663                   inc     eax
    664                   int     0x80
    665                   dw      2                   ; e_type
    666                   dw      3                   ; e_machine
    667                   dd      1                   ; e_version
    668                   dd      _start              ; e_entry
    669                   dd      phdr - $$           ; e_phoff
    670                   dd      0                   ; e_shoff
    671                   dd      0                   ; e_flags
    672                   dw      ehdrsize            ; e_ehsize
    673                   dw      phdrsize            ; e_phentsize
    674     phdr:         dd      1                   ; e_phnum       ; p_type
    675                                               ; e_shentsize
    676                   dd      0                   ; e_shnum       ; p_offset
    677                                               ; e_shstrndx
    678     ehdrsize      equ     $ - ehdr
    679                   dd      $$                                  ; p_vaddr
    680                   dd      $$                                  ; p_paddr
    681                   dd      filesize                            ; p_filesz
    682                   dd      filesize                            ; p_memsz
    683                   dd      5                                   ; p_flags
    684                   dd      0x1000                              ; p_align
    685     phdrsize      equ     $ - phdr
    686     
    687     filesize      equ     $ - $$
    688 
    689 And sure enough, Linux doesn't mind our parsimony one bit:
    690 
    691     $ nasm -f bin -o a.out tiny.asm
    692     $ chmod +x a.out
    693     $ ./a.out ; echo $?
    694     42
    695     $ wc -c a.out
    696          76 a.out
    697 
    698 Now we've _really_ gone as low as we can go. There's no way to overlap
    699 the two structures any more than this. The bytes simply don't match up.
    700 This is the end of the line!
    701 
    702 Unless, that is, we could change the contents of the structures to make
    703 them match even further....
    704 
    705 How many of these fields is Linux actually looking at, anyway? For
    706 example, does Linux actually check to see if the `e_machine` field
    707 contains 3 (indicating an Intel 386 target), or is it just assuming that
    708 it does?
    709 
    710 As a matter of fact, in that case it does. But a surprising number of
    711 other fields are being quietly ignored.
    712 
    713 So: Here's what is and isn't essential in the ELF header. The first four
    714 bytes have to contain the magic number, or else Linux won't touch it.
    715 The other three bytes in the `e_ident` field are not checked, however,
    716 which means we have no less than twelve contiguous bytes we can set to
    717 anything at all. `e_type` has to be set to 2, to indicate an executable,
    718 and `e_machine` has to be 3, as just noted. `e_version` is, like the
    719 version number inside `e_ident`, completely ignored. (Which is sort of
    720 understandable, seeing as currently there's only one version of the ELF
    721 standard.) `e_entry` naturally has to be valid, since it points to the
    722 start of the program. And clearly, `e_phoff` needs to contain the
    723 correct offset of the program header table in the file, and `e_phnum`
    724 needs to contain the right number of entries in said table. `e_flags`,
    725 however, is documented as being currently unused for Intel, so it should
    726 be free for us to reuse. `e_ehsize` is supposed to be used to verify
    727 that the ELF header has the expected size, but Linux pays it no mind.
    728 `e_phentsize` is likewise for validating the size of the program header
    729 table entries.  This one was unchecked in older kernels, but now it
    730 needs to be set correctly. Everything else in the ELF header is about
    731 the section header table, which doesn't come into play with executable
    732 files.
    733 
    734 And now how about the program header table entry? Well, `p_type` has to
    735 contain 1, to mark it as a loadable segment. `p_offset` really needs to
    736 have the correct file offset to start loading. Likewise, `p_vaddr` needs
    737 to contain the proper load address. Note, however, that we're not
    738 required to load at 0x08048000. Almost any address can be used as long
    739 as it's above 0x00000000, below 0x80000000, and page-aligned. The
    740 `p_paddr` field is documented as being ignored, so that's guaranteed to
    741 be free. `p_filesz` indicates how many bytes to load out of the file
    742 into memory, and `p_memsz` indicates how large the memory segment needs
    743 to be, so these numbers ought to be relatively sane. `p_flags` indicates
    744 what permissions to give the memory segment. It needs to be readable
    745 (4), or it won't be usable at all, and it needs to also be executable
    746 (1), or else we can't execute code in it. Other bits can probably be set
    747 as well, but we need to have those at minimum. Finally, `p_align` gives
    748 the alignment requirements for the memory segment. This field is mainly
    749 used when relocating segments containing position-independent code (as
    750 for shared libraries), so for an executable file Linux will ignore
    751 whatever garbage we store here.
    752 
    753 All in all, that's a fair bit of leeway. In particular, a bit of
    754 scrutiny will reveal that most of the necessary fields in the ELF header
    755 are in the first half - the second half is almost completely free for
    756 munging. With this in mind, we can interpose the two structures quite a
    757 bit more than we did previously:
    758 
    759     ; tiny.asm
    760     
    761     BITS 32
    762     
    763                   org     0x00200000
    764     
    765                   db      0x7F, "ELF"         ; e_ident
    766                   db      1, 1, 1, 0, 0
    767     _start:
    768                   mov     bl, 42
    769                   xor     eax, eax
    770                   inc     eax
    771                   int     0x80
    772                   dw      2                   ; e_type
    773                   dw      3                   ; e_machine
    774                   dd      1                   ; e_version
    775                   dd      _start              ; e_entry
    776                   dd      phdr - $$           ; e_phoff
    777     phdr:         dd      1                   ; e_shoff       ; p_type
    778                   dd      0                   ; e_flags       ; p_offset
    779                   dd      $$                  ; e_ehsize      ; p_vaddr
    780                                               ; e_phentsize
    781                   dw      1                   ; e_phnum       ; p_paddr
    782                   dw      0                   ; e_shentsize
    783                   dd      filesize            ; e_shnum       ; p_filesz
    784                                               ; e_shstrndx
    785                   dd      filesize                            ; p_memsz
    786                   dd      5                                   ; p_flags
    787                   dd      0x1000                              ; p_align
    788     
    789     filesize      equ     $ - $$
    790 
    791 As you can (hopefully) see, the first twenty bytes of the program header
    792 table now overlap the last twenty bytes of the ELF header. The two
    793 dovetail quite nicely, actually. There are only two parts of the ELF
    794 header within the overlapped region that matter. The first is the
    795 `e_phnum` field, which just happens to coincide with the `p_paddr`
    796 field, one of the few fields in the program header table which is
    797 definitely ignored. The other is the `e_phentsize` field, which
    798 coincides with the top half of the `p_vaddr` field. These are made to
    799 match up by selecting a non-standard load address for our program, with
    800 a top half equal to 0x0020.
    801 
    802 Now we have _really_ left behind all pretenses of portability ...
    803 
    804     $ nasm -f bin -o a.out tiny.asm
    805     $ chmod +x a.out
    806     $ ./a.out ; echo $?
    807     42
    808     $ wc -c a.out
    809          64 a.out
    810 
    811 ... but it works! And the program is twelve bytes shorter, exactly as
    812 predicted.
    813 
    814 This is where I say that we can't do any better than this, but of
    815 course, we already know that we can -- if we could get the program
    816 header table to reside _completely_ within the ELF header. Can this holy
    817 grail be achieved?
    818 
    819 Well, we can't just move it up another twelve bytes without hitting
    820 hopeless obstacles trying to reconcile several fields in both
    821 structures. The only other possibility would be to have it start
    822 immediately following the first four bytes. This puts the first part of
    823 the program header table comfortably within the `e_ident` area, but
    824 still leaves problems with the rest of it. After some experimenting, it
    825 looks like it isn't going to quite be possible.
    826 
    827 However, it turns out that there are still a couple more fields in the
    828 program header table that we can pervert.
    829 
    830 We noted that `p_memsz` indicates how much memory to allocate for the
    831 memory segment. Obviously it needs to be at least as big as `p_filesz`,
    832 but there wouldn't be any harm if it was larger. Just because we ask for
    833 memory doesn't mean we have to use it, after all.
    834 
    835 Secondly, it turns out that, contrary to all my expectations, the
    836 executable bit can be dropped from the `p_flags` field. It turns out
    837 that the readable and executable bits are redundant: either one will
    838 imply the other.
    839 
    840 So, with these facts in mind, we can reorganize the file into this
    841 little monstrosity:
    842 
    843     ; tiny.asm
    844     
    845     BITS 32
    846     
    847                   org     0x00010000
    848     
    849                   db      0x7F, "ELF"         ; e_ident
    850                   dd      1                                   ; p_type
    851                   dd      0                                   ; p_offset
    852                   dd      $$                                  ; p_vaddr 
    853                   dw      2                   ; e_type        ; p_paddr
    854                   dw      3                   ; e_machine
    855                   dd      _start              ; e_version     ; p_filesz
    856                   dd      _start              ; e_entry       ; p_memsz
    857                   dd      4                   ; e_phoff       ; p_flags
    858     _start:
    859                   mov     bl, 42              ; e_shoff       ; p_align
    860                   xor     eax, eax
    861                   inc     eax                 ; e_flags
    862                   int     0x80
    863                   db      0
    864                   dw      0x34                ; e_ehsize
    865                   dw      0x20                ; e_phentsize
    866                   dw      1                   ; e_phnum
    867                   dw      0                   ; e_shentsize
    868                   dw      0                   ; e_shnum
    869                   dw      0                   ; e_shstrndx
    870     
    871     filesize      equ     $ - $$
    872 
    873 The `p_flags` field has been changed from 5 to 4, as we noted we could
    874 get away with doing. This 4 is also the value of the `e_phoff` field,
    875 which gives the offset into the file for the program header table, which
    876 is exactly where we've located it. The program (remember that?) has been
    877 moved down to lower part of the ELF header, beginning at the `e_shoff`
    878 field and ending inside the `e_flags` field.
    879 
    880 Note that the load address has been changed to a much lower number --
    881 about as low as it can be, in fact. This keeps the value in the
    882 `e_entry` field to a reasonably small number, which is good since it's
    883 also the `p_memsz` number.  (Actually, with virtual memory it hardly
    884 matters -- we could have left it at our original value and it would work
    885 just as well. But there's no harm in being polite.)
    886 
    887 The change to `p_filesz` may require an explanation. Because we aren't
    888 setting the write bit in the `p_flags` field, Linux won't let us define
    889 a `p_memsz` value greater than `p_filesz`, since it can't
    890 zero-initialize those extra bytes if they aren't writeable. Since we
    891 can't change the `p_flags` field without moving the program header table
    892 out of alignment, you might think that the only solution would be to
    893 lower the `p_memsz` value back down to equal `p_filesz` (which would
    894 make it impossible to share it with `e_entry`). However, another
    895 solution exists, namely to increase `p_filesz` to equal `p_memsz`. That
    896 means they're both larger than the real file size -- quite a bit larger,
    897 in fact -- but it absolves the loader from having to write to read-only
    898 memory, which is all it cared about.
    899 
    900 And so ...
    901 
    902     $ nasm -f bin -o a.out tiny.asm
    903     $ chmod +x a.out
    904     $ ./a.out ; echo $?
    905     42
    906     $ wc -c a.out
    907          52 a.out
    908 
    909 ... and so, with both the program header table and the program itself
    910 completely embedded within the ELF header, our executable file is now
    911 exactly as big as the ELF header! No more, no less. And _still_ running
    912 without a single complaint from Linux!
    913 
    914 Now, finally, we have truly and certainly reached the absolute minimum
    915 possible. There can be no question about it, right? After all, we have
    916 to have a complete ELF header (even if it is badly mangled), or else
    917 Linux wouldn't give us the time of day!
    918 
    919 Right?
    920 
    921 Wrong. We have one last dirty trick left.
    922 
    923 It seems to be the case that if the file isn't quite the size of a full
    924 ELF header, Linux will still play ball, and fill out the missing bytes
    925 with zeros. We have no less than seven zeros at the end of our file, and
    926 if we drop them from the file image:
    927 
    928     ; tiny.asm
    929     
    930     BITS 32
    931     
    932                   org     0x00010000
    933     
    934                   db      0x7F, "ELF"         ; e_ident
    935                   dd      1                                   ; p_type
    936                   dd      0                                   ; p_offset
    937                   dd      $$                                  ; p_vaddr 
    938                   dw      2                   ; e_type        ; p_paddr
    939                   dw      3                   ; e_machine
    940                   dd      _start              ; e_version     ; p_filesz
    941                   dd      _start              ; e_entry       ; p_memsz
    942                   dd      4                   ; e_phoff       ; p_flags
    943     _start:
    944                   mov     bl, 42              ; e_shoff       ; p_align
    945                   xor     eax, eax
    946                   inc     eax                 ; e_flags
    947                   int     0x80
    948                   db      0
    949                   dw      0x34                ; e_ehsize
    950                   dw      0x20                ; e_phentsize
    951                   db      1                   ; e_phnum
    952                                               ; e_shentsize
    953                                               ; e_shnum
    954                                               ; e_shstrndx
    955     
    956     filesize      equ     $ - $$
    957 
    958 ... we can, incredibly enough, still produce a working executable:
    959 
    960     $ nasm -f bin -o a.out tiny.asm
    961     $ chmod +x a.out
    962     $ ./a.out ; echo $?
    963     42
    964     $ wc -c a.out
    965          45 a.out
    966 
    967 _Here_, at last, we have honestly gone as far as we can go. There is no
    968 getting around the fact that the 45th byte in the file, which specifies
    969 the number of entries in the program header table, needs to be non-zero,
    970 needs to be present, and needs to be in the 45th position from the start
    971 of the ELF header. We are forced to conclude that there is nothing more
    972 that can be done.
    973 
    974 ------------------------------------------------------------------------
    975 
    976 This forty-five-byte file is less than one-eighth the size of the
    977 smallest ELF executable we could create using the standard tools, and is
    978 less than one-fiftieth the size of the smallest file we could create
    979 using pure C code. We have stripped everything out of the file that we
    980 could, and put to dual purpose most of what we couldn't.
    981 
    982 Of course, half of the values in this file violate some part of the ELF
    983 standard, and it's a wonder that Linux will even consent to sneeze on
    984 it, much less give it a process ID. This is not the sort of program to
    985 which one would normally be willing to confess authorship.
    986 
    987 On the other hand, every single byte in this executable file can be
    988 accounted for and justified. How many executables have you created
    989 lately that you can say _that_ about?