teensy_elf_executables_for_linux.md (39927B)
1 A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux 2 ======================================================================== 3 4 (or, "Size _Is_ Everything") 5 6 Original: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html 7 8 Author: Brian Raiter <breadbox@muppetlabs.com> 9 10 > She studied it carefully for about 15 minutes. Finally, she spoke. 11 > "There's something written on here," she said, frowning, "but it's 12 > really teensy." 13 > 14 > -- [Dave Barry, "The Columnist's Caper"] 15 16 If you're a programmer who's become fed up with software bloat, then may 17 you find herein the perfect antidote. 18 19 This document explores methods for squeezing excess bytes out of simple 20 programs. (Of course, the more practical purpose of this document is to 21 describe a few of the inner workings of the ELF file format and the 22 Linux operating system. But hopefully you can also learn something about 23 how to make really teensy ELF executables in the process.) 24 25 Please note that the information and examples given here are, for the 26 most part, specific to ELF executables on a Linux platform running under 27 an Intel-386 architecture. I imagine that a good bit of the information 28 is applicable to other ELF-based Unices, but my experiences with such 29 are too limited for me to say with certainty. 30 31 Please also note that if you aren't a little bit familiar with assembly 32 code, you may find parts of this document sort of hard to follow. (The 33 assembly code that appears in this document is written using Nasm; see 34 http://www.nasm.us/. 35 36 ------------------------------------------------------------------------ 37 38 In order to start, we need a program. Almost any program will do, but 39 the simpler the program the better, since we're more interested in how 40 small we can make the executable than what the program does. 41 42 Let's take an incredibly simple program, one that does nothing but 43 return a number back to the operating system. Why not? After all, Unix 44 already comes with no less than two such programs: true and false. Since 45 0 and 1 are already taken, we'll use the number 42. 46 47 So, here is our first version: 48 49 /* tiny.c */ 50 int main(void) { return 42; } 51 52 which we can compile and test like so: 53 54 $ gcc -Wall tiny.c 55 $ ./a.out ; echo $? 56 42 57 58 So. How big is it? Well, on my machine, I get: 59 60 $ wc -c a.out 61 3998 a.out 62 63 (Yours will probably differ some.) Admittedly, that's pretty small by 64 today's standards, but it's almost certainly bigger than it needs to be. 65 66 The obvious first step is to strip the executable: 67 68 $ gcc -Wall -s tiny.c 69 $ ./a.out ; echo $? 70 42 71 $ wc -c a.out 72 2632 a.out 73 74 That's certainly an improvement. For the next step, how about 75 optimizing? 76 77 $ gcc -Wall -s -O3 tiny.c 78 $ wc -c a.out 79 2616 a.out 80 81 That also helped, but only just. Which makes sense: there's hardly 82 anything there to optimize. 83 84 It seems unlikely that there's much else we can do to shrink a 85 one-statement C program. We're going to have to leave C behind, and use 86 assembler instead. Hopefully, this will cut out all the extra overhead 87 that C programs automatically incur. 88 89 So, on to our second version. All we need to do is return 42 from 90 main(). In assembly language, this means that the function should set 91 the accumulator, eax, to 42, and then return: 92 93 ; tiny.asm 94 BITS 32 95 GLOBAL main 96 SECTION .text 97 main: 98 mov eax, 42 99 ret 100 101 We can then build and test like so: 102 103 $ nasm -f elf tiny.asm 104 $ gcc -Wall -s tiny.o 105 $ ./a.out ; echo $? 106 42 107 108 (Hey, who says assembly code is difficult?) And now how big is it? 109 110 $ wc -c a.out 111 2604 a.out 112 113 Looks like we shaved off a measly twelve bytes. So much for all the 114 extra overhead that C automatically incurs, eh? 115 116 Well, the problem is that we are still incurring a lot of overhead by 117 using the main() interface. The linker is still adding an interface to 118 the OS for us, and it is that interface that actually calls main(). So 119 how do we get around that if we don't need it? 120 121 The actual entry point that the linker uses by default is the symbol 122 with the name `_start`. When we link with gcc, it automatically includes 123 a `_start` routine, one that sets up argc and argv, among other things, 124 and then calls `main()`. 125 126 So, let's see if we can bypass this, and define our own `_start` 127 routine: 128 129 ; tiny.asm 130 BITS 32 131 GLOBAL _start 132 SECTION .text 133 _start: 134 mov eax, 42 135 ret 136 137 Will gcc do what we want? 138 139 $ nasm -f elf tiny.asm 140 $ gcc -Wall -s tiny.o 141 tiny.o(.text+0x0): multiple definition of `_start' 142 /usr/lib/crt1.o(.text+0x0): first defined here 143 /usr/lib/crt1.o(.text+0x36): undefined reference to `main' 144 145 No. Well, actually, yes it will, but first we need to learn how to ask 146 for what we want. 147 148 It so happens that gcc recognizes an option called `-nostartfiles`. 149 From the gcc info pages: 150 151 -nostartfiles 152 Do not use the standard system startup files when 153 linking. The standard libraries are used normally. 154 155 Aha! Now let's see what we can do: 156 157 $ nasm -f elf tiny.asm 158 $ gcc -Wall -s -nostartfiles tiny.o 159 $ ./a.out ; echo $? 160 Segmentation fault 161 139 162 163 Well, gcc didn't complain, but the program doesn't work. What went 164 wrong? 165 166 What went wrong is that we treated `_start` as if it were a C function, 167 and tried to return from it. In reality, it's not a function at all. 168 It's just a symbol in the object file which the linker uses to locate 169 the program's entry point. When our program is invoked, it's invoked 170 directly. If we were to look, we would see that the value on the top of 171 the stack was the number 1, which is certainly very un-address-like. In 172 fact, what is on the stack is our program's argc value. After this comes 173 the elements of the argv array, including the terminating NULL element, 174 followed by the elements of envp. And that's all. There is no return 175 address on the stack. 176 177 So, how does `_start` ever exit? Well, it calls the `exit()` function! 178 That's what it's there for, after all. 179 180 Actually, I lied. What it really does is call the `_exit()` function. 181 (Notice the leading underscore.) exit() is required to finish up some 182 tasks on behalf of the process, but those tasks will never have been 183 started, because we're bypassing the library's startup code. So we need 184 to bypass the library's shutdown code as well, and go directly to the 185 operating system's shutdown processing. 186 187 So, let's try this again. We're going to call `_exit()`, which is a 188 function that takes a single integer argument. So all we need to do is 189 push the number onto the stack and call the function. (We also need to 190 declare `_exit()` as external.) Here's our assembly: 191 192 ; tiny.asm 193 BITS 32 194 EXTERN _exit 195 GLOBAL _start 196 SECTION .text 197 _start: 198 push dword 42 199 call _exit 200 201 And we build and test as before: 202 203 $ nasm -f elf tiny.asm 204 $ gcc -Wall -s -nostartfiles tiny.o 205 $ ./a.out ; echo $? 206 42 207 208 Success at last! And now how big is it? 209 210 $ wc -c a.out 211 1340 a.out 212 213 Almost half the size! Not bad. Not bad at all. Hmmm ... so what other 214 interesting obscure options does gcc have? 215 216 Well, this one, appearing immediately after -nostartfiles in the 217 documentation, is certainly eye-catching: 218 219 -nostdlib 220 Don't use the standard system libraries and startup 221 files when linking. Only the files you specify will 222 be passed to the linker. 223 224 That's gotta be worth investigating: 225 226 $ gcc -Wall -s -nostdlib tiny.o 227 tiny.o(.text+0x6): undefined reference to `_exit' 228 229 Oops. That's right ... `_exit()` is, after all, a library function. It 230 has to be filled in from somewhere. 231 232 Okay. But surely, we don't need libc's help just to end a program, do 233 we? 234 235 No, we don't. If we're willing to leave behind all pretenses of 236 portability, we can make our program exit without having to link with 237 anything else. First, though, we need to know how to make a system call 238 under Linux. 239 240 ------------------------------------------------------------------------ 241 242 Linux, like most operating systems, provides basic necessities to the 243 programs it hosts via system calls. This includes things like opening a 244 file, reading and writing to file handles -- and, of course, shutting 245 down a process. 246 247 The Linux system call interface is a single instruction: `int 0x80`. 248 All system calls are done via this interrupt. To make a system call, eax 249 should contain a number that indicates which system call is being 250 invoked, and other registers are used to hold the arguments, if any. If 251 the system call takes one argument, it will be in ebx; a system call 252 with two arguments will use ebx and ecx. Likewise, edx, esi, and edi are 253 used if a third, fourth, or fifth argument is required, respectively. 254 Upon return from a system call, eax will contain the return value. If an 255 error occurs, eax will contain a negative value, with the absolute value 256 indicating the error. 257 258 The numbers for the different system calls are listed in 259 `/usr/include/asm/unistd.h`. A quick peek will tell us that the exit 260 system call is assigned the number 1. Like the C function, it takes one 261 argument, the value to return to the parent process, and so this will go 262 into ebx. 263 264 We now know all we need to know to create the next version of our 265 program, one that won't need assistance from any external functions to 266 work: 267 268 ; tiny.asm 269 BITS 32 270 GLOBAL _start 271 SECTION .text 272 _start: 273 mov eax, 1 274 mov ebx, 42 275 int 0x80 276 277 Here we go: 278 279 $ nasm -f elf tiny.asm 280 $ gcc -Wall -s -nostdlib tiny.o 281 $ ./a.out ; echo $? 282 42 283 284 Ta-da! And the size? 285 286 $ wc -c a.out 287 372 a.out 288 289 Now _that's_ tiny! Almost a fourth the size of the previous version! 290 291 So ... can we do anything else to make it even smaller? 292 293 How about using shorter instructions? 294 295 If we generate a list file for the assembly code, we'll find the 296 following: 297 298 00000000 B801000000 mov eax, 1 299 00000005 BB2A000000 mov ebx, 42 300 0000000A CD80 int 0x80 301 302 Well, gee, we don't need to initialize all of ebx, since the operating 303 system is only going to use the lowest byte. Setting bl alone will be 304 sufficient, and will take two bytes instead of five. 305 306 We can also set eax to one by xor'ing it to zero and then using a 307 one-byte increment instruction; this will save two more bytes. 308 309 00000000 31C0 xor eax, eax 310 00000002 40 inc eax 311 00000003 B32A mov bl, 42 312 00000005 CD80 int 0x80 313 314 I think it's pretty safe to say that we're not going to make this 315 program any smaller than that. 316 317 As an aside, we might as well stop using gcc to link our executable, 318 seeing as we're not using any of its added functionality, and just call 319 the linker, ld, ourselves: 320 321 $ nasm -f elf tiny.asm 322 $ ld -s tiny.o 323 $ ./a.out ; echo $? 324 42 325 $ wc -c a.out 326 368 a.out 327 328 Four bytes smaller. (Hey! Didn't we shave five bytes off? Well, we did, 329 but alignment considerations within the ELF file caused it to require an 330 extra byte of padding.) 331 332 So ... have we reached the end? Is this as small as we can go? 333 334 Well, hm. Our program is now seven bytes long. Do ELF files really 335 require 361 bytes of overhead? What's in this file, anyway? 336 337 We can peek into the contents of the file using objdump: 338 339 $ objdump -x a.out | less 340 341 The output may look like gibberish, but right now let's just focus on 342 the list of sections: 343 344 Sections: 345 Idx Name Size VMA LMA File off Algn 346 0 .text 00000007 08048080 08048080 00000080 2**4 347 CONTENTS, ALLOC, LOAD, READONLY, CODE 348 1 .comment 0000001c 00000000 00000000 00000087 2**0 349 CONTENTS, READONLY 350 351 The complete .text section is listed as being seven bytes long, just as 352 we specified. So it seems safe to conclude that we now have complete 353 control of the machine-language content of our program. 354 355 But then there's this other section named ".comment". Who ordered 356 _that_? And it's 28 bytes long, even! We may not be sure what this 357 .comment section is, but it seems a good bet that it isn't a necessary 358 feature.... 359 360 The .comment section is listed as being located at file offset 00000087 361 (hexadecimal). If we use a hexdump program to look at that area of the 362 file, we will see: 363 364 00000080: 31C0 40B3 2ACD 8000 5468 6520 4E65 7477 1.@.*...The Netw 365 00000090: 6964 6520 4173 7365 6D62 6C65 7220 302E ide Assembler 0. 366 000000A0: 3938 0000 2E73 796D 7461 6200 2E73 7472 98...symtab..str 367 368 Well, well, well. Who'd've thought that Nasm would undermine our quest 369 like this? Maybe we should switch to using gas, AT&T syntax 370 notwithstanding.... 371 372 Alas, if we do: 373 374 ; tiny.s 375 .globl _start 376 .text 377 _start: 378 xorl %eax, %eax 379 incl %eax 380 movb $42, %bl 381 int $0x80 382 383 ... we will find: 384 385 $ gcc -s -nostdlib tiny.s 386 $ ./a.out ; echo $? 387 42 388 $ wc -c a.out 389 368 a.out 390 391 ... no difference! 392 393 Well, actually there is some difference. Turning once again to objdump, 394 we see: 395 396 Sections: 397 Idx Name Size VMA LMA File off Algn 398 0 .text 00000007 08048074 08048074 00000074 2**2 399 CONTENTS, ALLOC, LOAD, READONLY, CODE 400 1 .data 00000000 0804907c 0804907c 0000007c 2**2 401 CONTENTS, ALLOC, LOAD, DATA 402 2 .bss 00000000 0804907c 0804907c 0000007c 2**2 403 ALLOC 404 405 No comment section, but now we have two useless sections for storing our 406 nonexistent data. And even though these sections are zero bytes long, 407 they incur overhead, bringing our file size up for no good reason. 408 409 Okay, so just what is all this overhead, and how do we get rid of it? 410 411 Well, to answer these questions, we must begin diving into some real 412 wizardry. We need to understand the ELF format. 413 414 ------------------------------------------------------------------------ 415 416 The canonical document describing the ELF format for Intel-386 417 architectures can be found at http://refspecs.linuxbase.org/elf/elf.pdf. 418 (You can also find a flat-text version of version 1.0 of the standard at 419 http://www.muppetlabs.com/~breadbox/software/ELF.txt.) This 420 specification covers a lot of territory, so if you'd prefer to not read 421 the whole thing yourself, I'll understand. Basically, here's what we 422 need to know: 423 424 Every ELF file begins with a structure called the ELF header. This 425 structure is 52 bytes long, and contains several pieces of information 426 that describe the contents of the file. For example, the first sixteen 427 bytes contain an "identifier", which includes the file's magic-number 428 signature (7F 45 4C 46), and some one-byte flags indicating that the 429 contents are 32-bit or 64-bit, little-endian or big-endian, etc. Other 430 fields in the ELF header contain information such as: the target 431 architecture; whether the ELF file is an executable, an object file, or 432 a shared-object library; the program's starting address; and the 433 locations within the file of the program header table and the section 434 header table. 435 436 These two tables can appear anywhere in the file, but typically the 437 former appears immediately following the ELF header, and the latter 438 appears at or near the end of the file. The two tables serve similar 439 purposes, in that they identify the component parts of the file. 440 However, the section header table focuses more on identifying where the 441 various parts of the program are within the file, while the program 442 header table describes where and how these parts are to be loaded into 443 memory. In brief, the section header table is for use by the compiler 444 and linker, while the program header table is for use by the program 445 loader. The program header table is optional for object files, and in 446 practice is never present. Likewise, the section header table is 447 optional for executables -- but is almost _always_ present! 448 449 So, this is the answer to our first question. A fair piece of the 450 overhead in our program is a completely unnecessary section header 451 table, and maybe some equally useless sections that don't contribute to 452 our program's memory image. 453 454 So, we turn to our second question: how do we go about getting rid of 455 all that? 456 457 Alas, we're on our own here. None of the standard tools will deign to 458 make an executable without a section header table of some kind. If we 459 want such a thing, we'll have to do it ourselves. 460 461 This doesn't quite mean that we have to pull out a binary editor and 462 code the hexadecimal values by hand, though. Good old Nasm has a flat 463 binary output format, which will serve us well. All we need now is the 464 image of an empty ELF executable, which we can fill in with our program. 465 Our program, and nothing else. 466 467 We can look at the ELF specification, and /usr/include/linux/elf.h, and 468 executables created by the standard tools, to figure out what our empty 469 ELF executable should look like. But, if you're the impatient type, you 470 can just use the one I've supplied here: 471 472 BITS 32 473 474 org 0x08048000 475 476 ehdr: ; Elf32_Ehdr 477 db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident 478 times 8 db 0 479 dw 2 ; e_type 480 dw 3 ; e_machine 481 dd 1 ; e_version 482 dd _start ; e_entry 483 dd phdr - $$ ; e_phoff 484 dd 0 ; e_shoff 485 dd 0 ; e_flags 486 dw ehdrsize ; e_ehsize 487 dw phdrsize ; e_phentsize 488 dw 1 ; e_phnum 489 dw 0 ; e_shentsize 490 dw 0 ; e_shnum 491 dw 0 ; e_shstrndx 492 493 ehdrsize equ $ - ehdr 494 495 phdr: ; Elf32_Phdr 496 dd 1 ; p_type 497 dd 0 ; p_offset 498 dd $$ ; p_vaddr 499 dd $$ ; p_paddr 500 dd filesize ; p_filesz 501 dd filesize ; p_memsz 502 dd 5 ; p_flags 503 dd 0x1000 ; p_align 504 505 phdrsize equ $ - phdr 506 507 _start: 508 509 ; your program here 510 511 filesize equ $ - $$ 512 513 This image contains an ELF header, identifying the file as an Intel 386 514 executable, with no section header table and a program header table 515 containing one entry. Said entry instructs the program loader to load 516 the entire file into memory (it's normal behavior for a program to 517 include its ELF header and program header table in its memory image) 518 starting at memory address 0x08048000 (which is the default address for 519 executables to load), and to begin executing the code at `_start`, which 520 appears immediately after the program header table. No .data segment, no 521 .bss segment, no commentary -- nothing but the bare necessities. 522 523 So, let's add in our little program: 524 525 ; tiny.asm 526 org 0x08048000 527 528 ; 529 ; (as above) 530 ; 531 532 _start: 533 mov bl, 42 534 xor eax, eax 535 inc eax 536 int 0x80 537 538 filesize equ $ - $$ 539 540 and try it out: 541 542 $ nasm -f bin -o a.out tiny.asm 543 $ chmod +x a.out 544 $ ./a.out ; echo $? 545 42 546 547 We have just created an executable completely from scratch. How about 548 that? And now, take a look at its size: 549 550 $ wc -c a.out 551 91 a.out 552 553 _Ninety-one bytes_. Less than one-fourth the size of our previous 554 attempt, and less than one-_fortieth_ the size of our first! 555 556 What's more, this time we can account for every last byte. We know 557 exactly what's in the executable, and why it needs to be there. This is, 558 finally, the limit. We can't get any smaller than this. 559 560 Or can we? 561 562 ------------------------------------------------------------------------ 563 564 Well, if you actually stopped to read the ELF specification, you might 565 have noticed a couple of facts. 1) The different parts of an ELF file 566 are permitted to be located anywhere (except the ELF header, which must 567 be at the top of the file), and they can even overlap each other. 2) 568 Some of the fields in the headers aren't actually used. 569 570 In particular, I'm thinking of that string of zeros at the end of the 571 16-byte identification field. They are pure padding, to make room for 572 future expansion of the ELF standard. So the OS shouldn't care at all 573 what's in there. And we're already loading everything into memory 574 anyway, and our program is only seven bytes long.... 575 576 Can we put our code inside the ELF header itself? 577 578 Why not? 579 580 ; tiny.asm 581 582 BITS 32 583 584 org 0x08048000 585 586 ehdr: ; Elf32_Ehdr 587 db 0x7F, "ELF" ; e_ident 588 db 1, 1, 1, 0, 0 589 _start: mov bl, 42 590 xor eax, eax 591 inc eax 592 int 0x80 593 dw 2 ; e_type 594 dw 3 ; e_machine 595 dd 1 ; e_version 596 dd _start ; e_entry 597 dd phdr - $$ ; e_phoff 598 dd 0 ; e_shoff 599 dd 0 ; e_flags 600 dw ehdrsize ; e_ehsize 601 dw phdrsize ; e_phentsize 602 dw 1 ; e_phnum 603 dw 0 ; e_shentsize 604 dw 0 ; e_shnum 605 dw 0 ; e_shstrndx 606 607 ehdrsize equ $ - ehdr 608 609 phdr: ; Elf32_Phdr 610 dd 1 ; p_type 611 dd 0 ; p_offset 612 dd $$ ; p_vaddr 613 dd $$ ; p_paddr 614 dd filesize ; p_filesz 615 dd filesize ; p_memsz 616 dd 5 ; p_flags 617 dd 0x1000 ; p_align 618 619 phdrsize equ $ - phdr 620 621 filesize equ $ - $$ 622 623 After all, bytes are bytes! 624 625 $ nasm -f bin -o a.out tiny.asm 626 $ chmod +x a.out 627 $ ./a.out ; echo $? 628 42 629 $ wc -c a.out 630 84 a.out 631 632 Not bad, eh? 633 634 Now we've really gone as low as we can go. Our file is exactly as long 635 as one ELF header and one program header table entry, both of which we 636 absolutely require in order to get loaded into memory and run. So 637 there's nothing left to reduce now! 638 639 Except ... 640 641 Well, what if we could do the same thing to the program header table 642 that we just did to the program? Have it overlap with the ELF header, 643 that is. Is it possible? 644 645 It is indeed. Take a look at our program. Note that the last eight bytes 646 in the ELF header bear a certain kind of resemblence to the first eight 647 bytes in the program header table. A certain kind of resemblence that 648 might be described as "identical". 649 650 So ... 651 652 ; tiny.asm 653 654 BITS 32 655 656 org 0x08048000 657 658 ehdr: 659 db 0x7F, "ELF" ; e_ident 660 db 1, 1, 1, 0, 0 661 _start: mov bl, 42 662 xor eax, eax 663 inc eax 664 int 0x80 665 dw 2 ; e_type 666 dw 3 ; e_machine 667 dd 1 ; e_version 668 dd _start ; e_entry 669 dd phdr - $$ ; e_phoff 670 dd 0 ; e_shoff 671 dd 0 ; e_flags 672 dw ehdrsize ; e_ehsize 673 dw phdrsize ; e_phentsize 674 phdr: dd 1 ; e_phnum ; p_type 675 ; e_shentsize 676 dd 0 ; e_shnum ; p_offset 677 ; e_shstrndx 678 ehdrsize equ $ - ehdr 679 dd $$ ; p_vaddr 680 dd $$ ; p_paddr 681 dd filesize ; p_filesz 682 dd filesize ; p_memsz 683 dd 5 ; p_flags 684 dd 0x1000 ; p_align 685 phdrsize equ $ - phdr 686 687 filesize equ $ - $$ 688 689 And sure enough, Linux doesn't mind our parsimony one bit: 690 691 $ nasm -f bin -o a.out tiny.asm 692 $ chmod +x a.out 693 $ ./a.out ; echo $? 694 42 695 $ wc -c a.out 696 76 a.out 697 698 Now we've _really_ gone as low as we can go. There's no way to overlap 699 the two structures any more than this. The bytes simply don't match up. 700 This is the end of the line! 701 702 Unless, that is, we could change the contents of the structures to make 703 them match even further.... 704 705 How many of these fields is Linux actually looking at, anyway? For 706 example, does Linux actually check to see if the `e_machine` field 707 contains 3 (indicating an Intel 386 target), or is it just assuming that 708 it does? 709 710 As a matter of fact, in that case it does. But a surprising number of 711 other fields are being quietly ignored. 712 713 So: Here's what is and isn't essential in the ELF header. The first four 714 bytes have to contain the magic number, or else Linux won't touch it. 715 The other three bytes in the `e_ident` field are not checked, however, 716 which means we have no less than twelve contiguous bytes we can set to 717 anything at all. `e_type` has to be set to 2, to indicate an executable, 718 and `e_machine` has to be 3, as just noted. `e_version` is, like the 719 version number inside `e_ident`, completely ignored. (Which is sort of 720 understandable, seeing as currently there's only one version of the ELF 721 standard.) `e_entry` naturally has to be valid, since it points to the 722 start of the program. And clearly, `e_phoff` needs to contain the 723 correct offset of the program header table in the file, and `e_phnum` 724 needs to contain the right number of entries in said table. `e_flags`, 725 however, is documented as being currently unused for Intel, so it should 726 be free for us to reuse. `e_ehsize` is supposed to be used to verify 727 that the ELF header has the expected size, but Linux pays it no mind. 728 `e_phentsize` is likewise for validating the size of the program header 729 table entries. This one was unchecked in older kernels, but now it 730 needs to be set correctly. Everything else in the ELF header is about 731 the section header table, which doesn't come into play with executable 732 files. 733 734 And now how about the program header table entry? Well, `p_type` has to 735 contain 1, to mark it as a loadable segment. `p_offset` really needs to 736 have the correct file offset to start loading. Likewise, `p_vaddr` needs 737 to contain the proper load address. Note, however, that we're not 738 required to load at 0x08048000. Almost any address can be used as long 739 as it's above 0x00000000, below 0x80000000, and page-aligned. The 740 `p_paddr` field is documented as being ignored, so that's guaranteed to 741 be free. `p_filesz` indicates how many bytes to load out of the file 742 into memory, and `p_memsz` indicates how large the memory segment needs 743 to be, so these numbers ought to be relatively sane. `p_flags` indicates 744 what permissions to give the memory segment. It needs to be readable 745 (4), or it won't be usable at all, and it needs to also be executable 746 (1), or else we can't execute code in it. Other bits can probably be set 747 as well, but we need to have those at minimum. Finally, `p_align` gives 748 the alignment requirements for the memory segment. This field is mainly 749 used when relocating segments containing position-independent code (as 750 for shared libraries), so for an executable file Linux will ignore 751 whatever garbage we store here. 752 753 All in all, that's a fair bit of leeway. In particular, a bit of 754 scrutiny will reveal that most of the necessary fields in the ELF header 755 are in the first half - the second half is almost completely free for 756 munging. With this in mind, we can interpose the two structures quite a 757 bit more than we did previously: 758 759 ; tiny.asm 760 761 BITS 32 762 763 org 0x00200000 764 765 db 0x7F, "ELF" ; e_ident 766 db 1, 1, 1, 0, 0 767 _start: 768 mov bl, 42 769 xor eax, eax 770 inc eax 771 int 0x80 772 dw 2 ; e_type 773 dw 3 ; e_machine 774 dd 1 ; e_version 775 dd _start ; e_entry 776 dd phdr - $$ ; e_phoff 777 phdr: dd 1 ; e_shoff ; p_type 778 dd 0 ; e_flags ; p_offset 779 dd $$ ; e_ehsize ; p_vaddr 780 ; e_phentsize 781 dw 1 ; e_phnum ; p_paddr 782 dw 0 ; e_shentsize 783 dd filesize ; e_shnum ; p_filesz 784 ; e_shstrndx 785 dd filesize ; p_memsz 786 dd 5 ; p_flags 787 dd 0x1000 ; p_align 788 789 filesize equ $ - $$ 790 791 As you can (hopefully) see, the first twenty bytes of the program header 792 table now overlap the last twenty bytes of the ELF header. The two 793 dovetail quite nicely, actually. There are only two parts of the ELF 794 header within the overlapped region that matter. The first is the 795 `e_phnum` field, which just happens to coincide with the `p_paddr` 796 field, one of the few fields in the program header table which is 797 definitely ignored. The other is the `e_phentsize` field, which 798 coincides with the top half of the `p_vaddr` field. These are made to 799 match up by selecting a non-standard load address for our program, with 800 a top half equal to 0x0020. 801 802 Now we have _really_ left behind all pretenses of portability ... 803 804 $ nasm -f bin -o a.out tiny.asm 805 $ chmod +x a.out 806 $ ./a.out ; echo $? 807 42 808 $ wc -c a.out 809 64 a.out 810 811 ... but it works! And the program is twelve bytes shorter, exactly as 812 predicted. 813 814 This is where I say that we can't do any better than this, but of 815 course, we already know that we can -- if we could get the program 816 header table to reside _completely_ within the ELF header. Can this holy 817 grail be achieved? 818 819 Well, we can't just move it up another twelve bytes without hitting 820 hopeless obstacles trying to reconcile several fields in both 821 structures. The only other possibility would be to have it start 822 immediately following the first four bytes. This puts the first part of 823 the program header table comfortably within the `e_ident` area, but 824 still leaves problems with the rest of it. After some experimenting, it 825 looks like it isn't going to quite be possible. 826 827 However, it turns out that there are still a couple more fields in the 828 program header table that we can pervert. 829 830 We noted that `p_memsz` indicates how much memory to allocate for the 831 memory segment. Obviously it needs to be at least as big as `p_filesz`, 832 but there wouldn't be any harm if it was larger. Just because we ask for 833 memory doesn't mean we have to use it, after all. 834 835 Secondly, it turns out that, contrary to all my expectations, the 836 executable bit can be dropped from the `p_flags` field. It turns out 837 that the readable and executable bits are redundant: either one will 838 imply the other. 839 840 So, with these facts in mind, we can reorganize the file into this 841 little monstrosity: 842 843 ; tiny.asm 844 845 BITS 32 846 847 org 0x00010000 848 849 db 0x7F, "ELF" ; e_ident 850 dd 1 ; p_type 851 dd 0 ; p_offset 852 dd $$ ; p_vaddr 853 dw 2 ; e_type ; p_paddr 854 dw 3 ; e_machine 855 dd _start ; e_version ; p_filesz 856 dd _start ; e_entry ; p_memsz 857 dd 4 ; e_phoff ; p_flags 858 _start: 859 mov bl, 42 ; e_shoff ; p_align 860 xor eax, eax 861 inc eax ; e_flags 862 int 0x80 863 db 0 864 dw 0x34 ; e_ehsize 865 dw 0x20 ; e_phentsize 866 dw 1 ; e_phnum 867 dw 0 ; e_shentsize 868 dw 0 ; e_shnum 869 dw 0 ; e_shstrndx 870 871 filesize equ $ - $$ 872 873 The `p_flags` field has been changed from 5 to 4, as we noted we could 874 get away with doing. This 4 is also the value of the `e_phoff` field, 875 which gives the offset into the file for the program header table, which 876 is exactly where we've located it. The program (remember that?) has been 877 moved down to lower part of the ELF header, beginning at the `e_shoff` 878 field and ending inside the `e_flags` field. 879 880 Note that the load address has been changed to a much lower number -- 881 about as low as it can be, in fact. This keeps the value in the 882 `e_entry` field to a reasonably small number, which is good since it's 883 also the `p_memsz` number. (Actually, with virtual memory it hardly 884 matters -- we could have left it at our original value and it would work 885 just as well. But there's no harm in being polite.) 886 887 The change to `p_filesz` may require an explanation. Because we aren't 888 setting the write bit in the `p_flags` field, Linux won't let us define 889 a `p_memsz` value greater than `p_filesz`, since it can't 890 zero-initialize those extra bytes if they aren't writeable. Since we 891 can't change the `p_flags` field without moving the program header table 892 out of alignment, you might think that the only solution would be to 893 lower the `p_memsz` value back down to equal `p_filesz` (which would 894 make it impossible to share it with `e_entry`). However, another 895 solution exists, namely to increase `p_filesz` to equal `p_memsz`. That 896 means they're both larger than the real file size -- quite a bit larger, 897 in fact -- but it absolves the loader from having to write to read-only 898 memory, which is all it cared about. 899 900 And so ... 901 902 $ nasm -f bin -o a.out tiny.asm 903 $ chmod +x a.out 904 $ ./a.out ; echo $? 905 42 906 $ wc -c a.out 907 52 a.out 908 909 ... and so, with both the program header table and the program itself 910 completely embedded within the ELF header, our executable file is now 911 exactly as big as the ELF header! No more, no less. And _still_ running 912 without a single complaint from Linux! 913 914 Now, finally, we have truly and certainly reached the absolute minimum 915 possible. There can be no question about it, right? After all, we have 916 to have a complete ELF header (even if it is badly mangled), or else 917 Linux wouldn't give us the time of day! 918 919 Right? 920 921 Wrong. We have one last dirty trick left. 922 923 It seems to be the case that if the file isn't quite the size of a full 924 ELF header, Linux will still play ball, and fill out the missing bytes 925 with zeros. We have no less than seven zeros at the end of our file, and 926 if we drop them from the file image: 927 928 ; tiny.asm 929 930 BITS 32 931 932 org 0x00010000 933 934 db 0x7F, "ELF" ; e_ident 935 dd 1 ; p_type 936 dd 0 ; p_offset 937 dd $$ ; p_vaddr 938 dw 2 ; e_type ; p_paddr 939 dw 3 ; e_machine 940 dd _start ; e_version ; p_filesz 941 dd _start ; e_entry ; p_memsz 942 dd 4 ; e_phoff ; p_flags 943 _start: 944 mov bl, 42 ; e_shoff ; p_align 945 xor eax, eax 946 inc eax ; e_flags 947 int 0x80 948 db 0 949 dw 0x34 ; e_ehsize 950 dw 0x20 ; e_phentsize 951 db 1 ; e_phnum 952 ; e_shentsize 953 ; e_shnum 954 ; e_shstrndx 955 956 filesize equ $ - $$ 957 958 ... we can, incredibly enough, still produce a working executable: 959 960 $ nasm -f bin -o a.out tiny.asm 961 $ chmod +x a.out 962 $ ./a.out ; echo $? 963 42 964 $ wc -c a.out 965 45 a.out 966 967 _Here_, at last, we have honestly gone as far as we can go. There is no 968 getting around the fact that the 45th byte in the file, which specifies 969 the number of entries in the program header table, needs to be non-zero, 970 needs to be present, and needs to be in the 45th position from the start 971 of the ELF header. We are forced to conclude that there is nothing more 972 that can be done. 973 974 ------------------------------------------------------------------------ 975 976 This forty-five-byte file is less than one-eighth the size of the 977 smallest ELF executable we could create using the standard tools, and is 978 less than one-fiftieth the size of the smallest file we could create 979 using pure C code. We have stripped everything out of the file that we 980 could, and put to dual purpose most of what we couldn't. 981 982 Of course, half of the values in this file violate some part of the ELF 983 standard, and it's a wonder that Linux will even consent to sneeze on 984 it, much less give it a process ID. This is not the sort of program to 985 which one would normally be willing to confess authorship. 986 987 On the other hand, every single byte in this executable file can be 988 accounted for and justified. How many executables have you created 989 lately that you can say _that_ about?