--- slug: tiny-binaries-assembly-optimization title: "Tiny Binaries: Assembly Optimization" date: "2022-01-01T08:22:07-04:00" --- Here's how I reduced the size of [the Assembly implementation][asm-naive] in [Tiny Binaries][tb] from 456 bytes to 360 bytes with optimizations, and then reduced the size from 360 bytes to 114 bytes with dirty tricks. ### Shrinking the Code Here's the code for the [unoptimized Assembly implementation (`asm-naive`)][asm-naive]: ```nasm ; ; hi.s: unoptimized linux x86-64 assembly implementation. ; bits 64 global _start section .rodata ; "hi!\n" hi db "hi!", 10 len equ $ - hi section .text _start: mov rax, 1 ; write mov rdi, 1 ; fd mov rsi, hi ; msg mov rdx, len ; len syscall ; call write() mov rax, 60 ; exit mov rdi, 0 ; exit code syscall ; call exit() ```   It assembles to a 456 byte binary with 39 bytes of code and 4 bytes of data: ```bash $ make nasm -f elf64 -o hi.o hi.s ld -s -static -nostdinc -o hi hi.o $ wc -c ./hi 456 ./hi $ objdump -h ./hi ... Sections: Idx Name Size VMA LMA File off Algn 0 .text 00000027 0000000000400080 0000000000400080 00000080 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE 1 .rodata 00000004 00000000004000a8 00000000004000a8 000000a8 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA $ objdump -d -Mintel ./hi ... 0000000000400080 <.text>: 400080: b8 01 00 00 00 mov eax,0x1 400085: bf 01 00 00 00 mov edi,0x1 40008a: 48 be a8 00 40 00 00 movabs rsi,0x4000a8 400091: 00 00 00 400094: ba 04 00 00 00 mov edx,0x4 400099: 0f 05 syscall 40009b: b8 3c 00 00 00 mov eax,0x3c 4000a0: bf 00 00 00 00 mov edi,0x0 4000a5: 0f 05 syscall ```   First, we replace all the bloated 5 byte 32-bit and 64-bit instructions with slimmer 16-bit and 32-bit equivalents: ```diff diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s index 9d17cab..3694091 100644 --- a/src/asm-naive/hi.s +++ b/src/asm-naive/hi.s @@ -14,12 +14,12 @@ section .rodata section .text _start: - mov rax, 1 ; write - mov rdi, 1 ; fd + inc al ; write + inc edi; fd mov rsi, hi ; msg - mov rdx, len ; len + mov dl, len ; len syscall ; call write() - mov rax, 60 ; exit - mov rdi, 0 ; exit code + mov al, 60 ; exit + xor edi, edi ; exit code syscall ; call exit() ```   **Notes:** * `inc al` works because Linux zeros registers on process init. * `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`. The other candidates (`inc dil`, `inc di`, `mov dil, al`, and `mov di, ax`) are all 3 bytes. * `xor edi, edi` is 2 bytes. The other candidates (`mov dil, 0`, `mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and `xor rdi, rdi`) are all 3-5 bytes. These changes shrink our binary to 440 bytes, with 24 bytes of code and 4 bytes of data: ```bash $ make nasm -f elf64 -o hi.o hi.s ld -s -static -nostdinc -o hi hi.o $ wc -c ./hi 440 ./hi $ objdump -hd -Mintel ./hi ... Sections: Idx Name Size VMA LMA File off Algn 0 .text 00000018 0000000000400080 0000000000400080 00000080 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE 1 .rodata 00000004 0000000000400098 0000000000400098 00000098 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA Disassembly of section .text: 0000000000400080 <.text>: 400080: fe c0 inc al 400082: ff c7 inc edi 400084: 48 be 98 00 40 00 00 movabs rsi,0x400098 40008b: 00 00 00 40008e: b2 04 mov dl,0x4 400090: 0f 05 syscall 400092: b0 3c mov al,0x3c 400094: 31 ff xor edi,edi 400096: 0f 05 syscall ```   The code is now 24 bytes, but at ~41% of our code that 10 byte `mov` sticks out like a sore thumb. We can drop another 2 bytes of code and 4 bytes of data by doing the following: 1. Remove `mov rsi, str` (-10 bytes, good riddance). 2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section overhead). 3. Encode `"hi!\n"` as a 32-bit integer and push it to the stack (+5 bytes). *Hint:* `"hi!\n" = 68 69 21 0a`, encoded as `0x0a216968` plus one byte for `push`. 4. Copy `rsp` to `rsi` (+3 bytes). This gives `write` a valid pointer. The stack isn't 16-byte aligned, but `syscall` doesn't seem to mind. Here's the result: ```nasm bits 64 ; "hi!\n", encoded as 32-bit little-endian int str: equ 0x0a216968 section .text global _start _start: push dword str ; push str (68 68 69 21 0a) inc al ; write() (fe c0) inc edi ; fd (ff c7) mov rsi, rsp ; msg (48 89 e6) mov dl, 4 ; len (b2 04) syscall ; call write() (0f 05) mov al, 60 ; exit() (b0 3c) xor edi, edi ; exit code (31 ff) syscall ; call exit() (0f 05) ```   This results in a 360 byte binary with 22 bytes of code and no data section: ```bash $ make nasm -f elf64 -o hi.o hi.s ld -s -static -nostdinc -o hi hi.o $ ./hi hi! $ wc -c ./hi 360 ./hi $ objdump -h ./hi ... Sections: Idx Name Size VMA LMA File off Algn 0 .text 00000016 0000000000400080 0000000000400080 00000080 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE ```   This is the smallest *legitimate* assembly implementation that I could cook up. It's available in the [companion GitHub repository][repo] and included in the [results][tb] as [`asm-opt`][asm-opt]. ### Dirty Tricks In [Tiny ELF Files: Revisited in 2021][tiny-elf], Nathan Otterness created a 114 byte static [x86-64][] [Linux][] binary by overlapping portions of the [ELF header][] with the [program header][ph], then embedding the code in unverified\* gaps of the [ELF header][]. (\* Unverified by [Linux][], that is. Junk in these fields causes `readelf` and `objdump` give these binaries the stink eye). He's also got a handy table showing exactly which [ELF header][] bytes can be safely abused. In particular, there are two 12 byte regions at offsets `4` and `40` which could store our 22 bytes of code. I reordered the code and divided it into into two chunks: * `code_0`: First 10 bytes, plus a two byte jump to `code_1` * `code_1`: Remaining 12 bytes. Here's the [assembly][] for the two chunks: ```nasm ; entry point ; (10 bytes of code plus a 2 byte jump to code_1) code_0: push dword str ; push string onto stack (68 68 69 21 0a) inc al ; write() (fe c0) mov rsi, rsp ; str (48 89 e6) jmp code_1 ; jump to next chunk (eb 18) ; ... (mandatory ELF stuff omitted for brevity) ; second code chunk ; (12 bytes) code_1: inc edi ; fd (89 c7) mov dl, 4 ; len (b2 04) syscall ; call write() (0f 05) mov al, 60 ; exit() (b0 3c) xor edi, edi ; 0 exit code (31 ff) syscall ; call exit() (0f 05) ```   I tried shrinking Nathan's binary in a couple other places (for example, the 8 bytes of padding at the end of the file), but if you remove any more padding then [Linux][] refuses to execute the binary. With these changes the final binary size is 114 bytes. [Linux][] will still happily execute the binary, but `readelf`, `objdump`, and `file` can't make any sense of it: ```bash $ make nasm -f bin -o hi hi.s chmod a+x hi $ ./hi hi! $ wc -c ./hi 114 ./hi $ objdump -hdMintel ./hi objdump: ./hi: file format not recognized $ readelf -SW ./hi There are 65329 section headers, starting at offset 0x3a: readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header readelf: Error: Reading 1014951344 bytes extends past end of file for section headers readelf: Error: Too many program headers - 0x50f - the file is not that big $ file ./hi ./hi: ELF, unknown class 104 $ hd ./hi 00000000 7f 45 4c 46 68 68 69 21 0a fe c0 48 89 e6 eb 18 |.ELFhhi!...H....| 00000010 02 00 3e 00 01 00 00 00 04 80 02 00 00 00 00 00 |..>.............| 00000020 3a 00 00 00 00 00 00 00 ff c7 b2 04 0f 05 b0 3c |:..............<| 00000030 31 ff 0f 05 40 00 38 00 01 00 01 00 00 00 05 00 |1...@.8.........| 00000040 00 00 00 00 00 00 00 00 00 00 00 80 02 00 00 00 |................| 00000050 00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00 |..........r.....| 00000060 00 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00 |..r.............| 00000070 00 00 |..| 00000072 ```   It'll even run as a [Docker][] image: ```bash $ cat Dockerfile FROM scratch COPY ./hi /hi ENTRYPOINT ["/hi"] $ docker build -t zoinks . Sending build context to Docker daemon 8.704kB Step 1/3 : FROM scratch ---> Step 2/3 : COPY ./hi /hi ---> 336acd7e2d94 Step 3/3 : ENTRYPOINT ["/hi"] ---> Running in e20eca61de44 Removing intermediate container e20eca61de44 ---> 297e5d7db5f8 Successfully built 297e5d7db5f8 Successfully tagged zoinks:latest $ docker images zoinks --format '{{.Size}}' 114B $ docker run --rm -it zoinks hi! $ ```   This glorious monstrosity is included in the [companion repository][repo] in [the `src/asm-elf-1` directory][asm-elf], and shown in the [Tiny Binary results][tb] as `asm-elf`. ### Links If you enjoyed this post, you may also like: * [Tiny Binaries Repository][repo]: Companion repository with source code, build instructions, and additional details for this post. * [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][] [Linux][] binaries, including the table of unverified [ELF header][] bytes that I used. * [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux][tiny-elf-orig]: Classic original article on tiny static 32-bit binaries. * [My Own Private Binary][]: Sequel to [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux][tiny-elf-orig] where the author creates a 0 byte executable using a kernel module. [tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}} "Tiny Binaries" [assembly]: https://en.wikipedia.org/wiki/Assembly_language "Assembly language" [asm-naive]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-naive "Unoptimized assembly implementation." [asm-opt]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-opt "Optimized assembly implementation." [asm-elf]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-elf-1 "Optimized assembly implementation, embedded in unverified portions of ELF header, with ELF header and program header overlapped." [repo]: https://github.com/pablotron/tiny-binaries "Tiny Binaries GitHub repository" [tiny-elf]: https://nathanotterness.com/2021/10/tiny_elf_modernized.html "Tiny ELF Files: Revisited in 2021" [x86-64]: https://en.wikipedia.org/wiki/X86-64 "64-bit version of x86 instruction set" [linux]: https://en.wikipedia.org/wiki/Linux "Linux operating system" [elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format "Executable and Linkable Format" [ph]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header "ELF Program Header" [elf header]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header "ELF file header" [tiny-elf-orig]: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux" [my own private binary]: https://www.muppetlabs.com/~breadbox/txt/mopb.html "My Own Private Binary" [docker]: https://en.wikipedia.org/wiki/Docker_(software) "Docker"