diff options
author | Paul Duncan <pabs@pablotron.org> | 2022-01-01 14:03:09 -0500 |
---|---|---|
committer | Paul Duncan <pabs@pablotron.org> | 2022-01-01 14:03:09 -0500 |
commit | 1c25fe88577dbe9bbb9593f1141d3fae1f20c31f (patch) | |
tree | 54ea09e8eba21178768b4c853e12e5aadd7f6106 /content/posts/2022-01-01-tiny-binaries-assembly-optimization.md | |
parent | 956cbe7f6c2c7936fe2efe5e1e6c4ba25d5bf928 (diff) | |
download | pablotron.org-1c25fe88577dbe9bbb9593f1141d3fae1f20c31f.tar.bz2 pablotron.org-1c25fe88577dbe9bbb9593f1141d3fae1f20c31f.zip |
add posts/2022-01-01-tiny-binaries-assembly-optimization.md
Diffstat (limited to 'content/posts/2022-01-01-tiny-binaries-assembly-optimization.md')
-rw-r--r-- | content/posts/2022-01-01-tiny-binaries-assembly-optimization.md | 373 |
1 files changed, 373 insertions, 0 deletions
diff --git a/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md new file mode 100644 index 0000000..6717a7b --- /dev/null +++ b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md @@ -0,0 +1,373 @@ +--- +slug: tiny-binaries-assembly-optimization +title: "Tiny Binaries: Assembly Optimization" +date: "2022-01-01T08:22:07-04:00" +--- + +Here's how I reduced the size of [the Assembly +implementation][asm-naive] in [Tiny Binaries][tb] from 456 bytes to 360 +bytes with optimizations, and then reduced the size from 360 bytes to +114 bytes with dirty tricks. + +### Shrinking the Code + +Here's the code for the [unoptimized Assembly implementation +(`asm-naive`)][asm-naive]: + +```nasm +; +; hi.s: unoptimized linux x86-64 assembly implementation. +; + +bits 64 + +global _start + +section .rodata + ; "hi!\n" + hi db "hi!", 10 + len equ $ - hi + +section .text + +_start: + mov rax, 1 ; write + mov rdi, 1 ; fd + mov rsi, hi ; msg + mov rdx, len ; len + syscall ; call write() + + mov rax, 60 ; exit + mov rdi, 0 ; exit code + syscall ; call exit() +``` + + +It assembles to a 456 byte binary with 39 bytes of code and 4 bytes of +data: + +```bash +$ make +nasm -f elf64 -o hi.o hi.s +ld -s -static -nostdinc -o hi hi.o +$ wc -c ./hi +456 ./hi +$ objdump -h ./hi +... +Sections: +Idx Name Size VMA LMA File off Algn + 0 .text 00000027 0000000000400080 0000000000400080 00000080 2**4 + CONTENTS, ALLOC, LOAD, READONLY, CODE + 1 .rodata 00000004 00000000004000a8 00000000004000a8 000000a8 2**2 + CONTENTS, ALLOC, LOAD, READONLY, DATA +$ objdump -d -Mintel ./hi +... +0000000000400080 <.text>: + 400080: b8 01 00 00 00 mov eax,0x1 + 400085: bf 01 00 00 00 mov edi,0x1 + 40008a: 48 be a8 00 40 00 00 movabs rsi,0x4000a8 + 400091: 00 00 00 + 400094: ba 04 00 00 00 mov edx,0x4 + 400099: 0f 05 syscall + 40009b: b8 3c 00 00 00 mov eax,0x3c + 4000a0: bf 00 00 00 00 mov edi,0x0 + 4000a5: 0f 05 syscall +``` + + +First, we replace all the bloated 5 byte 32-bit and 64-bit instructions +with slimmer 16-bit and 32-bit equivalents: + +```diff +diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s +index 9d17cab..3694091 100644 +--- a/src/asm-naive/hi.s ++++ b/src/asm-naive/hi.s +@@ -14,12 +14,12 @@ section .rodata + section .text + + _start: +- mov rax, 1 ; write +- mov rdi, 1 ; fd ++ inc al ; write ++ inc edi; fd + mov rsi, hi ; msg +- mov rdx, len ; len ++ mov dl, len ; len + syscall ; call write() + +- mov rax, 60 ; exit +- mov rdi, 0 ; exit code ++ mov al, 60 ; exit ++ xor edi, edi ; exit code + syscall ; call exit() +``` + + +**Notes:** +* `inc al` works because Linux zeros registers on process init. +* `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`. The + other candidates (`inc dil`, `inc di`, `mov dil, al`, and + `mov di, ax`) are all 3 bytes. +* `xor edi, edi` is 2 bytes. The other candidates (`mov dil, 0`, + `mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and + `xor rdi, rdi`) are all 3-5 bytes. + +These changes shrink our binary to 440 bytes, with 24 bytes of code and +4 bytes of data: + +```bash +$ make +nasm -f elf64 -o hi.o hi.s +ld -s -static -nostdinc -o hi hi.o +$ wc -c ./hi +440 ./hi +$ objdump -hd -Mintel ./hi +... +Sections: +Idx Name Size VMA LMA File off Algn + 0 .text 00000018 0000000000400080 0000000000400080 00000080 2**4 + CONTENTS, ALLOC, LOAD, READONLY, CODE + 1 .rodata 00000004 0000000000400098 0000000000400098 00000098 2**2 + CONTENTS, ALLOC, LOAD, READONLY, DATA + +Disassembly of section .text: + +0000000000400080 <.text>: + 400080: fe c0 inc al + 400082: ff c7 inc edi + 400084: 48 be 98 00 40 00 00 movabs rsi,0x400098 + 40008b: 00 00 00 + 40008e: b2 04 mov dl,0x4 + 400090: 0f 05 syscall + 400092: b0 3c mov al,0x3c + 400094: 31 ff xor edi,edi + 400096: 0f 05 syscall +``` + + +The code is now 24 bytes, but at ~41% of our code that 10 byte +`mov` sticks out like a sore thumb. + +We can drop another 2 bytes of code and 4 bytes of data by doing the +following: + +1. Remove `mov rsi, str` (-10 bytes, good riddance). +2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section + overhead). +3. Encode `"hi!\n"` as a 32-bit integer and push it to the stack (+5 + bytes). *Hint:* `"hi!\n" = 68 69 21 0a`, encoded as `0x0a216968` plus + one byte for `push`. +4. Copy `rsp` to `rsi` (+3 bytes). This gives `write` a valid pointer. + +The stack isn't 16-byte aligned, but `syscall` doesn't seem to mind. +Here's the result: + +```nasm +bits 64 + +; "hi!\n", encoded as 32-bit little-endian int +str: equ 0x0a216968 + +section .text +global _start +_start: + push dword str ; push str (68 68 69 21 0a) + + inc al ; write() (fe c0) + inc edi ; fd (ff c7) + mov rsi, rsp ; msg (48 89 e6) + mov dl, 4 ; len (b2 04) + syscall ; call write() (0f 05) + + mov al, 60 ; exit() (b0 3c) + xor edi, edi ; exit code (31 ff) + syscall ; call exit() (0f 05) +``` + + +This results in a 360 byte binary with 22 bytes of code and no data +section: + +```bash +$ make +nasm -f elf64 -o hi.o hi.s +ld -s -static -nostdinc -o hi hi.o +$ ./hi +hi! +$ wc -c ./hi +360 ./hi +$ objdump -h ./hi +... +Sections: +Idx Name Size VMA LMA File off Algn + 0 .text 00000016 0000000000400080 0000000000400080 00000080 2**4 + CONTENTS, ALLOC, LOAD, READONLY, CODE +``` + + +This is the smallest *legitimate* assembly implementation that I could +cook up. It's available in the [companion GitHub repository][repo] and +included in the [results][tb] as [`asm-opt`][asm-opt]. + +### Dirty Tricks + +In [Tiny ELF Files: Revisited in 2021][tiny-elf], Nathan Otterness +created a 114 byte static [x86-64][] [Linux][] binary by overlapping +portions of the [ELF header][] with the [program header][ph], then +embedding the code in unverified\* gaps of the [ELF header][]. + +(\* Unverified by [Linux][], that is. Junk in these fields causes +`readelf` and `objdump` give these binaries the stink eye). + +He's also got a handy table showing exactly which [ELF header][] bytes +can be safely abused. In particular, there are two 12 byte regions at +offsets `4` and `40` which could store our 22 bytes of code. + +I reordered the code and divided it into into two chunks: + +* `code_0`: First 10 bytes, plus a two byte jump to `code_1` +* `code_1`: Remaining 12 bytes. + +Here's the [assembly][] for the two chunks: + +```nasm +; entry point +; (10 bytes of code plus a 2 byte jump to code_1) +code_0: + push dword str ; push string onto stack (68 68 69 21 0a) + inc al ; write() (fe c0) + mov rsi, rsp ; str (48 89 e6) + jmp code_1 ; jump to next chunk (eb 18) + +; ... (mandatory ELF stuff omitted for brevity) + +; second code chunk +; (12 bytes) +code_1: + inc edi ; fd (89 c7) + mov dl, 4 ; len (b2 04) + syscall ; call write() (0f 05) + + mov al, 60 ; exit() (b0 3c) + xor edi, edi ; 0 exit code (31 ff) + syscall ; call exit() (0f 05) +``` + + +I tried shrinking Nathan's binary in a couple other places (for example, +the 8 bytes of padding at the end of the file), but if you remove any +more padding then [Linux][] refuses to execute the binary. + +With these changes the final binary size is 114 bytes. [Linux][] will +still happily execute the binary, but `readelf`, `objdump`, and `file` +can't make any sense of it: + +```bash +$ make +nasm -f bin -o hi hi.s +chmod a+x hi +$ ./hi +hi! +$ wc -c ./hi +114 ./hi +$ objdump -hdMintel ./hi +objdump: ./hi: file format not recognized +$ readelf -SW ./hi +There are 65329 section headers, starting at offset 0x3a: +readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header +readelf: Error: Reading 1014951344 bytes extends past end of file for section headers +readelf: Error: Too many program headers - 0x50f - the file is not that big +$ file ./hi +./hi: ELF, unknown class 104 +$ hd ./hi +00000000 7f 45 4c 46 68 68 69 21 0a fe c0 48 89 e6 eb 18 |.ELFhhi!...H....| +00000010 02 00 3e 00 01 00 00 00 04 80 02 00 00 00 00 00 |..>.............| +00000020 3a 00 00 00 00 00 00 00 ff c7 b2 04 0f 05 b0 3c |:..............<| +00000030 31 ff 0f 05 40 00 38 00 01 00 01 00 00 00 05 00 |1...@.8.........| +00000040 00 00 00 00 00 00 00 00 00 00 00 80 02 00 00 00 |................| +00000050 00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00 |..........r.....| +00000060 00 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00 |..r.............| +00000070 00 00 |..| +00000072 +``` + + +It'll even run as a [Docker][] image: + +```bash +$ cat Dockerfile +FROM scratch +COPY ./hi /hi +ENTRYPOINT ["/hi"] +$ docker build -t zoinks . +Sending build context to Docker daemon 8.704kB +Step 1/3 : FROM scratch + ---> +Step 2/3 : COPY ./hi /hi + ---> 336acd7e2d94 +Step 3/3 : ENTRYPOINT ["/hi"] + ---> Running in e20eca61de44 +Removing intermediate container e20eca61de44 + ---> 297e5d7db5f8 +Successfully built 297e5d7db5f8 +Successfully tagged zoinks:latest +$ docker images zoinks --format '{{.Size}}' +114B +$ docker run --rm -it zoinks +hi! +$ +``` + + + +This glorious monstrosity is included in the [companion +repository][repo] in [the `src/asm-elf-1` directory][asm-elf], and shown +in the [Tiny Binary results][tb] as `asm-elf`. + +### Links + +If you enjoyed this post, you may also like: + +* [Tiny Binaries Repository][repo]: Companion repository with + source code, build instructions, and additional details for this post. +* [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][] + [Linux][] binaries, including the table of unverified [ELF header][] + bytes that I used. +* [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for + Linux][tiny-elf-orig]: Classic original article on tiny static 32-bit + binaries. +* [My Own Private Binary][]: Sequel to [A Whirlwind Tutorial on + Creating Really Teensy ELF Executables for Linux][tiny-elf-orig] where + the author creates a 0 byte executable using a kernel module. + +[tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}} + "Tiny Binaries" +[assembly]: https://en.wikipedia.org/wiki/Assembly_language + "Assembly language" +[asm-naive]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-naive + "Unoptimized assembly implementation." +[asm-opt]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-opt + "Optimized assembly implementation." +[asm-elf]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-elf-1 + "Optimized assembly implementation, embedded in unverified portions of ELF header, with ELF header and program header overlapped." +[repo]: https://github.com/pablotron/tiny-binaries + "Tiny Binaries GitHub repository" +[tiny-elf]: https://nathanotterness.com/2021/10/tiny_elf_modernized.html + "Tiny ELF Files: Revisited in 2021" +[x86-64]: https://en.wikipedia.org/wiki/X86-64 + "64-bit version of x86 instruction set" +[linux]: https://en.wikipedia.org/wiki/Linux + "Linux operating system" +[elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format + "Executable and Linkable Format" +[ph]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header + "ELF Program Header" +[elf header]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header + "ELF file header" +[tiny-elf-orig]: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html + "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux" +[my own private binary]: https://www.muppetlabs.com/~breadbox/txt/mopb.html + "My Own Private Binary" +[docker]: https://en.wikipedia.org/wiki/Docker_(software) + "Docker" |