From a46dd1f4cd45869b731a4690ca2971462f53abd1 Mon Sep 17 00:00:00 2001
From: Paul Duncan
Date: Sun, 2 Jan 2022 10:01:50 -0500
Subject: posts/2022-01-01-tiny-binaries-assembly-optimization.md: be more concise

---
 ...22-01-01-tiny-binaries-assembly-optimization.md | 79 +++++++++++-----------
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
index 6717a7b..be4815c 100644
--- a/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
+++ b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
@@ -4,15 +4,13 @@ title: "Tiny Binaries: Assembly Optimization"
 date: "2022-01-01T08:22:07-04:00"
 ---
 
-Here's how I reduced the size of [the Assembly
-implementation][asm-naive] in [Tiny Binaries][tb] from 456 bytes to 360
-bytes with optimizations, and then reduced the size from 360 bytes to
-114 bytes with dirty tricks.
+Here's how I reduced the [assembly][asm-naive] binary size in [Tiny
+Binaries][tb] from 456 bytes to 114 bytes.
 
 ### Shrinking the Code
 
-Here's the code for the [unoptimized Assembly implementation
-(`asm-naive`)][asm-naive]:
+Below is the [original assembly code][asm-naive] (`asm-naive` in the
+[results][tb]):
 
 ```nasm
 ;
@@ -43,7 +41,7 @@ _start:
 ```
 &nbsp;
 
-It assembles to a 456 byte binary with 39 bytes of code and 4 bytes of
+This produces a 456 byte binary with 39 bytes of code and 4 bytes of
 data:
 
 ```bash
@@ -52,7 +50,7 @@ nasm -f elf64 -o hi.o hi.s
 ld -s -static -nostdinc -o hi hi.o
 $ wc -c ./hi
 456 ./hi
-$ objdump -h ./hi
+$ objdump -hd -Mintel ./hi
 ...
 Sections:
 Idx Name          Size      VMA               LMA               File off  Algn
@@ -60,8 +58,9 @@ Idx Name          Size      VMA               LMA               File off  Algn
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   1 .rodata       00000004  00000000004000a8  00000000004000a8  000000a8  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
-$ objdump -d -Mintel ./hi
-...
+
+Disassembly of section .text:
+
 0000000000400080 <.text>:
   400080: b8 01 00 00 00 mov eax,0x1
   400085: bf 01 00 00 00 mov edi,0x1
@@ -75,8 +74,8 @@ $ objdump -d -Mintel ./hi
 ```
 &nbsp;
 
-First, we replace all the bloated 5 byte 32-bit and 64-bit instructions
-with slimmer 16-bit and 32-bit equivalents:
+First, we replace the unnecessary 5 byte instructions with smaller
+equivalents:
 
 ```diff
 diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
@@ -105,7 +104,7 @@ index 9d17cab..3694091 100644
 &nbsp;
 
 **Notes:**
-* `inc al` works because Linux zeros registers on process init.
+* `inc al` works because [Linux][] zeros registers on process init.
 * `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`. The
   other candidates (`inc dil`, `inc di`, `mov dil, al`, and `mov di,
   ax`) are all 3 bytes.
@@ -113,8 +112,8 @@ index 9d17cab..3694091 100644
   `mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and `xor
   rdi, rdi`) are all 3-5 bytes.
 
-These changes shrink our binary to 440 bytes, with 24 bytes of code and
-4 bytes of data:
+These changes shrink the binary size to 440 bytes, with 24 bytes of code
+and 4 bytes of data:
 
 ```bash
 $ make
@@ -146,11 +145,10 @@ Disassembly of section .text:
 ```
 &nbsp;
 
-The code is now 24 bytes, but at ~41% of our code that 10 byte
-`mov` sticks out like a sore thumb.
+The code is now 24 bytes, of which 10 are one large `mov` instruction.
 
-We can drop another 2 bytes of code and 4 bytes of data by doing the
-following:
+We can drop 2 bytes of code, 4 bytes of data, and the `.rodata` section
+by doing the following:
 
 1. Remove `mov rsi, str` (-10 bytes, good riddance).
 2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section
@@ -160,7 +158,6 @@ following:
    one byte for `push`.
 4. Copy `rsp` to `rsi` (+3 bytes). This gives `write` a valid pointer.
 
-The stack isn't 16-byte aligned, but `syscall` doesn't seem to mind.
 Here's the result:
 
 ```nasm
@@ -186,7 +183,7 @@ _start:
 ```
 &nbsp;
 
-This results in a 360 byte binary with 22 bytes of code and no data
+This produces a 360 byte binary with 22 bytes of code and no data
 section:
 
 ```bash
@@ -208,7 +205,7 @@ Idx Name          Size      VMA               LMA               File off  Algn
 
 This is the smallest *legitimate* assembly implementation that I could
 cook up. It's available in the [companion GitHub repository][repo] and
-included in the [results][tb] as [`asm-opt`][asm-opt].
+shown in [the results][tb] as `asm-opt`.
 
 ### Dirty Tricks
 
@@ -218,11 +215,13 @@ portions of the [ELF header][] with the [program header][ph], then
 embedding the code in unverified\* gaps of the [ELF header][].
 
 (\* Unverified by [Linux][], that is. Junk in these fields causes
-`readelf` and `objdump` give these binaries the stink eye).
+`readelf` and `objdump` give these binaries the stink eye, as we'll see
+shortly).
 
-He's also got a handy table showing exactly which [ELF header][] bytes
-can be safely abused. In particular, there are two 12 byte regions at
-offsets `4` and `40` which could store our 22 bytes of code.
+Nathan also created a handy table showing which [ELF header][] bytes are
+unverified by [Linux][]. In particular, there are two unverified 12
+byte regions at offsets `4` and `40` which could store our 22 bytes of
+code.
 
 I reordered the code and divided it into into two chunks:
 
@@ -240,7 +239,7 @@ code_0:
   mov rsi, rsp ; str (48 89 e6)
   jmp code_1 ; jump to next chunk (eb 18)
 
-; ... (mandatory ELF stuff omitted for brevity)
+; ...
 
 ; second code chunk
 ; (12 bytes)
@@ -255,13 +254,13 @@ code_1:
 ```
 &nbsp;
 
-I tried shrinking Nathan's binary in a couple other places (for example,
-the 8 bytes of padding at the end of the file), but if you remove any
-more padding then [Linux][] refuses to execute the binary.
+I tried shrinking Nathan's binary a couple of ways, without any luck.
+For example, I tried removing the padding bytes at the end of the file,
+but the binary will not execute without them.
 
-With these changes the final binary size is 114 bytes. [Linux][] will
-still happily execute the binary, but `readelf`, `objdump`, and `file`
-can't make any sense of it:
+Anyway, with these changes the final binary size is 114 bytes.
+[Linux][] will still happily execute the binary, but common tools like
+`readelf`, `objdump`, and `file` can't make any sense of it:
 
 ```bash
 $ make
@@ -271,7 +270,7 @@ $ ./hi
 hi!
 $ wc -c ./hi
 114 ./hi
-$ objdump -hdMintel ./hi
+$ objdump -hd -Mintel ./hi
 objdump: ./hi: file format not recognized
 $ readelf -SW ./hi
 There are 65329 section headers, starting at offset 0x3a:
@@ -320,10 +319,8 @@ $
 ```
 &nbsp;
 
-
 This glorious monstrosity is included in the [companion
-repository][repo] in [the `src/asm-elf-1` directory][asm-elf], and shown
-in the [Tiny Binary results][tb] as `asm-elf`.
+repository][repo] and shown in the [results][tb] as `asm-elf`.
 
 ### Links
 
@@ -332,15 +329,17 @@ If you enjoyed this post, you may also like:
 * [Tiny Binaries Repository][repo]: Companion repository with source
   code, build instructions, and additional details for this post.
 * [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][]
-  [Linux][] binaries, including the table of unverified [ELF header][]
-  bytes that I used.
+  [Linux][] binaries, including a table of unverified [ELF header][]
+  bytes.
 * [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for
-  Linux][tiny-elf-orig]: Classic original article on tiny static 32-bit
+  Linux][tiny-elf-orig]: Classic original article on tiny 32-bit static
   binaries.
 * [My Own Private Binary][]: Sequel to [A Whirlwind Tutorial on Creating
   Really Teensy ELF Executables for Linux][tiny-elf-orig] where the
   author creates a 0 byte executable using a kernel module.
 
+**Update (2022-01-02):** Fix typos, dewordify, improve grammar.
+
 [tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}}
   "Tiny Binaries"
 [assembly]: https://en.wikipedia.org/wiki/Assembly_language
-- 
cgit v1.2.3