aboutsummaryrefslogtreecommitdiff
path: root/content
diff options
context:
space:
mode:
authorPaul Duncan <pabs@pablotron.org>2022-01-01 14:03:09 -0500
committerPaul Duncan <pabs@pablotron.org>2022-01-01 14:03:09 -0500
commit1c25fe88577dbe9bbb9593f1141d3fae1f20c31f (patch)
tree54ea09e8eba21178768b4c853e12e5aadd7f6106 /content
parent956cbe7f6c2c7936fe2efe5e1e6c4ba25d5bf928 (diff)
downloadpablotron.org-1c25fe88577dbe9bbb9593f1141d3fae1f20c31f.tar.bz2
pablotron.org-1c25fe88577dbe9bbb9593f1141d3fae1f20c31f.zip
add posts/2022-01-01-tiny-binaries-assembly-optimization.md
Diffstat (limited to 'content')
-rw-r--r--content/posts/2022-01-01-tiny-binaries-assembly-optimization.md373
1 files changed, 373 insertions, 0 deletions
diff --git a/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
new file mode 100644
index 0000000..6717a7b
--- /dev/null
+++ b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
@@ -0,0 +1,373 @@
+---
+slug: tiny-binaries-assembly-optimization
+title: "Tiny Binaries: Assembly Optimization"
+date: "2022-01-01T08:22:07-04:00"
+---
+
+Here's how I reduced the size of [the Assembly
+implementation][asm-naive] in [Tiny Binaries][tb] from 456 bytes to 360
+bytes with optimizations, and then reduced the size from 360 bytes to
+114 bytes with dirty tricks.
+
+### Shrinking the Code
+
+Here's the code for the [unoptimized Assembly implementation
+(`asm-naive`)][asm-naive]:
+
+```nasm
+;
+; hi.s: unoptimized linux x86-64 assembly implementation.
+;
+
+bits 64
+
+global _start
+
+section .rodata
+ ; "hi!\n"
+ hi db "hi!", 10
+ len equ $ - hi
+
+section .text
+
+_start:
+ mov rax, 1 ; write
+ mov rdi, 1 ; fd
+ mov rsi, hi ; msg
+ mov rdx, len ; len
+ syscall ; call write()
+
+ mov rax, 60 ; exit
+ mov rdi, 0 ; exit code
+ syscall ; call exit()
+```
+&nbsp;
+
+It assembles to a 456 byte binary with 39 bytes of code and 4 bytes of
+data:
+
+```bash
+$ make
+nasm -f elf64 -o hi.o hi.s
+ld -s -static -nostdinc -o hi hi.o
+$ wc -c ./hi
+456 ./hi
+$ objdump -h ./hi
+...
+Sections:
+Idx Name Size VMA LMA File off Algn
+ 0 .text 00000027 0000000000400080 0000000000400080 00000080 2**4
+ CONTENTS, ALLOC, LOAD, READONLY, CODE
+ 1 .rodata 00000004 00000000004000a8 00000000004000a8 000000a8 2**2
+ CONTENTS, ALLOC, LOAD, READONLY, DATA
+$ objdump -d -Mintel ./hi
+...
+0000000000400080 <.text>:
+ 400080: b8 01 00 00 00 mov eax,0x1
+ 400085: bf 01 00 00 00 mov edi,0x1
+ 40008a: 48 be a8 00 40 00 00 movabs rsi,0x4000a8
+ 400091: 00 00 00
+ 400094: ba 04 00 00 00 mov edx,0x4
+ 400099: 0f 05 syscall
+ 40009b: b8 3c 00 00 00 mov eax,0x3c
+ 4000a0: bf 00 00 00 00 mov edi,0x0
+ 4000a5: 0f 05 syscall
+```
+&nbsp;
+
+First, we replace all the bloated 5 byte 32-bit and 64-bit instructions
+with slimmer 16-bit and 32-bit equivalents:
+
+```diff
+diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
+index 9d17cab..3694091 100644
+--- a/src/asm-naive/hi.s
++++ b/src/asm-naive/hi.s
+@@ -14,12 +14,12 @@ section .rodata
+ section .text
+
+ _start:
+- mov rax, 1 ; write
+- mov rdi, 1 ; fd
++ inc al ; write
++ inc edi; fd
+ mov rsi, hi ; msg
+- mov rdx, len ; len
++ mov dl, len ; len
+ syscall ; call write()
+
+- mov rax, 60 ; exit
+- mov rdi, 0 ; exit code
++ mov al, 60 ; exit
++ xor edi, edi ; exit code
+ syscall ; call exit()
+```
+&nbsp;
+
+**Notes:**
+* `inc al` works because Linux zeros registers on process init.
+* `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`. The
+ other candidates (`inc dil`, `inc di`, `mov dil, al`, and
+ `mov di, ax`) are all 3 bytes.
+* `xor edi, edi` is 2 bytes. The other candidates (`mov dil, 0`,
+ `mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and
+ `xor rdi, rdi`) are all 3-5 bytes.
+
+These changes shrink our binary to 440 bytes, with 24 bytes of code and
+4 bytes of data:
+
+```bash
+$ make
+nasm -f elf64 -o hi.o hi.s
+ld -s -static -nostdinc -o hi hi.o
+$ wc -c ./hi
+440 ./hi
+$ objdump -hd -Mintel ./hi
+...
+Sections:
+Idx Name Size VMA LMA File off Algn
+ 0 .text 00000018 0000000000400080 0000000000400080 00000080 2**4
+ CONTENTS, ALLOC, LOAD, READONLY, CODE
+ 1 .rodata 00000004 0000000000400098 0000000000400098 00000098 2**2
+ CONTENTS, ALLOC, LOAD, READONLY, DATA
+
+Disassembly of section .text:
+
+0000000000400080 <.text>:
+ 400080: fe c0 inc al
+ 400082: ff c7 inc edi
+ 400084: 48 be 98 00 40 00 00 movabs rsi,0x400098
+ 40008b: 00 00 00
+ 40008e: b2 04 mov dl,0x4
+ 400090: 0f 05 syscall
+ 400092: b0 3c mov al,0x3c
+ 400094: 31 ff xor edi,edi
+ 400096: 0f 05 syscall
+```
+&nbsp;
+
+The code is now 24 bytes, but at ~41% of our code that 10 byte
+`mov` sticks out like a sore thumb.
+
+We can drop another 2 bytes of code and 4 bytes of data by doing the
+following:
+
+1. Remove `mov rsi, str` (-10 bytes, good riddance).
+2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section
+ overhead).
+3. Encode `"hi!\n"` as a 32-bit integer and push it to the stack (+5
+ bytes). *Hint:* `"hi!\n" = 68 69 21 0a`, encoded as `0x0a216968` plus
+ one byte for `push`.
+4. Copy `rsp` to `rsi` (+3 bytes). This gives `write` a valid pointer.
+
+The stack isn't 16-byte aligned, but `syscall` doesn't seem to mind.
+Here's the result:
+
+```nasm
+bits 64
+
+; "hi!\n", encoded as 32-bit little-endian int
+str: equ 0x0a216968
+
+section .text
+global _start
+_start:
+ push dword str ; push str (68 68 69 21 0a)
+
+ inc al ; write() (fe c0)
+ inc edi ; fd (ff c7)
+ mov rsi, rsp ; msg (48 89 e6)
+ mov dl, 4 ; len (b2 04)
+ syscall ; call write() (0f 05)
+
+ mov al, 60 ; exit() (b0 3c)
+ xor edi, edi ; exit code (31 ff)
+ syscall ; call exit() (0f 05)
+```
+&nbsp;
+
+This results in a 360 byte binary with 22 bytes of code and no data
+section:
+
+```bash
+$ make
+nasm -f elf64 -o hi.o hi.s
+ld -s -static -nostdinc -o hi hi.o
+$ ./hi
+hi!
+$ wc -c ./hi
+360 ./hi
+$ objdump -h ./hi
+...
+Sections:
+Idx Name Size VMA LMA File off Algn
+ 0 .text 00000016 0000000000400080 0000000000400080 00000080 2**4
+ CONTENTS, ALLOC, LOAD, READONLY, CODE
+```
+&nbsp;
+
+This is the smallest *legitimate* assembly implementation that I could
+cook up. It's available in the [companion GitHub repository][repo] and
+included in the [results][tb] as [`asm-opt`][asm-opt].
+
+### Dirty Tricks
+
+In [Tiny ELF Files: Revisited in 2021][tiny-elf], Nathan Otterness
+created a 114 byte static [x86-64][] [Linux][] binary by overlapping
+portions of the [ELF header][] with the [program header][ph], then
+embedding the code in unverified\* gaps of the [ELF header][].
+
+(\* Unverified by [Linux][], that is. Junk in these fields causes
+`readelf` and `objdump` give these binaries the stink eye).
+
+He's also got a handy table showing exactly which [ELF header][] bytes
+can be safely abused. In particular, there are two 12 byte regions at
+offsets `4` and `40` which could store our 22 bytes of code.
+
+I reordered the code and divided it into into two chunks:
+
+* `code_0`: First 10 bytes, plus a two byte jump to `code_1`
+* `code_1`: Remaining 12 bytes.
+
+Here's the [assembly][] for the two chunks:
+
+```nasm
+; entry point
+; (10 bytes of code plus a 2 byte jump to code_1)
+code_0:
+ push dword str ; push string onto stack (68 68 69 21 0a)
+ inc al ; write() (fe c0)
+ mov rsi, rsp ; str (48 89 e6)
+ jmp code_1 ; jump to next chunk (eb 18)
+
+; ... (mandatory ELF stuff omitted for brevity)
+
+; second code chunk
+; (12 bytes)
+code_1:
+ inc edi ; fd (89 c7)
+ mov dl, 4 ; len (b2 04)
+ syscall ; call write() (0f 05)
+
+ mov al, 60 ; exit() (b0 3c)
+ xor edi, edi ; 0 exit code (31 ff)
+ syscall ; call exit() (0f 05)
+```
+&nbsp;
+
+I tried shrinking Nathan's binary in a couple other places (for example,
+the 8 bytes of padding at the end of the file), but if you remove any
+more padding then [Linux][] refuses to execute the binary.
+
+With these changes the final binary size is 114 bytes. [Linux][] will
+still happily execute the binary, but `readelf`, `objdump`, and `file`
+can't make any sense of it:
+
+```bash
+$ make
+nasm -f bin -o hi hi.s
+chmod a+x hi
+$ ./hi
+hi!
+$ wc -c ./hi
+114 ./hi
+$ objdump -hdMintel ./hi
+objdump: ./hi: file format not recognized
+$ readelf -SW ./hi
+There are 65329 section headers, starting at offset 0x3a:
+readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header
+readelf: Error: Reading 1014951344 bytes extends past end of file for section headers
+readelf: Error: Too many program headers - 0x50f - the file is not that big
+$ file ./hi
+./hi: ELF, unknown class 104
+$ hd ./hi
+00000000 7f 45 4c 46 68 68 69 21 0a fe c0 48 89 e6 eb 18 |.ELFhhi!...H....|
+00000010 02 00 3e 00 01 00 00 00 04 80 02 00 00 00 00 00 |..>.............|
+00000020 3a 00 00 00 00 00 00 00 ff c7 b2 04 0f 05 b0 3c |:..............<|
+00000030 31 ff 0f 05 40 00 38 00 01 00 01 00 00 00 05 00 |1...@.8.........|
+00000040 00 00 00 00 00 00 00 00 00 00 00 80 02 00 00 00 |................|
+00000050 00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00 |..........r.....|
+00000060 00 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00 |..r.............|
+00000070 00 00 |..|
+00000072
+```
+&nbsp;
+
+It'll even run as a [Docker][] image:
+
+```bash
+$ cat Dockerfile
+FROM scratch
+COPY ./hi /hi
+ENTRYPOINT ["/hi"]
+$ docker build -t zoinks .
+Sending build context to Docker daemon 8.704kB
+Step 1/3 : FROM scratch
+ --->
+Step 2/3 : COPY ./hi /hi
+ ---> 336acd7e2d94
+Step 3/3 : ENTRYPOINT ["/hi"]
+ ---> Running in e20eca61de44
+Removing intermediate container e20eca61de44
+ ---> 297e5d7db5f8
+Successfully built 297e5d7db5f8
+Successfully tagged zoinks:latest
+$ docker images zoinks --format '{{.Size}}'
+114B
+$ docker run --rm -it zoinks
+hi!
+$
+```
+&nbsp;
+
+
+This glorious monstrosity is included in the [companion
+repository][repo] in [the `src/asm-elf-1` directory][asm-elf], and shown
+in the [Tiny Binary results][tb] as `asm-elf`.
+
+### Links
+
+If you enjoyed this post, you may also like:
+
+* [Tiny Binaries Repository][repo]: Companion repository with
+ source code, build instructions, and additional details for this post.
+* [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][]
+ [Linux][] binaries, including the table of unverified [ELF header][]
+ bytes that I used.
+* [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for
+ Linux][tiny-elf-orig]: Classic original article on tiny static 32-bit
+ binaries.
+* [My Own Private Binary][]: Sequel to [A Whirlwind Tutorial on
+ Creating Really Teensy ELF Executables for Linux][tiny-elf-orig] where
+ the author creates a 0 byte executable using a kernel module.
+
+[tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}}
+ "Tiny Binaries"
+[assembly]: https://en.wikipedia.org/wiki/Assembly_language
+ "Assembly language"
+[asm-naive]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-naive
+ "Unoptimized assembly implementation."
+[asm-opt]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-opt
+ "Optimized assembly implementation."
+[asm-elf]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-elf-1
+ "Optimized assembly implementation, embedded in unverified portions of ELF header, with ELF header and program header overlapped."
+[repo]: https://github.com/pablotron/tiny-binaries
+ "Tiny Binaries GitHub repository"
+[tiny-elf]: https://nathanotterness.com/2021/10/tiny_elf_modernized.html
+ "Tiny ELF Files: Revisited in 2021"
+[x86-64]: https://en.wikipedia.org/wiki/X86-64
+ "64-bit version of x86 instruction set"
+[linux]: https://en.wikipedia.org/wiki/Linux
+ "Linux operating system"
+[elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
+ "Executable and Linkable Format"
+[ph]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header
+ "ELF Program Header"
+[elf header]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
+ "ELF file header"
+[tiny-elf-orig]: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
+ "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux"
+[my own private binary]: https://www.muppetlabs.com/~breadbox/txt/mopb.html
+ "My Own Private Binary"
+[docker]: https://en.wikipedia.org/wiki/Docker_(software)
+ "Docker"