author: Paul Duncan <pabs@pablotron.org> 2022-01-02 10:01:50 -0500
committer: Paul Duncan <pabs@pablotron.org> 2022-01-02 10:01:50 -0500
commit: a46dd1f4cd45869b731a4690ca2971462f53abd1
tree: 6e622ec5e584fc81d59d43c7c3b9257fd03de52a /content/posts
parent: 2ac84e5884af7ffaa4b90813978635f6285c97d5
posts/2022-01-01-tiny-binaries-assembly-optimization.md: be more concise
Diffstat (limited to 'content/posts')
-rw-r--r-- content/posts/2022-01-01-tiny-binaries-assembly-optimization.md 79
1 files changed, 39 insertions, 40 deletions
diff --git a/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
index 6717a7b..be4815c 100644
--- a/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
+++ b/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
@@ -4,15 +4,13 @@ title: "Tiny Binaries: Assembly Optimization"
date: "2022-01-01T08:22:07-04:00"
---
-Here's how I reduced the size of [the Assembly
-implementation][asm-naive] in [Tiny Binaries][tb] from 456 bytes to 360
-bytes with optimizations, and then reduced the size from 360 bytes to
-114 bytes with dirty tricks.
+Here's how I reduced the [assembly][asm-naive] binary size in [Tiny
+Binaries][tb] from 456 bytes to 114 bytes.
### Shrinking the Code
-Here's the code for the [unoptimized Assembly implementation
-(`asm-naive`)][asm-naive]:
+Below is the [original assembly code][asm-naive] (`asm-naive` in the
+[results][tb]):
```nasm
;
@@ -43,7 +41,7 @@ _start:
```
&nbsp;
-It assembles to a 456 byte binary with 39 bytes of code and 4 bytes of
+This produces a 456 byte binary with 39 bytes of code and 4 bytes of
data:
```bash
@@ -52,7 +50,7 @@ nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
456 ./hi
-$ objdump -h ./hi
+$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
@@ -60,8 +58,9 @@ Idx Name Size VMA LMA File off Algn
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000004 00000000004000a8 00000000004000a8 000000a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
-$ objdump -d -Mintel ./hi
-...
+
+Disassembly of section .text:
+
0000000000400080 <.text>:
400080: b8 01 00 00 00 mov eax,0x1
400085: bf 01 00 00 00 mov edi,0x1
@@ -75,8 +74,8 @@ $ objdump -d -Mintel ./hi
```
&nbsp;
-First, we replace all the bloated 5 byte 32-bit and 64-bit instructions
-with slimmer 16-bit and 32-bit equivalents:
+First, we replace the unnecessary 5 byte instructions with smaller
+equivalents:
```diff
diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
@@ -105,7 +104,7 @@ index 9d17cab..3694091 100644
&nbsp;
**Notes:**
-* `inc al` works because Linux zeros registers on process init.
+* `inc al` works because [Linux][] zeros registers on process init.
* `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`. The
other candidates (`inc dil`, `inc di`, `mov dil, al`, and
`mov di, ax`) are all 3 bytes.
@@ -113,8 +112,8 @@ index 9d17cab..3694091 100644
`mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and
`xor rdi, rdi`) are all 3-5 bytes.
-These changes shrink our binary to 440 bytes, with 24 bytes of code and
-4 bytes of data:
+These changes shrink the binary size to 440 bytes, with 24 bytes of code
+and 4 bytes of data:
```bash
$ make
@@ -146,11 +145,10 @@ Disassembly of section .text:
```
&nbsp;
-The code is now 24 bytes, but at ~41% of our code that 10 byte
-`mov` sticks out like a sore thumb.
+The code is now 24 bytes, of which 10 are one large `mov` instruction.
-We can drop another 2 bytes of code and 4 bytes of data by doing the
-following:
+We can drop 2 bytes of code, 4 bytes of data, and the `.rodata` section
+by doing the following:
1. Remove `mov rsi, str` (-10 bytes, good riddance).
2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section
@@ -160,7 +158,6 @@ following:
one byte for `push`.
4. Copy `rsp` to `rsi` (+3 bytes). This gives `write` a valid pointer.
-The stack isn't 16-byte aligned, but `syscall` doesn't seem to mind.
Here's the result:
```nasm
@@ -186,7 +183,7 @@ _start:
```
&nbsp;
-This results in a 360 byte binary with 22 bytes of code and no data
+This produces a 360 byte binary with 22 bytes of code and no data
section:
```bash
@@ -208,7 +205,7 @@ Idx Name Size VMA LMA File off Algn
This is the smallest *legitimate* assembly implementation that I could
cook up. It's available in the [companion GitHub repository][repo] and
-included in the [results][tb] as [`asm-opt`][asm-opt].
+shown in [the results][tb] as `asm-opt`.
### Dirty Tricks
@@ -218,11 +215,13 @@ portions of the [ELF header][] with the [program header][ph], then
embedding the code in unverified\* gaps of the [ELF header][].
(\* Unverified by [Linux][], that is. Junk in these fields causes
-`readelf` and `objdump` give these binaries the stink eye).
+`readelf` and `objdump` give these binaries the stink eye, as we'll see
+shortly).
-He's also got a handy table showing exactly which [ELF header][] bytes
-can be safely abused. In particular, there are two 12 byte regions at
-offsets `4` and `40` which could store our 22 bytes of code.
+Nathan also created a handy table showing which [ELF header][] bytes are
+unverified by [Linux][]. In particular, there are two unverified 12
+byte regions at offsets `4` and `40` which could store our 22 bytes of
+code.
I reordered the code and divided it into into two chunks:
@@ -240,7 +239,7 @@ code_0:
mov rsi, rsp ; str (48 89 e6)
jmp code_1 ; jump to next chunk (eb 18)
-; ... (mandatory ELF stuff omitted for brevity)
+; ...
; second code chunk
; (12 bytes)
@@ -255,13 +254,13 @@ code_1:
```
&nbsp;
-I tried shrinking Nathan's binary in a couple other places (for example,
-the 8 bytes of padding at the end of the file), but if you remove any
-more padding then [Linux][] refuses to execute the binary.
+I tried shrinking Nathan's binary a couple of ways, without any luck.
+For example, I tried removing the padding bytes at the end of the file,
+but the binary will not execute without them.
-With these changes the final binary size is 114 bytes. [Linux][] will
-still happily execute the binary, but `readelf`, `objdump`, and `file`
-can't make any sense of it:
+Anyway, with these changes the final binary size is 114 bytes.
+[Linux][] will still happily execute the binary, but common tools like
+`readelf`, `objdump`, and `file` can't make any sense of it:
```bash
$ make
@@ -271,7 +270,7 @@ $ ./hi
hi!
$ wc -c ./hi
114 ./hi
-$ objdump -hdMintel ./hi
+$ objdump -hd -Mintel ./hi
objdump: ./hi: file format not recognized
$ readelf -SW ./hi
There are 65329 section headers, starting at offset 0x3a:
@@ -320,10 +319,8 @@ $
```
&nbsp;
-
This glorious monstrosity is included in the [companion
-repository][repo] in [the `src/asm-elf-1` directory][asm-elf], and shown
-in the [Tiny Binary results][tb] as `asm-elf`.
+repository][repo] and shown in the [results][tb] as `asm-elf`.
### Links
@@ -332,15 +329,17 @@ If you enjoyed this post, you may also like:
* [Tiny Binaries Repository][repo]: Companion repository with
source code, build instructions, and additional details for this post.
* [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][]
- [Linux][] binaries, including the table of unverified [ELF header][]
- bytes that I used.
+ [Linux][] binaries, including a table of unverified [ELF header][]
+ bytes.
* [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for
- Linux][tiny-elf-orig]: Classic original article on tiny static 32-bit
+ Linux][tiny-elf-orig]: Classic original article on tiny 32-bit static
binaries.
* [My Own Private Binary][]: Sequel to [A Whirlwind Tutorial on
Creating Really Teensy ELF Executables for Linux][tiny-elf-orig] where
the author creates a 0 byte executable using a kernel module.
+**Update (2022-01-02):** Fix typos, dewordify, improve grammar.
+
[tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}}
"Tiny Binaries"
[assembly]: https://en.wikipedia.org/wiki/Assembly_language