aboutsummaryrefslogtreecommitdiff
path: root/content/posts/2022-01-01-tiny-binaries-assembly-optimization.md
blob: a54b830d98ee40dad548fbd4c1988a762afca415 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
---
slug: tiny-binaries-assembly-optimization
title: "Tiny Binaries: Assembly Optimization"
date: "2022-01-01T08:22:07-04:00"
---

Here's how I reduced the [assembly][asm-naive] binary size in [Tiny
Binaries][tb] from 456 bytes to 114 bytes.

### Shrinking the Code

Below is the [original assembly code][asm-naive] (`asm-naive` in the
[results][tb]):

```nasm
;
; hi.s: unoptimized linux x86-64 assembly implementation.
;

bits 64

global _start

section .rodata
  ; "hi!\n"
  hi  db  "hi!", 10
  len equ $ - hi

section .text

_start:
  mov rax, 1 ; write
  mov rdi, 1 ; fd
  mov rsi, hi ; msg
  mov rdx, len ; len
  syscall ; call write()

  mov rax, 60 ; exit
  mov rdi, 0 ; exit code
  syscall ; call exit()
```
 

This produces a 456 byte binary with 39 bytes of code and 4 bytes of
data:

```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
456 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000027  0000000000400080  0000000000400080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000004  00000000004000a8  00000000004000a8  000000a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Disassembly of section .text:

0000000000400080 <.text>:
  400080:	b8 01 00 00 00       	mov    eax,0x1
  400085:	bf 01 00 00 00       	mov    edi,0x1
  40008a:	48 be a8 00 40 00 00 	movabs rsi,0x4000a8
  400091:	00 00 00
  400094:	ba 04 00 00 00       	mov    edx,0x4
  400099:	0f 05                	syscall
  40009b:	b8 3c 00 00 00       	mov    eax,0x3c
  4000a0:	bf 00 00 00 00       	mov    edi,0x0
  4000a5:	0f 05                	syscall
```
&nbsp;

First, we replace the unnecessary 5 byte instructions with smaller
equivalents:

```diff
diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
index 9d17cab..3694091 100644
--- a/src/asm-naive/hi.s
+++ b/src/asm-naive/hi.s
@@ -14,12 +14,12 @@ section .rodata
 section .text

 _start:
-  mov rax, 1 ; write
-  mov rdi, 1 ; fd
+  inc al ; write
+  inc edi; fd
   mov rsi, hi ; msg
-  mov rdx, len ; len
+  mov dl, len ; len
   syscall ; call write()

-  mov rax, 60 ; exit
-  mov rdi, 0 ; exit code
+  mov al, 60 ; exit
+  xor edi, edi ; exit code
   syscall ; call exit()
```
&nbsp;

**Notes:**
* `inc al` works because [Linux][] zeros registers on process init.
* `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`.  The
  other candidates (`inc dil`, `inc di`, `mov dil, al`, and
  `mov di, ax`) are all 3 bytes.
* `xor edi, edi` is 2 bytes. The other candidates (`mov dil, 0`,
  `mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and
  `xor rdi, rdi`) are all 3-5 bytes.

These changes shrink the binary size to 440 bytes, with 24 bytes of code
and 4 bytes of data:

```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
440 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000018  0000000000400080  0000000000400080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000004  0000000000400098  0000000000400098  00000098  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Disassembly of section .text:

0000000000400080 <.text>:
  400080:	fe c0                	inc    al
  400082:	ff c7                	inc    edi
  400084:	48 be 98 00 40 00 00 	movabs rsi,0x400098
  40008b:	00 00 00
  40008e:	b2 04                	mov    dl,0x4
  400090:	0f 05                	syscall
  400092:	b0 3c                	mov    al,0x3c
  400094:	31 ff                	xor    edi,edi
  400096:	0f 05                	syscall
```
&nbsp;

The code is now 24 bytes, of which 10 are one large `mov` instruction.

We can drop 2 bytes of code, 4 bytes of data, and the `.rodata` section
by doing the following:

1. Remove `mov rsi, str` (-10 bytes, good riddance).
2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section
   overhead).
3. Encode `"hi!\n"` as a 32-bit integer and push it to the stack (+5
   bytes). *Hint:* `"hi!\n" = 68 69 21 0a`, encoded as `0x0a216968` plus
   one byte for `push`.
4. Copy `rsp` to `rsi` (+3 bytes).  This gives `write` a valid pointer.

Here's the result:

```nasm
bits 64

; "hi!\n", encoded as 32-bit little-endian int
str: equ 0x0a216968

section .text
global _start
_start:
  push dword str  ; push str (68 68 69 21 0a)

  inc al          ; write() (fe c0)
  inc edi         ; fd (ff c7)
  mov rsi, rsp    ; msg (48 89 e6)
  mov dl, 4       ; len (b2 04)
  syscall         ; call write() (0f 05)

  mov al, 60      ; exit() (b0 3c)
  xor edi, edi    ; exit code (31 ff)
  syscall         ; call exit() (0f 05)
```
&nbsp;

This produces a 360 byte binary with 22 bytes of code and no data
section:

```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ ./hi
hi!
$ wc -c ./hi
360 ./hi
$ objdump -h ./hi
...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000016  0000000000400080  0000000000400080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
```
&nbsp;

This is the smallest *legitimate* assembly implementation that I could
cook up. It's available in the [companion GitHub repository][repo] and
shown in [the results][tb] as `asm-opt`.

### Dirty Tricks

In [Tiny ELF Files: Revisited in 2021][tiny-elf], Nathan Otterness
created a 114 byte static [x86-64][] [Linux][] binary by overlapping
portions of the [ELF header][] with the [program header][ph], then
embedding the code in unverified\* gaps of the [ELF header][].

(\* Unverified by [Linux][], that is.  Junk in these fields causes
`readelf` and `objdump` give these binaries the stink eye, as we'll see
shortly).

Nathan also created a handy table showing which [ELF header][] bytes are
unverified by [Linux][].  In particular, there are two unverified 12
byte regions at offsets `4` and `40` which could store our 22 bytes of
code.

I reordered the code and divided it into into two chunks:

* `code_0`: First 10 bytes, plus a two byte jump to `code_1`
* `code_1`: Remaining 12 bytes.

Here's the [assembly][] for the two chunks:

```nasm
; entry point
; (10 bytes of code plus a 2 byte jump to code_1)
code_0:
  push dword str  ; push string onto stack (68 68 69 21 0a)
  inc al          ; write() (fe c0)
  mov rsi, rsp    ; str (48 89 e6)
  jmp code_1      ; jump to next chunk (eb 18)

; ...

; second code chunk
; (12 bytes)
code_1:
  inc edi         ; fd (89 c7)
  mov dl, 4       ; len (b2 04)
  syscall         ; call write() (0f 05)

  mov al, 60      ; exit() (b0 3c)
  xor edi, edi    ; 0 exit code (31 ff)
  syscall         ; call exit() (0f 05)
```
&nbsp;

These changes shrink the binary to 114 bytes.  [Linux][] will still
happily execute the binary, but `readelf`, `objdump`, and `file` can't
make sense of it:

```bash
$ make
nasm -f bin -o hi hi.s
chmod a+x hi
$ ./hi
hi!
$ wc -c ./hi
114 ./hi
$ objdump -hd -Mintel ./hi
objdump: ./hi: file format not recognized
$ readelf -SW ./hi
There are 65329 section headers, starting at offset 0x3a:
readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header
readelf: Error: Reading 1014951344 bytes extends past end of file for section headers
readelf: Error: Too many program headers - 0x50f - the file is not that big
$ file ./hi
./hi: ELF, unknown class 104
$ hd ./hi
00000000  7f 45 4c 46 68 68 69 21  0a fe c0 48 89 e6 eb 18  |.ELFhhi!...H....|
00000010  02 00 3e 00 01 00 00 00  04 80 02 00 00 00 00 00  |..>.............|
00000020  3a 00 00 00 00 00 00 00  ff c7 b2 04 0f 05 b0 3c  |:..............<|
00000030  31 ff 0f 05 40 00 38 00  01 00 01 00 00 00 05 00  |1...@.8.........|
00000040  00 00 00 00 00 00 00 00  00 00 00 80 02 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 72 00 00 00 00 00  |..........r.....|
00000060  00 00 72 00 00 00 00 00  00 00 00 00 00 00 00 00  |..r.............|
00000070  00 00                                             |..|
00000072
```
&nbsp;

It'll even run as a [Docker][] image:

```bash
$ cat Dockerfile
FROM scratch
COPY ./hi /hi
ENTRYPOINT ["/hi"]
$ docker build -t zoinks .
Sending build context to Docker daemon  8.704kB
Step 1/3 : FROM scratch
 --->
Step 2/3 : COPY ./hi /hi
 ---> 336acd7e2d94
Step 3/3 : ENTRYPOINT ["/hi"]
 ---> Running in e20eca61de44
Removing intermediate container e20eca61de44
 ---> 297e5d7db5f8
Successfully built 297e5d7db5f8
Successfully tagged zoinks:latest
$ docker images zoinks --format '{{.Size}}'
114B
$ docker run --rm -it zoinks
hi!
$
```
&nbsp;

This glorious monstrosity is included in the [companion
repository][repo] and shown in the [results][tb] as `asm-elf`.

### Links

If you enjoyed this post, you may also like:

* [Tiny Binaries Repository][repo]: Companion repository with
  source code, build instructions, and additional details for this post.
* [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][]
  [Linux][] binaries, including a table of unverified [ELF header][]
  bytes.
* [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for
  Linux][tiny-elf-orig]: Classic article on tiny 32-bit static binaries.
* [My Own Private Binary][]: Sequel to [A Whirlwind
  Tutorial][tiny-elf-orig] where the author creates a 0 byte executable
  using a kernel module.

**Update (2022-01-02):** Shorten, fix typos, improve grammar.

[tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}}
  "Tiny Binaries"
[assembly]: https://en.wikipedia.org/wiki/Assembly_language
  "Assembly language"
[asm-naive]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-naive
  "Unoptimized assembly implementation."
[asm-opt]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-opt
  "Optimized assembly implementation."
[asm-elf]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-elf-1
  "Optimized assembly implementation, embedded in unverified portions of ELF header, with ELF header and program header overlapped."
[repo]: https://github.com/pablotron/tiny-binaries
  "Tiny Binaries GitHub repository"
[tiny-elf]: https://nathanotterness.com/2021/10/tiny_elf_modernized.html
  "Tiny ELF Files: Revisited in 2021"
[x86-64]: https://en.wikipedia.org/wiki/X86-64
  "64-bit version of x86 instruction set"
[linux]: https://en.wikipedia.org/wiki/Linux
  "Linux operating system"
[elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
  "Executable and Linkable Format"
[ph]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header
  "ELF Program Header"
[elf header]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
  "ELF file header"
[tiny-elf-orig]: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
  "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux"
[my own private binary]: https://www.muppetlabs.com/~breadbox/txt/mopb.html
  "My Own Private Binary"
[docker]: https://en.wikipedia.org/wiki/Docker_(software)
  "Docker"
[eof]: https://en.wikipedia.org/wiki/End-of-file
  "End of File"