1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
|
---
slug: tiny-binaries-assembly-optimization
title: "Tiny Binaries: Assembly Optimization"
date: "2022-01-01T08:22:07-04:00"
---
Here's how I reduced the [assembly][asm-naive] binary size in [Tiny
Binaries][tb] from 456 bytes to 114 bytes.
### Shrinking the Code
Below is the [original assembly code][asm-naive] (`asm-naive` in the
[results][tb]):
```nasm
;
; hi.s: unoptimized linux x86-64 assembly implementation.
;
bits 64
global _start
section .rodata
; "hi!\n"
hi db "hi!", 10
len equ $ - hi
section .text
_start:
mov rax, 1 ; write
mov rdi, 1 ; fd
mov rsi, hi ; msg
mov rdx, len ; len
syscall ; call write()
mov rax, 60 ; exit
mov rdi, 0 ; exit code
syscall ; call exit()
```
This produces a 456 byte binary with 39 bytes of code and 4 bytes of
data:
```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
456 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000027 0000000000400080 0000000000400080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000004 00000000004000a8 00000000004000a8 000000a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Disassembly of section .text:
0000000000400080 <.text>:
400080: b8 01 00 00 00 mov eax,0x1
400085: bf 01 00 00 00 mov edi,0x1
40008a: 48 be a8 00 40 00 00 movabs rsi,0x4000a8
400091: 00 00 00
400094: ba 04 00 00 00 mov edx,0x4
400099: 0f 05 syscall
40009b: b8 3c 00 00 00 mov eax,0x3c
4000a0: bf 00 00 00 00 mov edi,0x0
4000a5: 0f 05 syscall
```
First, we replace the unnecessary 5 byte instructions with smaller
equivalents:
```diff
diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
index 9d17cab..3694091 100644
--- a/src/asm-naive/hi.s
+++ b/src/asm-naive/hi.s
@@ -14,12 +14,12 @@ section .rodata
section .text
_start:
- mov rax, 1 ; write
- mov rdi, 1 ; fd
+ inc al ; write
+ inc edi; fd
mov rsi, hi ; msg
- mov rdx, len ; len
+ mov dl, len ; len
syscall ; call write()
- mov rax, 60 ; exit
- mov rdi, 0 ; exit code
+ mov al, 60 ; exit
+ xor edi, edi ; exit code
syscall ; call exit()
```
**Notes:**
* `inc al` works because [Linux][] zeros registers on process init.
* `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`. The
other candidates (`inc dil`, `inc di`, `mov dil, al`, and
`mov di, ax`) are all 3 bytes.
* `xor edi, edi` is 2 bytes. The other candidates (`mov dil, 0`,
`mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and
`xor rdi, rdi`) are all 3-5 bytes.
These changes shrink the binary size to 440 bytes, with 24 bytes of code
and 4 bytes of data:
```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
440 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000018 0000000000400080 0000000000400080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000004 0000000000400098 0000000000400098 00000098 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Disassembly of section .text:
0000000000400080 <.text>:
400080: fe c0 inc al
400082: ff c7 inc edi
400084: 48 be 98 00 40 00 00 movabs rsi,0x400098
40008b: 00 00 00
40008e: b2 04 mov dl,0x4
400090: 0f 05 syscall
400092: b0 3c mov al,0x3c
400094: 31 ff xor edi,edi
400096: 0f 05 syscall
```
The code is now 24 bytes, of which 10 are one large `mov` instruction.
We can drop 2 bytes of code, 4 bytes of data, and the `.rodata` section
by doing the following:
1. Remove `mov rsi, str` (-10 bytes, good riddance).
2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section
overhead).
3. Encode `"hi!\n"` as a 32-bit integer and push it to the stack (+5
bytes). *Hint:* `"hi!\n" = 68 69 21 0a`, encoded as `0x0a216968` plus
one byte for `push`.
4. Copy `rsp` to `rsi` (+3 bytes). This gives `write` a valid pointer.
Here's the result:
```nasm
bits 64
; "hi!\n", encoded as 32-bit little-endian int
str: equ 0x0a216968
section .text
global _start
_start:
push dword str ; push str (68 68 69 21 0a)
inc al ; write() (fe c0)
inc edi ; fd (ff c7)
mov rsi, rsp ; msg (48 89 e6)
mov dl, 4 ; len (b2 04)
syscall ; call write() (0f 05)
mov al, 60 ; exit() (b0 3c)
xor edi, edi ; exit code (31 ff)
syscall ; call exit() (0f 05)
```
This produces a 360 byte binary with 22 bytes of code and no data
section:
```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ ./hi
hi!
$ wc -c ./hi
360 ./hi
$ objdump -h ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000016 0000000000400080 0000000000400080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
```
This is the smallest *legitimate* assembly implementation that I could
cook up. It's available in the [companion GitHub repository][repo] and
shown in [the results][tb] as `asm-opt`.
### Dirty Tricks
In [Tiny ELF Files: Revisited in 2021][tiny-elf], Nathan Otterness
created a 114 byte static [x86-64][] [Linux][] binary by overlapping
portions of the [ELF header][] with the [program header][ph], then
embedding the code in unverified\* gaps of the [ELF header][].
(\* Unverified by [Linux][], that is. Junk in these fields causes
`readelf` and `objdump` give these binaries the stink eye, as we'll see
shortly).
Nathan also created a handy table showing which [ELF header][] bytes are
unverified by [Linux][]. In particular, there are two unverified 12
byte regions at offsets `4` and `40` which could store our 22 bytes of
code.
I reordered the code and divided it into into two chunks:
* `code_0`: First 10 bytes, plus a two byte jump to `code_1`
* `code_1`: Remaining 12 bytes.
Here's the [assembly][] for the two chunks:
```nasm
; entry point
; (10 bytes of code plus a 2 byte jump to code_1)
code_0:
push dword str ; push string onto stack (68 68 69 21 0a)
inc al ; write() (fe c0)
mov rsi, rsp ; str (48 89 e6)
jmp code_1 ; jump to next chunk (eb 18)
; ...
; second code chunk
; (12 bytes)
code_1:
inc edi ; fd (89 c7)
mov dl, 4 ; len (b2 04)
syscall ; call write() (0f 05)
mov al, 60 ; exit() (b0 3c)
xor edi, edi ; 0 exit code (31 ff)
syscall ; call exit() (0f 05)
```
These changes shrink the binary to 114 bytes. [Linux][] will still
happily execute the binary, but `readelf`, `objdump`, and `file` can't
make sense of it:
```bash
$ make
nasm -f bin -o hi hi.s
chmod a+x hi
$ ./hi
hi!
$ wc -c ./hi
114 ./hi
$ objdump -hd -Mintel ./hi
objdump: ./hi: file format not recognized
$ readelf -SW ./hi
There are 65329 section headers, starting at offset 0x3a:
readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header
readelf: Error: Reading 1014951344 bytes extends past end of file for section headers
readelf: Error: Too many program headers - 0x50f - the file is not that big
$ file ./hi
./hi: ELF, unknown class 104
$ hd ./hi
00000000 7f 45 4c 46 68 68 69 21 0a fe c0 48 89 e6 eb 18 |.ELFhhi!...H....|
00000010 02 00 3e 00 01 00 00 00 04 80 02 00 00 00 00 00 |..>.............|
00000020 3a 00 00 00 00 00 00 00 ff c7 b2 04 0f 05 b0 3c |:..............<|
00000030 31 ff 0f 05 40 00 38 00 01 00 01 00 00 00 05 00 |1...@.8.........|
00000040 00 00 00 00 00 00 00 00 00 00 00 80 02 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00 |..........r.....|
00000060 00 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00 |..r.............|
00000070 00 00 |..|
00000072
```
It'll even run as a [Docker][] image:
```bash
$ cat Dockerfile
FROM scratch
COPY ./hi /hi
ENTRYPOINT ["/hi"]
$ docker build -t zoinks .
Sending build context to Docker daemon 8.704kB
Step 1/3 : FROM scratch
--->
Step 2/3 : COPY ./hi /hi
---> 336acd7e2d94
Step 3/3 : ENTRYPOINT ["/hi"]
---> Running in e20eca61de44
Removing intermediate container e20eca61de44
---> 297e5d7db5f8
Successfully built 297e5d7db5f8
Successfully tagged zoinks:latest
$ docker images zoinks --format '{{.Size}}'
114B
$ docker run --rm -it zoinks
hi!
$
```
This glorious monstrosity is included in the [companion
repository][repo] and shown in the [results][tb] as `asm-elf`.
### Links
If you enjoyed this post, you may also like:
* [Tiny Binaries Repository][repo]: Companion repository with
source code, build instructions, and additional details for this post.
* [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][]
[Linux][] binaries, including a table of unverified [ELF header][]
bytes.
* [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for
Linux][tiny-elf-orig]: Classic article on tiny 32-bit static binaries.
* [My Own Private Binary][]: Sequel to [A Whirlwind
Tutorial][tiny-elf-orig] where the author creates a 0 byte executable
using a kernel module.
**Update (2022-01-02):** Shorten, fix typos, improve grammar.
[tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}}
"Tiny Binaries"
[assembly]: https://en.wikipedia.org/wiki/Assembly_language
"Assembly language"
[asm-naive]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-naive
"Unoptimized assembly implementation."
[asm-opt]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-opt
"Optimized assembly implementation."
[asm-elf]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-elf-1
"Optimized assembly implementation, embedded in unverified portions of ELF header, with ELF header and program header overlapped."
[repo]: https://github.com/pablotron/tiny-binaries
"Tiny Binaries GitHub repository"
[tiny-elf]: https://nathanotterness.com/2021/10/tiny_elf_modernized.html
"Tiny ELF Files: Revisited in 2021"
[x86-64]: https://en.wikipedia.org/wiki/X86-64
"64-bit version of x86 instruction set"
[linux]: https://en.wikipedia.org/wiki/Linux
"Linux operating system"
[elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
"Executable and Linkable Format"
[ph]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header
"ELF Program Header"
[elf header]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
"ELF file header"
[tiny-elf-orig]: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
"A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux"
[my own private binary]: https://www.muppetlabs.com/~breadbox/txt/mopb.html
"My Own Private Binary"
[docker]: https://en.wikipedia.org/wiki/Docker_(software)
"Docker"
[eof]: https://en.wikipedia.org/wiki/End-of-file
"End of File"
|