---
slug: tiny-binaries-assembly-optimization
title: "Tiny Binaries: Assembly Optimization"
date: "2022-01-01T08:22:07-04:00"
---

Here's how I reduced the [assembly][asm-naive] binary size in [Tiny
Binaries][tb] from 456 bytes to 114 bytes.

### Shrinking the Code

Below is the [original assembly code][asm-naive] (`asm-naive` in the
[results][tb]):

```nasm
;
; hi.s: unoptimized linux x86-64 assembly implementation.
;

bits 64

global _start

section .rodata
  ; "hi!\n"
  hi  db  "hi!", 10
  len equ $ - hi

section .text

_start:
  mov rax, 1 ; write
  mov rdi, 1 ; fd
  mov rsi, hi ; msg
  mov rdx, len ; len
  syscall ; call write()

  mov rax, 60 ; exit
  mov rdi, 0 ; exit code
  syscall ; call exit()
```
 

This produces a 456 byte binary with 39 bytes of code and 4 bytes of
data:

```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
456 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000027  0000000000400080  0000000000400080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000004  00000000004000a8  00000000004000a8  000000a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Disassembly of section .text:

0000000000400080 <.text>:
  400080:	b8 01 00 00 00       	mov    eax,0x1
  400085:	bf 01 00 00 00       	mov    edi,0x1
  40008a:	48 be a8 00 40 00 00 	movabs rsi,0x4000a8
  400091:	00 00 00
  400094:	ba 04 00 00 00       	mov    edx,0x4
  400099:	0f 05                	syscall
  40009b:	b8 3c 00 00 00       	mov    eax,0x3c
  4000a0:	bf 00 00 00 00       	mov    edi,0x0
  4000a5:	0f 05                	syscall
```
&nbsp;

First, we replace the unnecessary 5 byte instructions with smaller
equivalents:

```diff
diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
index 9d17cab..3694091 100644
--- a/src/asm-naive/hi.s
+++ b/src/asm-naive/hi.s
@@ -14,12 +14,12 @@ section .rodata
 section .text

 _start:
-  mov rax, 1 ; write
-  mov rdi, 1 ; fd
+  inc al ; write
+  inc edi; fd
   mov rsi, hi ; msg
-  mov rdx, len ; len
+  mov dl, len ; len
   syscall ; call write()

-  mov rax, 60 ; exit
-  mov rdi, 0 ; exit code
+  mov al, 60 ; exit
+  xor edi, edi ; exit code
   syscall ; call exit()
```
&nbsp;

**Notes:**
* `inc al` works because [Linux][] zeros registers on process init.
* `inc edi` is 2 bytes. Another 2 byte option is `mov edi, eax`.  The
  other candidates (`inc dil`, `inc di`, `mov dil, al`, and
  `mov di, ax`) are all 3 bytes.
* `xor edi, edi` is 2 bytes. The other candidates (`mov dil, 0`,
  `mov di, 0`, `mov edi, 0`, `xor dil, dil`, `xor di, di`, and
  `xor rdi, rdi`) are all 3-5 bytes.

These changes shrink the binary size to 440 bytes, with 24 bytes of code
and 4 bytes of data:

```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
440 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000018  0000000000400080  0000000000400080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000004  0000000000400098  0000000000400098  00000098  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Disassembly of section .text:

0000000000400080 <.text>:
  400080:	fe c0                	inc    al
  400082:	ff c7                	inc    edi
  400084:	48 be 98 00 40 00 00 	movabs rsi,0x400098
  40008b:	00 00 00
  40008e:	b2 04                	mov    dl,0x4
  400090:	0f 05                	syscall
  400092:	b0 3c                	mov    al,0x3c
  400094:	31 ff                	xor    edi,edi
  400096:	0f 05                	syscall
```
&nbsp;

The code is now 24 bytes, of which 10 are one large `mov` instruction.

We can drop 2 bytes of code, 4 bytes of data, and the `.rodata` section
by doing the following:

1. Remove `mov rsi, str` (-10 bytes, good riddance).
2. Drop the `.rodata` section (-4 bytes of data plus `.rodata` section
   overhead).
3. Encode `"hi!\n"` as a 32-bit integer and push it to the stack (+5
   bytes). *Hint:* `"hi!\n" = 68 69 21 0a`, encoded as `0x0a216968` plus
   one byte for `push`.
4. Copy `rsp` to `rsi` (+3 bytes).  This gives `write` a valid pointer.

Here's the result:

```nasm
bits 64

; "hi!\n", encoded as 32-bit little-endian int
str: equ 0x0a216968

section .text
global _start
_start:
  push dword str  ; push str (68 68 69 21 0a)

  inc al          ; write() (fe c0)
  inc edi         ; fd (ff c7)
  mov rsi, rsp    ; msg (48 89 e6)
  mov dl, 4       ; len (b2 04)
  syscall         ; call write() (0f 05)

  mov al, 60      ; exit() (b0 3c)
  xor edi, edi    ; exit code (31 ff)
  syscall         ; call exit() (0f 05)
```
&nbsp;

This produces a 360 byte binary with 22 bytes of code and no data
section:

```bash
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ ./hi
hi!
$ wc -c ./hi
360 ./hi
$ objdump -h ./hi
...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000016  0000000000400080  0000000000400080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
```
&nbsp;

This is the smallest *legitimate* assembly implementation that I could
cook up. It's available in the [companion GitHub repository][repo] and
shown in [the results][tb] as `asm-opt`.

### Dirty Tricks

In [Tiny ELF Files: Revisited in 2021][tiny-elf], Nathan Otterness
created a 114 byte static [x86-64][] [Linux][] binary by overlapping
portions of the [ELF header][] with the [program header][ph], then
embedding the code in unverified\* gaps of the [ELF header][].

(\* Unverified by [Linux][], that is.  Junk in these fields causes
`readelf` and `objdump` give these binaries the stink eye, as we'll see
shortly).

Nathan also created a handy table showing which [ELF header][] bytes are
unverified by [Linux][].  In particular, there are two unverified 12
byte regions at offsets `4` and `40` which could store our 22 bytes of
code.

I reordered the code and divided it into into two chunks:

* `code_0`: First 10 bytes, plus a two byte jump to `code_1`
* `code_1`: Remaining 12 bytes.

Here's the [assembly][] for the two chunks:

```nasm
; entry point
; (10 bytes of code plus a 2 byte jump to code_1)
code_0:
  push dword str  ; push string onto stack (68 68 69 21 0a)
  inc al          ; write() (fe c0)
  mov rsi, rsp    ; str (48 89 e6)
  jmp code_1      ; jump to next chunk (eb 18)

; ...

; second code chunk
; (12 bytes)
code_1:
  inc edi         ; fd (89 c7)
  mov dl, 4       ; len (b2 04)
  syscall         ; call write() (0f 05)

  mov al, 60      ; exit() (b0 3c)
  xor edi, edi    ; 0 exit code (31 ff)
  syscall         ; call exit() (0f 05)
```
&nbsp;

These changes shrink the binary to 114 bytes.  [Linux][] will still
happily execute the binary, but `readelf`, `objdump`, and `file` can't
make sense of it:

```bash
$ make
nasm -f bin -o hi hi.s
chmod a+x hi
$ ./hi
hi!
$ wc -c ./hi
114 ./hi
$ objdump -hd -Mintel ./hi
objdump: ./hi: file format not recognized
$ readelf -SW ./hi
There are 65329 section headers, starting at offset 0x3a:
readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header
readelf: Error: Reading 1014951344 bytes extends past end of file for section headers
readelf: Error: Too many program headers - 0x50f - the file is not that big
$ file ./hi
./hi: ELF, unknown class 104
$ hd ./hi
00000000  7f 45 4c 46 68 68 69 21  0a fe c0 48 89 e6 eb 18  |.ELFhhi!...H....|
00000010  02 00 3e 00 01 00 00 00  04 80 02 00 00 00 00 00  |..>.............|
00000020  3a 00 00 00 00 00 00 00  ff c7 b2 04 0f 05 b0 3c  |:..............<|
00000030  31 ff 0f 05 40 00 38 00  01 00 01 00 00 00 05 00  |1...@.8.........|
00000040  00 00 00 00 00 00 00 00  00 00 00 80 02 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 72 00 00 00 00 00  |..........r.....|
00000060  00 00 72 00 00 00 00 00  00 00 00 00 00 00 00 00  |..r.............|
00000070  00 00                                             |..|
00000072
```
&nbsp;

It'll even run as a [Docker][] image:

```bash
$ cat Dockerfile
FROM scratch
COPY ./hi /hi
ENTRYPOINT ["/hi"]
$ docker build -t zoinks .
Sending build context to Docker daemon  8.704kB
Step 1/3 : FROM scratch
 --->
Step 2/3 : COPY ./hi /hi
 ---> 336acd7e2d94
Step 3/3 : ENTRYPOINT ["/hi"]
 ---> Running in e20eca61de44
Removing intermediate container e20eca61de44
 ---> 297e5d7db5f8
Successfully built 297e5d7db5f8
Successfully tagged zoinks:latest
$ docker images zoinks --format '{{.Size}}'
114B
$ docker run --rm -it zoinks
hi!
$
```
&nbsp;

This glorious monstrosity is included in the [companion
repository][repo] and shown in the [results][tb] as `asm-elf`.

### Links

If you enjoyed this post, you may also like:

* [Tiny Binaries Repository][repo]: Companion repository with
  source code, build instructions, and additional details for this post.
* [Tiny ELF Files: Revisited in 2021][tiny-elf]: Tiny static [x86-64][]
  [Linux][] binaries, including a table of unverified [ELF header][]
  bytes.
* [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for
  Linux][tiny-elf-orig]: Classic article on tiny 32-bit static binaries.
* [My Own Private Binary][]: Sequel to [A Whirlwind
  Tutorial][tiny-elf-orig] where the author creates a 0 byte executable
  using a kernel module.

**Update (2022-01-02):** Shorten, fix typos, improve grammar.

[tb]: {{< ref "/posts/2021-12-31-tiny-binaries.md" >}}
  "Tiny Binaries"
[assembly]: https://en.wikipedia.org/wiki/Assembly_language
  "Assembly language"
[asm-naive]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-naive
  "Unoptimized assembly implementation."
[asm-opt]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-opt
  "Optimized assembly implementation."
[asm-elf]: https://github.com/pablotron/tiny-binaries/tree/main/src/asm-elf-1
  "Optimized assembly implementation, embedded in unverified portions of ELF header, with ELF header and program header overlapped."
[repo]: https://github.com/pablotron/tiny-binaries
  "Tiny Binaries GitHub repository"
[tiny-elf]: https://nathanotterness.com/2021/10/tiny_elf_modernized.html
  "Tiny ELF Files: Revisited in 2021"
[x86-64]: https://en.wikipedia.org/wiki/X86-64
  "64-bit version of x86 instruction set"
[linux]: https://en.wikipedia.org/wiki/Linux
  "Linux operating system"
[elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
  "Executable and Linkable Format"
[ph]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header
  "ELF Program Header"
[elf header]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
  "ELF file header"
[tiny-elf-orig]: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
  "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux"
[my own private binary]: https://www.muppetlabs.com/~breadbox/txt/mopb.html
  "My Own Private Binary"
[docker]: https://en.wikipedia.org/wiki/Docker_(software)
  "Docker"
[eof]: https://en.wikipedia.org/wiki/End-of-file
  "End of File"