# bench

Benchmark [hash][] functions and [XOFs][xof], then print metadata to
standard error and print a table of [median][] [cycles per byte (cpb)][]
for each function and input message length to standard output in [CSV][]
format.

Requires [libcpucycles][].

The columns of the [CSV][] printed to standard output are as follows:

* `function`: Function name.
* `dst_len`: Output digest length, in bytes.
* `64`: [Median][] [cycles per byte (cpb)][] for a 64-byte input message.
* `256`: [Median][] [cycles per byte (cpb)][] for a 256-byte input message.
* `1024`: [Median][] [cycles per byte (cpb)][] for a 1024-byte input message.
* `4096`: [Median][] [cycles per byte (cpb)][] for a 4096-byte input message.
* `16384`: [Median][] [cycles per byte (cpb)][] for a 16384-byte input message.
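This README does not spell out how the median is computed. A minimal sketch, assuming each trial records the `cpucycles()` delta for a single call on a message of `src_len` bytes, is to sort the per-trial counts and divide the median by the message length. The helper `median_cpb()` below is hypothetical, not part of `bench`:

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparator: ascending order of cycle counts. */
static int cmp_ll(const void *a, const void *b) {
  const long long x = *(const long long *) a,
                  y = *(const long long *) b;
  return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not part of bench): sort the per-trial cycle
 * counts in place and return the median divided by the input length,
 * i.e. the median cycles per byte.  With an even number of trials the
 * two middle values are averaged.
 */
double median_cpb(long long *cycles, size_t num_trials, size_t src_len) {
  assert(num_trials > 0 && src_len > 0);
  qsort(cycles, num_trials, sizeof(cycles[0]), cmp_ll);

  const size_t mid = num_trials / 2;
  const double median = (num_trials % 2) ?
    (double) cycles[mid] :
    ((double) cycles[mid - 1] + (double) cycles[mid]) / 2.0;

  return median / (double) src_len;
}
```

A median is less sensitive than a mean to the occasional large outliers caused by interrupts or context switches, which is one reason to report it here.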

The metadata printed to standard error is as follows:

* `version`: Version of [libcpucycles][], as reported by `cpucycles_version()`.
* `implementation`: [libcpucycles][] backend, as reported by `cpucycles_implementation()`.
* `persecond`: CPU cycles per second, as reported by `cpucycles_persecond()`.
* `num_trials`: Number of trials.
* `src_lens`: Comma-delimited list of input message lengths, in bytes.
* `dst_lens`: Comma-delimited list of output digest lengths, in bytes
  (only used for [XOFs][xof]).
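The `persecond` value can turn a cycles-per-byte figure into an approximate throughput: bytes per second = `persecond` / cpb. For example, with `persecond=4800000000` from the ThinkPad run below, 7.4 cpb is roughly 649 MB/s. The helper name below is hypothetical, and the result is only as accurate as the `persecond` figure, since the actual clock frequency can vary under load:

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of bench): convert cycles per byte to
 * an approximate throughput in bytes per second, using the
 * cycles-per-second value reported by cpucycles_persecond().
 */
double cpb_to_bytes_per_second(double cpb, long long persecond) {
  assert(cpb > 0.0 && persecond > 0);
  return (double) persecond / cpb;
}
```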

## Build

1. Install [libcpucycles][].
2. Type `make` to build an executable named `bench` in the current
   directory.

## Run

Type `./bench` to run benchmarks with the default number of trials
(100,000), or `./bench NUM` to run benchmarks with a custom number of
trials.
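A minimal sketch of how the optional `NUM` argument might be handled. The README only specifies the default of 100,000 trials, so the helper name `parse_num_trials()` and the choice to reject non-numeric or zero values are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Default number of trials, as documented in this README. */
#define DEFAULT_NUM_TRIALS 100000

/*
 * Hypothetical helper (not part of bench): return the number of trials
 * given in argv[1], or DEFAULT_NUM_TRIALS if no argument was given.
 * Returns 0 if argv[1] is not a positive decimal integer.
 */
size_t parse_num_trials(int argc, char *argv[]) {
  assert(argc >= 1);
  if (argc < 2) {
    return DEFAULT_NUM_TRIALS;
  }

  /* reject empty strings, signs, and leading whitespace */
  const char *s = argv[1];
  if (*s < '0' || *s > '9') {
    return 0;
  }

  char *end = NULL;
  errno = 0;
  const unsigned long long val = strtoull(s, &end, 10);
  if (errno != 0 || *end != '\0' || val == 0 || val > SIZE_MAX) {
    return 0; /* overflow, trailing junk, or zero */
  }
  return (size_t) val;
}
```

A caller would print an error and exit with a non-zero status when the helper returns 0.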

**Note:** You may need to adjust your system configuration or run
`bench` as root to grant [libcpucycles][] access to the high-resolution
cycle counter.

See [the libcpucycles security page][libcpucycles-security] for details.

## Examples

Below are example runs of `bench` on a ThinkPad X1 Carbon ([x86-64][],
[AVX-512][] backend) and on an [Odroid N2L][] ([ARM64][], scalar
backend).

### Lenovo ThinkPad X1 Carbon, 6th Gen (i7-1185G7)

```
# enable user-level RDPMC access (run as root)
root> echo 2 > /proc/sys/kernel/perf_event_paranoid

# print cpu and compiler info
> lscpu | grep -i '^model name:' | sed 's/.*: *//'
11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> gcc --version | head -1
gcc (Debian 12.2.0-14) 12.2.0

# benchmark with 100k trials
> ./bench
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: num_trials=100000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,15.4,7.8,7.8,7.1,7.0
sha3_256,32,15.4,7.8,7.8,7.6,7.4
sha3_384,48,15.5,11.7,9.8,9.8,9.7
sha3_512,64,15.4,15.5,14.6,13.9,13.9
shake128,32,15.5,7.8,6.9,6.2,6.1
shake256,32,15.6,7.8,7.9,7.6,7.4
```

### Odroid N2L (Cortex-A73)

```
# enable user-level perf_event access (run as root)
root> echo 2 > /proc/sys/kernel/perf_event_paranoid

# print cpu and compiler info
> lscpu | grep -i '^model name' | sed 's/.*: *//'
Cortex-A73
> gcc --version | head -1
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

# benchmark with 100k trials
> ./bench
info: cpucycles: version=20240318 implementation=arm64-vct persecond=1800000000
info: num_trials=100000
TODO...
```

[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
  "Comma-Separated Values (CSV)"
[libcpucycles]: https://cpucycles.cr.yp.to/
  "Microlibrary for counting CPU cycles."
[libcpucycles-security]: https://cpucycles.cr.yp.to/security.html
  "libcpucycles security documentation"
[median]: https://en.wikipedia.org/wiki/Median
  "Median"
[mean]: https://en.wikipedia.org/wiki/Arithmetic_mean
  "Arithmetic mean"
[stddev]: https://en.wikipedia.org/wiki/Standard_deviation
  "Standard deviation"
[odroid n2l]: https://en.odroid.se/products/odroid-n2l-4gb
  "Odroid N2L"
[x86-64]: https://en.wikipedia.org/wiki/X86-64
  "64-bit x86 instruction set."
[arm64]: https://en.wikipedia.org/wiki/AArch64
  "64-bit extension to the ARM instruction set."
[avx-512]: https://en.wikipedia.org/wiki/AVX-512
  "AVX-512: 512-bit extensions to the Advanced Vector Extensions (AVX) instruction set."
[cycles per byte (cpb)]: https://en.wikipedia.org/wiki/Encryption_software#Performance
  "Observed CPU cycles divided by the number of input bytes."
[xof]: https://en.wikipedia.org/wiki/Extendable-output_function
  "Extendable-Output Function (XOF)"
[hash]: https://en.wikipedia.org/wiki/Cryptographic_hash_function
  "Cryptographic hash function"