# bench

Benchmark [hash][] functions and [XOFs][xof], then print metadata to
standard error and a table of [median][] [cycles per byte (cpb)][cycles per byte]
for each function and input message length to standard output in [CSV][]
format.

Requires [libcpucycles][].

The columns of the [CSV][] printed to standard output are as follows:
* `function`: Function name.
* `dst_len`: Output digest length, in bytes.
* `64`: [Median][] [cycles per byte (cpb)][cycles per byte] for a 64-byte input message.
* `256`: [Median][] [cycles per byte (cpb)][cycles per byte] for a 256-byte input message.
* `1024`: [Median][] [cycles per byte (cpb)][cycles per byte] for a 1024-byte input message.
* `4096`: [Median][] [cycles per byte (cpb)][cycles per byte] for a 4096-byte input message.
* `16384`: [Median][] [cycles per byte (cpb)][cycles per byte] for a 16384-byte input message.
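
The column layout above can be consumed with any CSV reader. The following is a minimal Python sketch, not part of `bench`; the function name `parse_bench_csv` is hypothetical, and the sample rows are copied from the ThinkPad run under Examples.

```python
import csv
import io

# Sample rows copied from the ThinkPad example run in this README.
SAMPLE = """\
function,dst_len,64,256,1024,4096,16384
sha3_256,32,15.4,7.8,7.8,7.6,7.4
shake128,32,15.5,7.8,6.9,6.2,6.1
"""

def parse_bench_csv(text):
    """Return {(function, dst_len): {src_len: cpb}} (hypothetical helper)."""
    results = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row.pop("function")
        dst_len = int(row.pop("dst_len"))
        # Every remaining column header is an input length in bytes.
        results[(name, dst_len)] = {int(k): float(v) for k, v in row.items()}
    return results

results = parse_bench_csv(SAMPLE)
print(results[("shake128", 32)][16384])  # prints 6.1
```

Keying on both `function` and `dst_len` keeps rows distinct if the same function is ever reported with more than one output length.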
The metadata printed to standard error is as follows:
* `version`: Version of [libcpucycles][], as reported by `cpucycles_version()`.
* `implementation`: [libcpucycles][] backend, as reported by `cpucycles_implementation()`.
* `persecond`: CPU cycles per second, as reported by `cpucycles_persecond()`.
* `num_trials`: Number of trials.
* `src_lens`: Comma-delimited list of input message lengths, in bytes.
* `dst_lens`: Comma-delimited list of output digest lengths, in bytes
(only used for [XOFs][xof]).
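
The metadata fields can be collected into a dictionary when post-processing a run. The following Python sketch assumes the `info:` line format shown in the example runs under Examples (space-separated `key=value` tokens); that format may differ in other versions, and the function name `parse_info_lines` is hypothetical.

```python
# Metadata lines copied from the ThinkPad example run in this README.
SAMPLE_STDERR = """\
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: num_trials=100000 src_lens=64,256,1024,4096,16384 dst_lens=32
"""

def parse_info_lines(stderr_text):
    """Collect key=value pairs from lines starting with 'info:' (hypothetical helper)."""
    meta = {}
    for line in stderr_text.splitlines():
        if not line.startswith("info:"):
            continue
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:  # skip tokens such as "info:" that carry no "="
                meta[key] = value
    # Length lists are comma-delimited byte counts.
    for key in ("src_lens", "dst_lens"):
        if key in meta:
            meta[key] = [int(n) for n in meta[key].split(",")]
    # Scalar counters are plain integers.
    for key in ("persecond", "num_trials"):
        if key in meta:
            meta[key] = int(meta[key])
    return meta

meta = parse_info_lines(SAMPLE_STDERR)
print(meta["implementation"], meta["num_trials"])  # prints amd64-pmc 100000
```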
## Build
1. Install [libcpucycles][].
2. Run `make` to create an executable named `./bench` in the current
directory.
## Run
Type `./bench` to run benchmarks with the default number of trials
(100,000), or `./bench NUM` to run benchmarks with a custom number of
trials.
**Note:** You may need to adjust your system configuration or run
`bench` as root to grant [libcpucycles][] access to the high-resolution
cycle counter.
See [the libcpucycles security page][libcpucycles-security] for details.
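
Each reported value is a median over the trials, expressed in cycles per byte. The following Python sketch shows that statistic only; it is not the code in `bench`, the cycle counts are made up for illustration, and the order of operations (median first, then division by the message length) is an assumption.

```python
import statistics

def median_cpb(cycle_counts, src_len):
    """Median of per-trial cycle counts, divided by the message length in bytes."""
    return statistics.median(cycle_counts) / src_len

# Hypothetical cycle counts for five trials of a 64-byte message.
# One trial (4200) is an outlier, e.g. from an interrupt.
trials = [1010, 985, 990, 4200, 987]

print(round(median_cpb(trials, 64), 1))             # prints 15.5
print(round(statistics.mean(trials) / 64, 1))       # prints 25.5
```

In this made-up data the single outlier shifts the mean by about 10 cpb while the median barely moves, which is one common reason to report medians for timing measurements.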
## Examples
Below are example runs of `bench` on a ThinkPad X1 Carbon ([x86-64][],
[AVX-512][] backend) and on an [Odroid N2L][] ([ARM64][], scalar
backend).
### Lenovo ThinkPad X1 Carbon, 6th Gen (i7-1185G7)
```
# enable user-level RDPMC access (run as root)
root> echo 2 > /proc/sys/kernel/perf_event_paranoid
# print cpu and compiler info
> lscpu | grep -i '^model name:' | sed 's/.*: *//'
11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> gcc --version | head -1
gcc (Debian 12.2.0-14) 12.2.0
# benchmark with 100k trials
> ./bench
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: num_trials=100000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,15.4,7.8,7.8,7.1,7.0
sha3_256,32,15.4,7.8,7.8,7.6,7.4
sha3_384,48,15.5,11.7,9.8,9.8,9.7
sha3_512,64,15.4,15.5,14.6,13.9,13.9
shake128,32,15.5,7.8,6.9,6.2,6.1
shake256,32,15.6,7.8,7.9,7.6,7.4
```
### Odroid N2L (Cortex-A73)
```
# enable user-level perf_event access (run as root)
root> echo 2 > /proc/sys/kernel/perf_event_paranoid
# print cpu and compiler info
> lscpu | grep -i '^model name' | sed 's/.*: *//'
Cortex-A73
> gcc --version | head -1
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
# benchmark with 100k trials
> ./bench
info: cpucycles: version=20240318 implementation=arm64-vct persecond=1800000000
info: num_trials=100000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,32.8,15.8,15.2,13.7,13.5
sha3_256,32,32.8,15.8,15.1,14.5,14.1
sha3_384,48,32.8,22.9,18.7,18.5,18.2
sha3_512,64,32.8,30.2,27.5,26.0,25.9
shake128,32,32.8,15.8,13.4,11.9,11.6
shake256,32,32.8,15.8,15.1,14.5,14.1
```
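
The cycles-per-byte figures in the two runs above can be turned into a rough throughput estimate by dividing `persecond` by the cpb value. The Python sketch below does this for `sha3_256` with 16384-byte inputs on both machines. It assumes the CPU actually ran at the reported `persecond` rate during the benchmark, which may not hold, so treat the results as approximate; the function name `throughput_mb_per_s` is hypothetical.

```python
def throughput_mb_per_s(cpb, persecond):
    """Estimate throughput in MB/s (10**6 bytes per second) from cycles per byte."""
    return persecond / cpb / 1e6

# sha3_256, 16384-byte input, values taken from the example runs above.
thinkpad = throughput_mb_per_s(7.4, 4_800_000_000)
odroid = throughput_mb_per_s(14.1, 1_800_000_000)

print(f"{thinkpad:.1f} MB/s, {odroid:.1f} MB/s")  # prints 648.6 MB/s, 127.7 MB/s
```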
[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
"Comma-Separated Value (CSV)"
[libcpucycles]: https://cpucycles.cr.yp.to/
"Microlibrary for counting CPU cycles."
[libcpucycles-security]: https://cpucycles.cr.yp.to/security.html
"libcpucycles security documentation"
[median]: https://en.wikipedia.org/wiki/Median
"Median"
[mean]: https://en.wikipedia.org/wiki/Arithmetic_mean
"Arithmetic mean"
[stddev]: https://en.wikipedia.org/wiki/Standard_deviation
"Standard deviation"
[odroid n2l]: https://en.odroid.se/products/odroid-n2l-4gb
"Odroid N2L"
[x86-64]: https://en.wikipedia.org/wiki/X86-64
"64-bit x86 instruction set."
[arm64]: https://en.wikipedia.org/wiki/AArch64
"64-bit extension to the ARM instruction set."
[avx-512]: https://en.wikipedia.org/wiki/AVX-512
"AVX-512: 512-bit extensions to the Advanced Vector Extensions (AVX) instruction set."
[cycles per byte]: https://en.wikipedia.org/wiki/Encryption_software#Performance
"Observed CPU cycles divided by the number of input bytes."
[xof]: https://en.wikipedia.org/wiki/Extendable-output_function
"Extendable-Output Function (XOF)"
[hash]: https://en.wikipedia.org/wiki/Cryptographic_hash_function
"Cryptographic hash function"