author | Paul Duncan <pabs@pablotron.org> | 2024-05-06 21:56:10 -0400 |
---|---|---|
committer | Paul Duncan <pabs@pablotron.org> | 2024-05-06 21:56:10 -0400 |
commit | ae421618db3b68ccda95f54d1c9e8d05b2dab90a (patch) | |
tree | a2cfb400720c08d9a0ee524b55ab3aac943380cc /examples/04-turboshake128/Makefile | |
parent | af750de6399d9d1e1bc2d84a52faae8f84fa2364 (diff) | |
sha3.c: neon backend now twice the speed of scalar backend (~50% fewer cycles, see commit message)
made the following changes:
- row_t contents are now 3 uint64x2_t instead of uint64x2x3_t (so they
are stored in registers instead of memory)
- fetch round constants 2 at a time
- round loop unrolled once
- drop convoluted ext/trn store (hard to read, doesn't help)
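The round-constant and unrolling changes listed above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code from sha3.c: `mix()` is a hypothetical stand-in for the theta/rho/pi/chi steps, the function names and the `row_t` field names are invented for the example, and the NEON path is only compiled on AArch64 (other targets use a scalar fallback).

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON 1
// Hypothetical layout: three separate 128-bit vectors (field names are
// assumptions) instead of a single uint64x2x3_t aggregate.
typedef struct { uint64x2_t p0, p1, p2; } row_t;
#else
#define HAVE_NEON 0
#endif

// Keccak-f[1600] round constants (FIPS 202).
static const uint64_t RCS[24] = {
  0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
  0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
  0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
  0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
  0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
  0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
  0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
  0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL,
};

// Hypothetical stand-in for the theta/rho/pi/chi steps of one round.
static uint64_t mix(uint64_t x) {
  return (x << 1) | (x >> 63); // rotate left by one bit
}

// Reference shape: one round per iteration, one constant fetched per round.
static uint64_t rounds_reference(uint64_t x) {
  for (int i = 0; i < 24; i++) {
    x = mix(x) ^ RCS[i];
  }
  return x;
}

// Sketch of the new shape: two rounds per iteration, constants fetched
// as a pair.
static uint64_t rounds_unrolled(uint64_t x) {
  for (int i = 0; i < 24; i += 2) {
#if HAVE_NEON
    const uint64x2_t rc = vld1q_u64(RCS + i); // one load, two constants
    const uint64_t rc0 = vgetq_lane_u64(rc, 0);
    const uint64_t rc1 = vgetq_lane_u64(rc, 1);
#else
    const uint64_t rc0 = RCS[i], rc1 = RCS[i + 1]; // scalar fallback
#endif
    x = mix(x) ^ rc0; // first round of the pair
    x = mix(x) ^ rc1; // second round of the pair
  }
  return x;
}
```

Both functions apply the same 24 rounds in the same order, so `rounds_unrolled(x)` and `rounds_reference(x)` return the same value for any input; only the loop structure and the constant fetch differ.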
bench results
-------------
scalar backend:
> make clean all SHA3_BACKEND=1
...
> ./bench 10000
info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
info: backend=scalar num_trials=10000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,20.2,10.3,10.3,9.3,9.2
sha3_256,32,20.2,10.3,10.3,9.9,9.7
sha3_384,48,20.9,15.3,12.8,12.7,12.7
sha3_512,64,20.2,20.2,18.9,17.9,18.1
shake128,32,20.2,10.3,9.0,8.1,7.9
shake256,32,20.2,10.1,10.3,9.9,9.7
neon backend:
> make clean all SHA3_BACKEND=3
...
> ./bench 10000
info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
info: backend=neon num_trials=10000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,9.7,5.0,5.0,4.6,4.5
sha3_256,32,9.7,5.0,5.0,4.9,4.8
sha3_384,48,9.7,7.3,6.2,6.2,6.1
sha3_512,64,9.7,9.7,9.1,8.7,8.7
shake128,32,9.7,5.0,4.5,4.0,4.0
shake256,32,9.7,5.0,5.1,4.9,4.8
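The "~50% fewer cycles" figure in the subject line can be checked against the two tables above. The helper below is a hypothetical sketch (the name `pct_fewer` is not from the repository). It only takes a ratio, so it does not depend on the unit of the bench values.

```c
#include <assert.h>

// Percentage reduction of the neon value relative to the scalar value
// for one cell of the bench tables.
static double pct_fewer(double scalar, double neon) {
  assert(scalar > 0.0); // avoid dividing by zero
  return 100.0 * (1.0 - neon / scalar);
}
```

Using the 16384-byte column, `pct_fewer(9.7, 4.8)` (sha3_256) comes out at roughly 50.5 and `pct_fewer(7.9, 4.0)` (shake128) at roughly 49.4, consistent with the claim.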