path: root/examples/01-shake128/sha3.c
author: Paul Duncan <pabs@pablotron.org> 2024-05-06 21:56:10 -0400
committer: Paul Duncan <pabs@pablotron.org> 2024-05-06 21:56:10 -0400
commit: ae421618db3b68ccda95f54d1c9e8d05b2dab90a
tree: a2cfb400720c08d9a0ee524b55ab3aac943380cc /examples/01-shake128/sha3.c
parent: af750de6399d9d1e1bc2d84a52faae8f84fa2364
sha3.c: neon backend now twice the speed of scalar backend (~50% fewer cycles, see commit message)
made the following changes:
- row_t contents are now 3 uint64x2_t instead of uint64x2x3_t (so they
  are stored as registers instead of memory)
- fetch round constants 2 at a time
- round loop unrolled once
- drop convoluted ext/trn store (hard to read, doesn't help)

bench results
-------------
scalar backend:

  > make clean all SHA3_BACKEND=1
  ...
  > ./bench 10000
  info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
  info: backend=scalar num_trials=10000 src_lens=64,256,1024,4096,16384 dst_lens=32
  function,dst_len,64,256,1024,4096,16384
  sha3_224,28,20.2,10.3,10.3,9.3,9.2
  sha3_256,32,20.2,10.3,10.3,9.9,9.7
  sha3_384,48,20.9,15.3,12.8,12.7,12.7
  sha3_512,64,20.2,20.2,18.9,17.9,18.1
  shake128,32,20.2,10.3,9.0,8.1,7.9
  shake256,32,20.2,10.1,10.3,9.9,9.7

neon backend:

  > make clean all SHA3_BACKEND=3
  ...
  > ./bench 10000
  info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
  info: backend=neon num_trials=10000 src_lens=64,256,1024,4096,16384 dst_lens=32
  function,dst_len,64,256,1024,4096,16384
  sha3_224,28,9.7,5.0,5.0,4.6,4.5
  sha3_256,32,9.7,5.0,5.0,4.9,4.8
  sha3_384,48,9.7,7.3,6.2,6.2,6.1
  sha3_512,64,9.7,9.7,9.1,8.7,8.7
  shake128,32,9.7,5.0,4.5,4.0,4.0
  shake256,32,9.7,5.0,5.1,4.9,4.8
Diffstat (limited to 'examples/01-shake128/sha3.c')
0 files changed, 0 insertions, 0 deletions