author | Paul Duncan <pabs@pablotron.org> | 2024-05-27 03:56:29 -0400 |
---|---|---|
committer | Paul Duncan <pabs@pablotron.org> | 2024-05-27 03:56:29 -0400 |
commit | 4c3394528c540de31ee2785344735ec9f46c7559 | |
tree | f7b403d00b50c36e87ddf68b91f87f99ccc54fd3 /examples/02-kmac128 | |
parent | 7c278410aabda783d065a9e2b2b4956a1b5bb501 | |
sha3.c: permute_n_avx2(): replace some permutes with blends, minor cleanups
with these changes:
- clang: avx2 comparable to scalar (within ~3% for 256-byte and longer inputs)
- gcc: avx2 still slower than scalar (roughly 25% slower at every input size)
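The diff itself isn't shown here, so the sketch below is only an illustration of why a blend can replace a permute, assuming the AVX2 backend keeps Keccak lanes in the 64-bit slots of 256-bit registers. A blend (`vpblendd`, `_mm256_blend_epi32`) picks each slot from one of two registers without moving anything, and on many Intel cores it has 1-cycle latency; a permute (`vpermq`, `_mm256_permute4x64_epi64`) can move lanes anywhere, including across the 128-bit halves, and typically has 3-cycle latency. The `u64x4` type and the `permute4x64`, `blend64` and `epi32_blend_mask` functions are hypothetical portable scalar models of those instructions, not code from sha3.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 256-bit register holding four 64-bit
 * lanes; v[0] is the lowest lane. */
typedef struct { uint64_t v[4]; } u64x4;

/* Models _mm256_permute4x64_epi64(a, imm) (vpermq): output lane i takes
 * input lane (imm >> 2*i) & 3, so a lane may move to any slot, including
 * across the two 128-bit halves. */
static u64x4 permute4x64(u64x4 a, unsigned imm) {
  assert(imm <= 0xFF);
  u64x4 r;
  for (int i = 0; i < 4; i++) {
    r.v[i] = a.v[(imm >> (2 * i)) & 3];
  }
  return r;
}

/* Models a blend at 64-bit granularity: output lane i is b.v[i] if bit i
 * of mask4 is set, else a.v[i]. Lanes never change position. */
static u64x4 blend64(u64x4 a, u64x4 b, unsigned mask4) {
  assert(mask4 <= 0xF);
  u64x4 r;
  for (int i = 0; i < 4; i++) {
    r.v[i] = ((mask4 >> i) & 1) ? b.v[i] : a.v[i];
  }
  return r;
}

/* _mm256_blend_epi32 selects 32-bit lanes with an 8-bit immediate, so each
 * bit of a 64-bit-lane mask is doubled, e.g. 0x5 -> 0x33, 0xA -> 0xCC. */
static unsigned epi32_blend_mask(unsigned mask4) {
  assert(mask4 <= 0xF);
  unsigned m = 0;
  for (int i = 0; i < 4; i++) {
    if ((mask4 >> i) & 1) {
      m |= 3u << (2 * i);
    }
  }
  return m;
}
```

With `a = {0, 1, 2, 3}` and `b = {10, 11, 12, 13}`, `blend64(a, b, 0xA)` yields `{0, 11, 2, 13}`: every lane stays in its slot, so no permute is needed. `permute4x64(a, 0x1B)` reverses the lanes to `{3, 2, 1, 0}`, which a blend cannot express because lanes change position.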
bench results
-------------
gcc scalar:
> make clean all BACKEND=1 CC=gcc && ./bench
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: backend=scalar num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,19.5,10.0,9.9,9.0,8.8
sha3_256,32,19.5,10.0,9.9,9.5,9.3
sha3_384,48,19.5,14.7,12.3,12.2,12.0
sha3_512,64,19.5,19.6,18.2,17.1,17.1
shake128,32,19.6,9.9,8.7,7.8,7.6
shake256,32,19.6,9.9,10.0,9.5,9.3
gcc avx2:
> make clean all BACKEND=6 CC=gcc && ./bench
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: backend=avx2 num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,24.5,12.3,12.2,11.1,10.9
sha3_256,32,24.4,12.2,12.2,11.9,11.6
sha3_384,48,24.2,18.3,15.3,15.2,15.0
sha3_512,64,24.5,24.4,22.8,21.6,21.6
shake128,32,24.6,12.1,10.8,9.6,9.4
shake256,32,24.7,12.2,12.2,11.8,11.6
clang scalar:
> make clean all BACKEND=1 CC=clang && ./bench
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: backend=scalar num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,21.8,9.9,9.7,8.8,8.7
sha3_256,32,21.1,9.9,9.8,9.4,9.2
sha3_384,48,21.1,14.6,12.1,12.0,11.8
sha3_512,64,21.2,19.2,17.9,16.9,16.9
shake128,32,21.0,9.9,8.6,7.7,7.5
shake256,32,20.9,9.9,9.8,9.5,9.2
clang avx2:
> make clean all BACKEND=6 CC=clang && ./bench
info: cpucycles: version=20240318 implementation=amd64-pmc persecond=4800000000
info: backend=avx2 num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,19.9,10.0,9.9,9.0,8.9
sha3_256,32,19.9,10.0,9.9,9.6,9.4
sha3_384,48,20.1,14.9,12.4,12.3,12.2
sha3_512,64,19.9,19.6,18.4,17.4,17.4
shake128,32,19.9,10.0,8.8,7.9,7.7
shake256,32,20.0,10.0,9.9,9.6,9.4
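To put the "comparable" and "still slower" summaries into numbers, the sketch below divides each avx2 result by the matching scalar result for the longest input (16384 bytes). The logs don't state the unit (the cpucycles header suggests cycle counts, presumably per byte), but the ratio is unit-independent. The arrays are copied by hand from the tables above; the names `avx2_ratio` and `print_ratios` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* 16384-byte column from the logs above, in the order
 * sha3_224, sha3_256, sha3_384, sha3_512, shake128, shake256. */
#define NUM_FNS 6

static const char *const fn_names[NUM_FNS] = {
  "sha3_224", "sha3_256", "sha3_384", "sha3_512", "shake128", "shake256",
};

static const double gcc_scalar[NUM_FNS]   = { 8.8, 9.3, 12.0, 17.1, 7.6, 9.3 };
static const double gcc_avx2[NUM_FNS]     = { 10.9, 11.6, 15.0, 21.6, 9.4, 11.6 };
static const double clang_scalar[NUM_FNS] = { 8.7, 9.2, 11.8, 16.9, 7.5, 9.2 };
static const double clang_avx2[NUM_FNS]   = { 8.9, 9.4, 12.2, 17.4, 7.7, 9.4 };

/* avx2/scalar ratio for function i: a value above 1.0 means the avx2
 * backend took longer than the scalar backend. */
static double avx2_ratio(const double *avx2, const double *scalar, size_t i) {
  assert(i < NUM_FNS && scalar[i] > 0.0);
  return avx2[i] / scalar[i];
}

/* Prints one CSV row per function with the gcc and clang ratios. */
static void print_ratios(void) {
  printf("function,gcc_avx2/scalar,clang_avx2/scalar\n");
  for (size_t i = 0; i < NUM_FNS; i++) {
    printf("%s,%.2f,%.2f\n", fn_names[i],
           avx2_ratio(gcc_avx2, gcc_scalar, i),
           avx2_ratio(clang_avx2, clang_scalar, i));
  }
}
```

Calling `print_ratios()` from `main` gives gcc ratios between about 1.24 and 1.26 and clang ratios between about 1.02 and 1.03, which matches the summary at the top of the commit message.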