author | Paul Duncan <pabs@pablotron.org> | 2024-05-08 06:15:41 -0400 |
---|---|---|
committer | Paul Duncan <pabs@pablotron.org> | 2024-05-08 06:15:41 -0400 |
commit | 86770194e53447a0da5f8aef9c57e45feb7cc557 (patch) | |
tree | 8244ce4cc96049871cab5b1549d26cbb1a1d7dbd /examples/06-all | |
parent | 4d377bf007e346e086a0c6f925db3b6b7dfce731 (diff) | |
sha3.c: neon: refactor, add documentation
- switch row_eor() from macro to static inline function
- compress rho rotate values from 15 128-bit registers into two to
reduce register pressure (still spilling, though)
- remove PERMUTE macro
- switch from unrolled loop with macro in body of permute_n_neon() to
regular loop
- add documentation for register/lane layout and for compressed rho
rotations
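The first two items above can be sketched in portable C. This is an illustration only, not the code from sha3.c: the `row_t` type, the `row_rho()` and `rol64()` helpers, and the one-byte-per-offset packing are all assumptions. The packing idea is that every Keccak rho offset is below 64, so the 25 offsets fit in 32 bytes (the size of two 128-bit registers) and can be widened when they are needed.

```c
#include <assert.h>
#include <stdint.h>

// Keccak rho rotation offsets, indexed by lane (x + 5*y).
// Every offset is below 64, so each one fits in a single byte.
// Assumption: one byte per offset; the real packing may differ.
static const uint8_t RHO[32] = {
   0,  1, 62, 28, 27,
  36, 44,  6, 55, 20,
   3, 10, 43, 25, 39,
  41, 45, 15, 21,  8,
  18,  2, 61, 56, 14,
   0,  0,  0,  0,  0,  0,  0, // padding up to 32 bytes
};

// Hypothetical row type: the five 64-bit lanes that share one y.
typedef struct { uint64_t lane[5]; } row_t;

// XOR two rows lane by lane. Unlike a macro, a static inline
// function type-checks its arguments and evaluates them once.
static inline row_t row_eor(const row_t a, const row_t b) {
  row_t r;
  for (unsigned x = 0; x < 5; x++) {
    r.lane[x] = a.lane[x] ^ b.lane[x];
  }
  return r;
}

// Rotate left by n bits; n == 0 is handled separately because a
// 64-bit shift by 64 is undefined in C.
static inline uint64_t rol64(const uint64_t v, const unsigned n) {
  assert(n < 64);
  return n ? ((v << n) | (v >> (64 - n))) : v;
}

// Apply rho to row y, widening the packed byte offsets on demand.
static inline row_t row_rho(const row_t a, const unsigned y) {
  assert(y < 5);
  row_t r;
  for (unsigned x = 0; x < 5; x++) {
    r.lane[x] = rol64(a.lane[x], RHO[x + 5 * y]);
  }
  return r;
}
```

A caller would apply `row_rho()` to each of the five rows in a regular loop over `y`, in place of an unrolled macro body.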
with these changes the neon backend still uses ~50% more cycles than
the scalar backend, so i will probably leave it disabled for the initial
release.
scalar backend bench results (pi5):
> ./bench 2000
info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
info: backend=scalar num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,20.2,10.3,10.3,9.3,9.2
sha3_256,32,20.2,10.3,10.3,9.9,9.7
sha3_384,48,20.9,15.3,12.8,12.7,12.5
sha3_512,64,20.2,20.2,18.9,25.3,17.9
shake128,32,20.2,10.1,9.0,8.1,7.9
shake256,32,20.2,10.3,10.3,9.9,9.7
neon backend bench results (pi5):
> ./bench 2000
info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
info: backend=neon num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
function,dst_len,64,256,1024,4096,16384
sha3_224,28,32.7,16.3,16.4,14.9,14.6
sha3_256,32,32.0,16.2,16.4,15.9,15.5
sha3_384,48,32.7,24.2,20.4,20.2,20.0
sha3_512,64,32.0,32.2,30.1,28.6,28.5
shake128,32,32.7,16.2,14.2,12.8,12.5
shake256,32,32.7,16.2,16.3,15.7,15.4
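The two result tables can be compared column by column. The following is a minimal sketch with hypothetical helper names (`overhead()`, `mean_overhead()`); it is not part of the bench tool. It computes how much larger a neon value is relative to the matching scalar value, using the sha3_256 rows copied from the tables above.

```c
#include <assert.h>
#include <stddef.h>

// Relative overhead of neon versus scalar for one column:
// 0.5 means the neon value is 50% larger than the scalar value.
static double overhead(const double scalar, const double neon) {
  assert(scalar > 0.0);
  return neon / scalar - 1.0;
}

// Mean overhead across the n columns of one bench row.
static double mean_overhead(
  const double *scalar,
  const double *neon,
  const size_t n
) {
  assert(n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    sum += overhead(scalar[i], neon[i]);
  }
  return sum / (double) n;
}

// sha3_256 rows from the scalar and neon tables
// (src_lens = 64, 256, 1024, 4096, 16384).
static const double SHA3_256_SCALAR[] = { 20.2, 10.3, 10.3, 9.9, 9.7 };
static const double SHA3_256_NEON[] = { 32.0, 16.2, 16.4, 15.9, 15.5 };
```

Calling `mean_overhead(SHA3_256_SCALAR, SHA3_256_NEON, 5)` gives the average overhead for that function; the same helper works for any other pair of rows.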
Diffstat (limited to 'examples/06-all')
0 files changed, 0 insertions, 0 deletions