path: root/tests/neon/Makefile
author: Paul Duncan <pabs@pablotron.org> 2024-05-08 06:15:41 -0400
committer: Paul Duncan <pabs@pablotron.org> 2024-05-08 06:15:41 -0400
commit: 86770194e53447a0da5f8aef9c57e45feb7cc557
tree: 8244ce4cc96049871cab5b1549d26cbb1a1d7dbd /tests/neon/Makefile
parent: 4d377bf007e346e086a0c6f925db3b6b7dfce731
sha3.c: neon: refactor, add documentation
- switch row_eor() from macro to static inline function
- compress rho rotate values from 15 128-bit registers to two to reduce
  register pressure (still spilling, though)
- remove PERMUTE macro
- switch from unrolled loop with macro in body of permute_n_neon() to
  regular loop
- add documentation for register/lane layout and for compressed rho
  rotations

with these changes the neon backend still uses ~50% more cycles than the
scalar backend, so i will probably leave it disabled for the initial
release.

scalar (pi5):

  > ./bench 2000
  info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
  info: backend=scalar num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
  function,dst_len,64,256,1024,4096,16384
  sha3_224,28,20.2,10.3,10.3,9.3,9.2
  sha3_256,32,20.2,10.3,10.3,9.9,9.7
  sha3_384,48,20.9,15.3,12.8,12.7,12.5
  sha3_512,64,20.2,20.2,18.9,25.3,17.9
  shake128,32,20.2,10.1,9.0,8.1,7.9
  shake256,32,20.2,10.3,10.3,9.9,9.7

neon backend bench results (pi5):

  > ./bench 2000
  info: cpucycles: version=20240318 implementation=arm64-vct persecond=2400000000
  info: backend=neon num_trials=2000 src_lens=64,256,1024,4096,16384 dst_lens=32
  function,dst_len,64,256,1024,4096,16384
  sha3_224,28,32.7,16.3,16.4,14.9,14.6
  sha3_256,32,32.0,16.2,16.4,15.9,15.5
  sha3_384,48,32.7,24.2,20.4,20.2,20.0
  sha3_512,64,32.0,32.2,30.1,28.6,28.5
  shake128,32,32.7,16.2,14.2,12.8,12.5
  shake256,32,32.7,16.2,16.3,15.7,15.4
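for illustration only, here is a portable C sketch of the idea behind the compressed rho rotations. it is not the code from sha3.c: the names RHO_ROTS, rotl64() and rho() are hypothetical, and the one-byte-per-lane packing is an assumption about how 25 rotate amounts could fit in two 128-bit registers. the rotate amounts themselves are the standard Keccak rho offsets, indexed as x + 5*y.

```c
#include <stddef.h>
#include <stdint.h>

// Standard Keccak rho rotate amounts, indexed as x + 5*y.
// Every value is below 64, so one byte per lane is enough: 25 bytes,
// padded to 32, is the size of two 128-bit registers.
// (The byte packing is an assumption, not necessarily what sha3.c does.)
static const uint8_t RHO_ROTS[32] = {
   0,  1, 62, 28, 27,
  36, 44,  6, 55, 20,
   3, 10, 43, 25, 39,
  41, 45, 15, 21,  8,
  18,  2, 61, 56, 14,
   0,  0,  0,  0,  0,  0,  0, // padding
};

// Rotate a 64-bit value left by n bits (n in 0..63).
// Masking both shift counts avoids an undefined shift by 64 when n is 0.
static inline uint64_t rotl64(const uint64_t v, const unsigned n) {
  return (v << (n & 63)) | (v >> ((64 - n) & 63));
}

// Apply the rho step to a 25-lane Keccak state, widening each
// packed byte to a rotate count at the point of use.
static inline void rho(uint64_t a[25]) {
  for (size_t i = 0; i < 25; i++) {
    a[i] = rotl64(a[i], RHO_ROTS[i]);
  }
}
```

a vector version would presumably widen the packed bytes into 64-bit lanes before shifting, which trades a few extra instructions per round for the registers freed up.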
Diffstat (limited to 'tests/neon/Makefile')
0 files changed, 0 insertions, 0 deletions