UPSTREAM: crypto: arm/chacha20 - faster 8-bit rotations and other optimizations
Optimize ChaCha20 NEON performance by: - Implementing the 8-bit rotations using the 'vtbl.8' instruction. - Streamlining the part that adds the original state and XORs the data. - Making some other small tweaks. On ARM Cortex-A7, these optimizations improve ChaCha20 performance from about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement. There is a tradeoff involved with the 'vtbl.8' rotation method since there is at least one CPU (Cortex-A53) where it's not fastest. But it seems to be a better default; see the added comment. Overall, this patch reduces Cortex-A53 performance by less than 0.5%. Signed-off-by:Eric Biggers <ebiggers@google.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> (cherry picked from commit a1b22a5f45fe884147a99e7c381bcc48d9b2acef) Bug: 112008522 Test: As series, see Ic61c13b53facfd2173065be715a7ee5f3af8760b Change-Id: Id7d26e6079cee50111f6d9616459547c60e6cb3e Signed-off-by:
Eric Biggers <ebiggers@google.com>
Loading
Please register or sign in to comment