Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 4e34e51f authored by Eric Biggers's avatar Eric Biggers Committed by Herbert Xu
Browse files

crypto: arm/chacha20 - always use vrev for 16-bit rotates



The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower.  Switch the one-way code to vrev32.16 too.

Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
parent f53ad3e1
Loading
Loading
Loading
Loading
+4 −6
Original line number Diff line number Diff line
@@ -51,9 +51,8 @@ ENTRY(chacha20_block_xor_neon)
.Ldoubleround:
	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
	vadd.i32	q0, q0, q1
	veor		q4, q3, q0
	vshl.u32	q3, q4, #16
	vsri.u32	q3, q4, #16
	veor		q3, q3, q0
	vrev32.16	q3, q3

	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
	vadd.i32	q2, q2, q3
@@ -82,9 +81,8 @@ ENTRY(chacha20_block_xor_neon)

	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
	vadd.i32	q0, q0, q1
	veor		q4, q3, q0
	vshl.u32	q3, q4, #16
	vsri.u32	q3, q4, #16
	veor		q3, q3, q0
	vrev32.16	q3, q3

	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
	vadd.i32	q2, q2, q3