crypto: arm/chacha20 - always use vrev for 16-bit rotates (4e34e51f)

The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16, but the one-way code (used on remainder blocks) implements it with vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
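The change relies on the fact that rotating a 32-bit word left by 16 is the same as swapping its two 16-bit halves, which is what vrev32.16 does to every 32-bit lane. The sketch below is a scalar C model of the ChaCha20 quarter round as specified in RFC 7539, section 2.1, with the 16-bit rotate written as a half-word swap. It is illustrative only, not the kernel's C or assembly implementation, and the function names (`rotl32`, `swap_halves32`, `quarter_round`, `check_rfc7539_vector`) are hypothetical. The check uses the quarter-round test vector from RFC 7539, section 2.1.1.

```c
#include <assert.h>
#include <stdint.h>

/* Generic 32-bit left rotate; valid for n in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * Rotate-by-16 written as a swap of the two 16-bit halves: the scalar
 * counterpart of what vrev32.16 does to every 32-bit lane.
 */
static uint32_t swap_halves32(uint32_t x)
{
	return (x >> 16) | (x << 16);
}

/* ChaCha20 quarter round (RFC 7539, section 2.1), 16-bit rotate as a swap. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d = swap_halves32(*d ^ *a);	/* x3 = rotl32(x3 ^ x0, 16) */
	*c += *d; *b = rotl32(*b ^ *c, 12);
	*a += *b; *d = rotl32(*d ^ *a, 8);
	*c += *d; *b = rotl32(*b ^ *c, 7);
}

/* Check against the quarter-round test vector in RFC 7539, section 2.1.1. */
static void check_rfc7539_vector(void)
{
	uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;

	quarter_round(&a, &b, &c, &d);
	assert(a == 0xea2a92f4 && b == 0xcb1cf8ce);
	assert(c == 0x4581472e && d == 0x5881c4bb);
}
```

Because the swap is exact for every input, replacing the shift-based rotate with it cannot change the cipher output; only the instruction sequence differs.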

arch/arm/crypto/chacha20-neon-core.S

+4 −6

@@ -51,9 +51,8 @@ ENTRY(chacha20_block_xor_neon)
 .Ldoubleround:
 	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
 	vadd.i32 q0, q0, q1
-	veor q4, q3, q0
-	vshl.u32 q3, q4, #16
-	vsri.u32 q3, q4, #16
+	veor q3, q3, q0
+	vrev32.16 q3, q3

 	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
 	vadd.i32 q2, q2, q3
@@ -82,9 +81,8 @@ ENTRY(chacha20_block_xor_neon)

 	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
 	vadd.i32 q0, q0, q1
-	veor q4, q3, q0
-	vshl.u32 q3, q4, #16
-	vsri.u32 q3, q4, #16
+	veor q3, q3, q0
+	vrev32.16 q3, q3

 	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
 	vadd.i32 q2, q2, q3
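To see why the removed and added instruction sequences in each hunk compute the same value, their per-lane behaviour can be modelled in plain C. This is a minimal sketch, not kernel code: the `qreg` type and the helper names are hypothetical, only the 32-bit-lane forms used in the hunks are modelled, and vsri is modelled only for shift amounts 1 to 31. It also makes visible that the new sequence no longer needs q4 as a scratch register.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit NEON q register modelled as four 32-bit lanes. */
typedef struct {
	uint32_t lane[4];
} qreg;

/* veor qd, qn, qm: lane-wise XOR. */
static qreg veor(qreg n, qreg m)
{
	qreg d;
	for (int i = 0; i < 4; i++)
		d.lane[i] = n.lane[i] ^ m.lane[i];
	return d;
}

/* vshl.u32 qd, qm, #imm: lane-wise left shift. */
static qreg vshl_u32(qreg m, unsigned int imm)
{
	assert(imm <= 31);
	for (int i = 0; i < 4; i++)
		m.lane[i] <<= imm;
	return m;
}

/* vsri.u32 qd, qm, #imm: shift each lane of qm right by imm and insert it,
 * keeping the top imm bits of each lane of qd. */
static qreg vsri_u32(qreg d, qreg m, unsigned int imm)
{
	assert(imm >= 1 && imm <= 31);
	uint32_t keep = ~(UINT32_C(0xffffffff) >> imm);
	for (int i = 0; i < 4; i++)
		d.lane[i] = (d.lane[i] & keep) | (m.lane[i] >> imm);
	return d;
}

/* vrev32.16 qd, qm: swap the two 16-bit halves of every 32-bit lane. */
static qreg vrev32_16(qreg m)
{
	for (int i = 0; i < 4; i++)
		m.lane[i] = (m.lane[i] >> 16) | (m.lane[i] << 16);
	return m;
}

/* Removed sequence: x3 = rotl32(x3 ^ x0, 16) via vshl + vsri, using q4. */
static qreg rotate16_old(qreg q3, qreg q0)
{
	qreg q4 = veor(q3, q0);		/* veor q4, q3, q0 */
	q3 = vshl_u32(q4, 16);		/* vshl.u32 q3, q4, #16 */
	q3 = vsri_u32(q3, q4, 16);	/* vsri.u32 q3, q4, #16 */
	return q3;
}

/* Added sequence: same result via veor + vrev32.16, no scratch register. */
static qreg rotate16_new(qreg q3, qreg q0)
{
	q3 = veor(q3, q0);		/* veor q3, q3, q0 */
	return vrev32_16(q3);		/* vrev32.16 q3, q3 */
}

/* Lane-by-lane comparison of two modelled registers. */
static int qreg_equal(qreg a, qreg b)
{
	for (int i = 0; i < 4; i++)
		if (a.lane[i] != b.lane[i])
			return 0;
	return 1;
}
```

In both versions the low 16 bits of each lane end up on top and the high 16 bits at the bottom, so the results match for any input; the difference lies only in the instruction count and the extra register, which is where the speedup described in the commit message comes from.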