arm64: lse: Prefetch operands to speed up atomic operations
On a Kryo 485 CPU (semi-custom Cortex-A76 derivative) in a Snapdragon
855 (SM8150) SoC, switching from traditional LL/SC atomics to LSE
causes LKDTM's ATOMIC_TIMING test to regress by 2x:
LL/SC ATOMIC_TIMING: 34.14s 34.08s
LSE ATOMIC_TIMING: 70.84s 71.06s
Prefetching the target operands fixes the regression and makes LSE
perform better than LL/SC, as expected:
LSE+prfm ATOMIC_TIMING: 21.36s 21.21s
"dd if=/dev/zero of=/dev/null count=10000000" also runs faster:
LL/SC:    3.3s 3.2s 3.3s
LSE:      3.1s 3.2s 3.2s
LSE+prfm: 2.3s 2.3s 2.3s
Commit 0ea366f5 applied the same change to LL/SC atomics, but it was
never ported to LSE.
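The idea can be sketched in portable C as a rough illustration. This is
not the kernel's inline assembly: prefetch_then_fetch_add() is a
hypothetical helper, and it assumes the GCC/Clang builtins
__builtin_prefetch() and __atomic_fetch_add(). On arm64 a write prefetch
is typically lowered to a PRFM instruction, and whether the atomic
becomes an LSE instruction depends on compiler flags such as
-march=armv8.1-a.

```c
#include <stdint.h>

/*
 * Illustrative only: hypothetical helper, not the kernel's LSE code.
 * Hint that the cache line holding *v is about to be written, then
 * perform the atomic read-modify-write on it.
 */
static inline int64_t prefetch_then_fetch_add(int64_t *v, int64_t i)
{
	/* rw = 1: prefetch for write; locality = 3: keep the line cached. */
	__builtin_prefetch(v, 1, 3);

	/* Atomically add i to *v and return the value held beforehand. */
	return __atomic_fetch_add(v, i, __ATOMIC_RELAXED);
}
```

The prefetch is only a hint, so the result of the atomic operation is
the same with or without it. Only the timing is expected to change.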
Signed-off-by: Danny Lin <danny@kdrag0n.dev>