Commit 112908b6 authored Nov 18, 2020 by Danny Lin Committed by Gagan Malvi May 01, 2021

arm64: lse: Prefetch operands to speed up atomic operations



On a Kryo 485 CPU (semi-custom Cortex-A76 derivative) in a Snapdragon
855 (SM8150) SoC, switching from traditional LL/SC atomics to LSE
causes LKDTM's ATOMIC_TIMING test to regress by 2x:

LL/SC ATOMIC_TIMING:    34.14s  34.08s
LSE ATOMIC_TIMING:      70.84s  71.06s

Prefetching the target operands fixes the regression and makes LSE
perform better than LL/SC as expected:

LSE+prfm ATOMIC_TIMING: 21.36s  21.21s
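
For illustration only, the idea can be sketched in portable userspace C.
This is not the kernel's inline assembly: prefetched_fetch_add() is a
hypothetical helper name, and the sketch assumes a GCC- or Clang-compatible
compiler that provides __builtin_prefetch() and the __atomic builtins. On
arm64 the write-prefetch hint is typically lowered to a prfm instruction,
and with LSE enabled (for example -march=armv8.1-a) the atomic add can
become a single LSE instruction; the exact code generated depends on the
compiler and flags.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): hint that *counter is about to be
 * written, then perform the atomic read-modify-write on it.
 */
static inline uint64_t prefetched_fetch_add(uint64_t *counter, uint64_t delta)
{
	/* rw = 1 requests a prefetch for write; locality = 0 marks low reuse. */
	__builtin_prefetch(counter, 1, 0);

	/* Returns the value held before the addition, like an LSE load-add. */
	return __atomic_fetch_add(counter, delta, __ATOMIC_RELAXED);
}
```

The prefetch is only a hint, so the result of the atomic operation is the
same with or without it; any benefit is purely in timing and should be
measured on the target CPU.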

"dd if=/dev/zero of=/dev/null count=10000000" also runs faster:

LL/SC:    3.3s  3.2s  3.3s
LSE:      3.1s  3.2s  3.2s
LSE+prfm: 2.3s  2.3s  2.3s

Commit 0ea366f5 applied the same change to LL/SC atomics, but it was
never ported to LSE.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>

parent af776024
