arm64: lse: Prefetch operands to speed up atomic operations
On a Kryo 485 CPU (semi-custom Cortex-A76 derivative) in a Snapdragon 855
(SM8150) SoC, switching from traditional LL/SC atomics to LSE causes LKDTM's
ATOMIC_TIMING test to regress by 2x:

LL/SC ATOMIC_TIMING:    34.14s 34.08s
LSE ATOMIC_TIMING:      70.84s 71.06s

Prefetching the target operands fixes the regression and makes LSE perform
better than LL/SC as expected:

LSE+prfm ATOMIC_TIMING: 21.36s 21.21s

"dd if=/dev/zero of=/dev/null count=10000000" also runs faster:

LL/SC:  3.3 3.2 3.3 s
LSE:    3.1 3.2 3.2 s
LSE+p:  2.3 2.3 2.3 s

Commit 0ea366f5 applied the same change to LL/SC atomics, but it was never
ported to LSE.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
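
For readers unfamiliar with the technique, the idea of prefetching an operand for write before an atomic read-modify-write can be sketched in portable C as follows. This is only an illustration under stated assumptions, not the code from this patch: the helper name prefetched_fetch_add is hypothetical, it uses GCC/Clang builtins rather than the kernel's inline assembly, and the exact prefetch hint the patch emits is not stated in this message.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel): issue a write prefetch
 * for the target word, then perform the atomic add on it.
 */
static inline int64_t prefetched_fetch_add(int64_t *counter, int64_t delta)
{
	/*
	 * rw = 1 asks for a prefetch in anticipation of a write;
	 * locality = 3 asks to keep the line in cache. On arm64 compilers
	 * typically lower this to a PRFM store-prefetch hint. It is only a
	 * hint and never changes the value stored at *counter.
	 */
	__builtin_prefetch(counter, 1, 3);

	/*
	 * When built for ARMv8.1-A or later this can compile to a single
	 * LSE instruction; on older targets it becomes an LL/SC loop.
	 * Returns the value held before the addition.
	 */
	return __atomic_fetch_add(counter, delta, __ATOMIC_SEQ_CST);
}
```

Since a prefetch is purely a performance hint, the helper should return the same results as a plain atomic add; whether it is faster depends on the CPU, as the timings above suggest for this particular core.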