FROMLIST: arm64: kernel: implement fast refcount checking
This adds support to arm64 for fast refcount checking, as contributed
by Kees for x86 based on the implementation by grsecurity/PaX.
The general approach is identical: the existing atomic_t helpers are
cloned for refcount_t, with the arithmetic instruction modified to set
the PSTATE flags, and one or two branch instructions added that jump to
an out-of-line handler if overflow, decrement to zero, or increment from
zero is detected.
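To make the checks above concrete, here is a minimal user-space sketch in
C11 of the semantics the fast path aims for. It is not the arm64 assembly
from this patch: the sketch_* names are hypothetical, the INT_MIN / 2
saturation value is an assumption, and the flag-setting instruction plus
conditional branch are modelled as plain C comparisons on the value
returned by the atomic operation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's refcount_t. */
typedef struct { atomic_int refs; } sketch_refcount_t;

/*
 * Out-of-line slow path (assumed behaviour): pin the counter at a
 * saturated negative value so the object is leaked rather than freed.
 */
static void sketch_refcount_error(sketch_refcount_t *r)
{
    atomic_store_explicit(&r->refs, INT_MIN / 2, memory_order_relaxed);
}

/* Increment: a single atomic add, followed by checks on the old value. */
static void sketch_refcount_inc(sketch_refcount_t *r)
{
    int old = atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);

    /* old == 0: increment from zero; old < 0: already saturated;
     * old == INT_MAX: the add just overflowed. */
    if (old <= 0 || old == INT_MAX)
        sketch_refcount_error(r);
}

/* Decrement: returns true only when the last reference was dropped. */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
    int old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_release);

    if (old == 1) {
        /* Order the caller's teardown after all earlier releases. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    if (old <= 0)  /* underflow, or counter was already saturated */
        sketch_refcount_error(r);
    return false;
}
```

The common case costs one atomic read-modify-write and a comparison that
is almost never true; only the error cases reach the handler.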
One complication that we have to deal with on arm64 is the fact that
it has two atomics implementations: the original LL/SC implementation
using load/store exclusive loops, and the newer LSE one that does mostly
the same in a single instruction. So we need to clone some parts of
both for the refcount handlers, but we also need to deal with the way
LSE builds fall back to LL/SC at runtime if the hardware does not
support LSE.
As is the case with the x86 version, the performance gain is substantial
(ThunderX2 @ 2.2 GHz, using LSE), even though the arm64 implementation
incorporates an add-from-zero check as well:
  perf stat -B -- echo ATOMIC_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
    116252672661 cycles # 2.207 GHz
    52.689793525 seconds time elapsed
  perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
    127060259162 cycles # 2.207 GHz
    57.243690077 seconds time elapsed
For comparison, the numbers below were captured using CONFIG_REFCOUNT_FULL,
which uses the validation routines implemented in C using cmpxchg():
  perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
  Performance counter stats for 'cat /dev/fd/63':
    191057942484 cycles # 2.207 GHz
    86.568269904 seconds time elapsed
As a bonus, this code has been found to perform significantly better on
systems with many CPUs, due to the fact that it no longer relies on the
load/compare-and-swap combo performed in a tight loop, which is what we
emit for cmpxchg() on arm64.
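For contrast, the load/compare-and-swap pattern referred to above looks
roughly like the following C11 sketch. It is a simplified illustration,
not the kernel's CONFIG_REFCOUNT_FULL code: sketch_full_inc_not_zero is a
hypothetical name and the saturation rule is an assumption. Every failed
compare-and-swap sends the caller around the loop again, which is where
the cost shows up when many CPUs hit the same counter.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment unless the counter is zero. Returns false if the object is
 * already dead (count == 0), true otherwise.
 */
static bool sketch_full_inc_not_zero(atomic_int *refs)
{
    int old = atomic_load_explicit(refs, memory_order_relaxed);

    do {
        if (old == 0)
            return false;   /* never resurrect a freed object */
        if (old < 0 || old == INT_MAX)
            return true;    /* treat as saturated: leave it untouched */
        /* On failure, old is reloaded with the current value and the
         * checks above run again before the next attempt. */
    } while (!atomic_compare_exchange_weak_explicit(refs, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}
```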
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jan Glauber <jglauber@cavium.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[kdrag0n]
- Backported to k4.14 from:
https://www.spinics.net/lists/arm-kernel/msg735992.html
- Forward-ported to k4.19
- Benchmarked on sm8150 using perf and LKDTM REFCOUNT_TIMING:
https://docs.google.com/spreadsheets/d/14CctCmWzQAGhOmpHrBJfXQy_HuNFTpEkMEYSUGKOZR8/edit
         | Fast checking      | Generic checking
---------+--------------------+-----------------------
Cycles   | 79235532616        | 102554062037
         | 79391767237        | 99625955749
Time     | 32.99879212 sec    | 42.5354029 sec
         | 32.97133254 sec    | 41.31902045 sec
Average:
Cycles   | 79313649927        | 101090008893
Time     | 33 sec             | 42 sec
Signed-off-by: Danny Lin <danny@kdrag0n.dev>