[PATCH 0/3] SRCU-protected uretprobes hot path
From: Andrii Nakryiko
Date: Mon Sep 09 2024 - 18:49:15 EST

Recently landed changes make the uprobe entry hot code path use RCU Tasks
Trace to avoid touching the uprobe refcount, which at a high frequency of
uprobe triggering leads to excessive cache line bouncing and limits
scalability as the number of CPUs simultaneously executing uprobe handlers
grows.

This patch set adds the return uprobe (uretprobe) side of this, this time
utilizing SRCU for the same reasons. Given that the time between entry uprobe
activation (at which point uretprobe code hijacks the user-space stack to get
activated on user function return) and uretprobe activation can be arbitrarily
long and is completely under the control of user code, we need to protect
ourselves from overly long or unbounded SRCU grace periods.

To that end we keep SRCU protection only for a limited time, and if user-space
code takes longer to return, pending uretprobe instances are "downgraded" to
refcounted ones. This gives us the best scalability and performance for
high-frequency uretprobes, and keeps an upper bound on SRCU grace period
duration for low-frequency uretprobes.

There are a bunch of synchronization issues between the timer callback running
in an IRQ handler and the current thread executing uretprobe handlers, which
are abstracted away behind a "hybrid lifetime uprobe" (hprobe) wrapper around
the uprobe instance itself. See patch #2 for all the details.
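
For illustration only (this is not the hprobe code from the series): one way
such a timer-versus-thread race can be resolved is with a single atomic state
word, sketched below in userspace C11. The state names, struct tracked, and
the three functions are hypothetical and chosen for this sketch; the real
wrapper may differ in states and details.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states; illustrative names only. */
enum state { ST_LEASED, ST_STABLE, ST_CONSUMED };

struct tracked {
	_Atomic int state;
	atomic_int refcount;    /* refcount of the underlying probe */
};

/* Timer side: try to convert a time-limited lease into a real reference. */
static bool timer_expire(struct tracked *t)
{
	int expected = ST_LEASED;

	/*
	 * Take the reference before publishing ST_STABLE, so the thread can
	 * never observe ST_STABLE without a reference backing it.
	 */
	atomic_fetch_add(&t->refcount, 1);
	if (atomic_compare_exchange_strong(&t->state, &expected, ST_STABLE))
		return true;                    /* hand-off succeeded */
	atomic_fetch_sub(&t->refcount, 1);      /* lost the race: undo */
	return false;
}

/* Thread side: claim the instance and learn which protection it had. */
static enum state thread_consume(struct tracked *t)
{
	return (enum state)atomic_exchange(&t->state, ST_CONSUMED);
}

/* Thread side: release the protection that was held at claim time. */
static void thread_finish(struct tracked *t, enum state was)
{
	if (was == ST_STABLE)
		atomic_fetch_sub(&t->refcount, 1);
	/* was == ST_LEASED: real code would end the SRCU read section here. */
}
```

Whichever side performs its atomic operation first wins: if the thread
claims the instance first, the timer's compare-and-exchange fails and it
backs out; if the timer wins, the thread sees ST_STABLE and drops the
reference instead of ending a read-side section.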

rfc->v1:
  - made put_uprobe() work in any context, not just user context (Oleg);
  - changed to unconditional mod_timer() usage to avoid races (Oleg);
  - I kept single-stepped uprobe changes, as they have a simple use of all the
    hprobe functionality developed in patch #2.

Andrii Nakryiko (3):
  uprobes: allow put_uprobe() from non-sleepable softirq context
  uprobes: SRCU-protect uretprobe lifetime (with timeout)
  uprobes: implement SRCU-protected lifetime for single-stepped uprobe

 include/linux/uprobes.h |  53 +++++-
 kernel/events/uprobes.c | 370 +++++++++++++++++++++++++++++++++-------
 2 files changed, 353 insertions(+), 70 deletions(-)

--
2.43.5