Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
From: Joel Fernandes
Date: Tue Feb 26 2019 - 16:39:28 EST
On Mon, Feb 25, 2019 at 11:04:42PM +0900, Masami Hiramatsu wrote:
> Here is an RFC series of probe-event to support user-space access
> methods, which we discussed in previous thread.
> So in this thread, it is clear that current probe_kernel_read()
> and strncpy_from_unsafe() are not enough to access user-space
> variables from kprobe events on some arch. On such arch, user
> address space and kernel address space can overlap so we have
> to change the memory segment to user-mode before copying.
> But probe_kernel_read() is designed to access primarily kernel
> memory, it may fail to get, or get unexpected value on such
> arch. So we need to expand kprobe fetcharg to support new options
> for such case.
> For user-space access extension, this series adds 2 features,
> "ustring" type and user-space dereference syntax. "ustring" is
> used for recording a null-terminated string in user-space from
> kprobe events.
> "ustring" type is easy, it is able to use instead of "string"
> type, so if you want to record a user-space string via
> "__user char *", you can use ustring type instead of string.
> For example,
> echo 'p do_sys_open path=+0($arg2):ustring' >> kprobe_events
> will record the path string from user-space.
> The user-space dereference syntax is also simple. Thi just
> adds 'u' prefix before an offset value.
> e.g. +u8(%ax), +u0(+0(%si))
> This is more generic. If you want to refer the variable in user-
> space from its address or access a field in data structure in
> user-space, you need to use this.
> For example, if you probe do_sched_setscheduler(pid, policy,
> param) and record param->sched_priority, you can add new
> probe as below;
> p do_sched_setscheduler priority=+u0($arg3)
> Actually, with this feature, "ustring" type is not absolutely
> necessary, because these are same meanings.
> +0($arg2):ustring == +u0($arg2):string
> Perhups, we may be better removing "ustring" and just introducing
> this user-space dereference syntax...
> Note that kprobe event provides these methods, but it doesn't
> change it from kernel to user automatically because we do not
> know whether the given address is in userspace or kernel on
> some arch.
> Moreover, from perf-probe, at this moment it is not able to
> switch. Since __user is not for compiler but checker, we have
> no clue which data structure is in user-space, in debuginfo.
> BTW, according to Linus's comment, I implemented probe_user_read()
> and strncpy_from_unsafe_user() APIs. And since those use
> "access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86,
> it will get a warn message at once. It should be solved before
> merging this series.
I was wondering why access_ok() can sleep. In the arm64 and x86
implementation, I don't see access_ok() itself causing a user pointer
dereference access that can cause a page fault. It seems to just be checking
the validity of the ranges.
Any idea why the access_ok() code has these comments?
"Context: User context only. This function may sleep if pagefaults are
My _guess_ is this is because whatever calls access_ok() may also call
something else that *does* fault next, if that's the case then that
WARN_ON_IN_IRQ() in access_ok() is fine, but at least I guess the comments
should be more clear that it is not access_ok() itself that sleeps.
thanks for any help on understanding this,
> Thank you,
> Masami Hiramatsu (4):
> uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
> uaccess: Add non-pagefault user-space read functions
> tracing/probe: Add ustring type for user-space string
> tracing/probe: Support user-space dereference
> Documentation/trace/kprobetrace.rst | 13 ++-
> Documentation/trace/uprobetracer.rst | 9 +-
> fs/namespace.c | 2
> include/linux/uaccess.h | 13 +++
> kernel/trace/trace.c | 7 +-
> kernel/trace/trace_kprobe.c | 65 ++++++++++++++++
> kernel/trace/trace_probe.c | 39 ++++++++--
> kernel/trace/trace_probe.h | 3 +
> kernel/trace/trace_probe_tmpl.h | 36 +++++++--
> kernel/trace/trace_uprobe.c | 19 +++++
> mm/maccess.c | 138 ++++++++++++++++++++++++++++++----
> 11 files changed, 302 insertions(+), 42 deletions(-)
> Masami Hiramatsu (Linaro) <mhiramat@xxxxxxxxxx>