Re: [PATCH] x86/nmi: Fix some races in NMI uaccess

From: Jann Horn
Date: Mon Aug 27 2018 - 19:34:55 EST

On Tue, Aug 28, 2018 at 1:26 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> On Mon, Aug 27, 2018 at 4:12 PM, Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Tue, Aug 28, 2018 at 1:04 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >>
> >> In NMI context, we might be in the middle of context switching or in
> >> the middle of switch_mm_irqs_off(). In either case, CR3 might not
> >> match current->mm, which could cause copy_from_user_nmi() and
> >> friends to read the wrong memory.
> >>
> >> Fix it by adding a new nmi_uaccess_okay() helper and checking it in
> >> copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.
> >
> > What about eBPF probes (which I think can be attached to kprobe points
> > / tracepoints / perf events) that perform userspace reads / userspace
> > writes / kernel reads? Can those run in NMI context, and if so, do
> > they also need special handling?
>
> I assume they can run in NMI context, which might be problematic in
> and of themselves. For example, does BPF adequately protect against a
> BPF program accessing a map while bpf(2) is modifying it? It seems
> like bpf_prog_active is intended to serve this purpose.
>
> But I don't see any obvious mechanism for eBPF programs to read user memory.

Look in kernel/trace/bpf_trace.c, which defines a bunch of eBPF
helpers that can only be called from privileged eBPF code. Ah, but I
misremembered: the userspace write helper does have a guard against
interrupts; only the arbitrary read helper doesn't.

BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
{
	int ret;

	ret = probe_kernel_read(dst, unsafe_ptr, size);
	if (unlikely(ret < 0))
		memset(dst, 0, size);

	return ret;
}

BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
	   u32, size)
{
	/*
	 * Ensure we're in user context which is safe for the helper to
	 * run. This helper has no business in a kthread.
	 *
	 * access_ok() should prevent writing to non-user memory, but in
	 * some situations (nommu, temporary switch, etc) access_ok() does
	 * not provide enough validation, hence the check on KERNEL_DS.
	 */

	if (unlikely(in_interrupt() ||
		     current->flags & (PF_KTHREAD | PF_EXITING)))
		return -EPERM;
	if (unlikely(uaccess_kernel()))
		return -EPERM;
	if (!access_ok(VERIFY_WRITE, unsafe_ptr, size))
		return -EPERM;

	return probe_kernel_write(unsafe_ptr, src, size);
}
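
To make the difference between the two helpers concrete, here is a
rough userspace model of their context checks. It is only a sketch, not
kernel code: struct model_ctx, model_probe_read() and
model_probe_write_user() are hypothetical names invented for this
illustration, the boolean fields merely stand in for in_interrupt(),
the PF_KTHREAD / PF_EXITING flags and uaccess_kernel(), and plain
memcpy() stands in for the probe_kernel_*() accessors.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the execution context the helpers see. */
struct model_ctx {
	bool in_interrupt;	/* models in_interrupt() */
	bool is_kthread;	/* models current->flags & PF_KTHREAD */
	bool is_exiting;	/* models current->flags & PF_EXITING */
	bool kernel_ds;		/* models uaccess_kernel() */
};

/* Models the read helper: no context check, it always tries the copy. */
int model_probe_read(const struct model_ctx *ctx, void *dst,
		     const void *src, size_t size)
{
	(void)ctx;		/* the context is never consulted */
	memcpy(dst, src, size);
	return 0;
}

/* Models the write helper: bail out unless in plain user-task context. */
int model_probe_write_user(const struct model_ctx *ctx, void *dst,
			   const void *src, size_t size)
{
	if (ctx->in_interrupt || ctx->is_kthread || ctx->is_exiting)
		return -EPERM;
	if (ctx->kernel_ds)
		return -EPERM;

	memcpy(dst, src, size);
	return 0;
}
```

With ctx.in_interrupt set, the modeled write helper returns -EPERM and
leaves the destination untouched, while the modeled read helper still
performs the copy; that asymmetry is the point being made above.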