Re: [PATCH v8 04/11] tracing/user_events: Fixup enable faults asyncly

From: Beau Belgrave
Date: Tue Mar 28 2023 - 17:43:06 EST


On Tue, Mar 28, 2023 at 05:20:49PM -0400, Steven Rostedt wrote:
> On Tue, 21 Feb 2023 13:11:36 -0800
> Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > @@ -263,7 +277,85 @@ static int user_event_mm_fault_in(struct user_event_mm *mm, unsigned long uaddr)
> > }
> >
> > static int user_event_enabler_write(struct user_event_mm *mm,
> > - struct user_event_enabler *enabler)
> > + struct user_event_enabler *enabler,
> > + bool fixup_fault);
> > +
> > +static void user_event_enabler_fault_fixup(struct work_struct *work)
> > +{
> > + struct user_event_enabler_fault *fault = container_of(
> > + work, struct user_event_enabler_fault, work);
> > + struct user_event_enabler *enabler = fault->enabler;
> > + struct user_event_mm *mm = fault->mm;
> > + unsigned long uaddr = enabler->addr;
> > + int ret;
> > +
> > + ret = user_event_mm_fault_in(mm, uaddr);
> > +
> > + if (ret && ret != -ENOENT) {
> > + struct user_event *user = enabler->event;
> > +
> > + pr_warn("user_events: Fault for mm: 0x%pK @ 0x%llx event: %s\n",
> > + mm->mm, (unsigned long long)uaddr, EVENT_NAME(user));
> > + }
> > +
> > + /* Prevent state changes from racing */
> > + mutex_lock(&event_mutex);
> > +
> > + /*
> > + * If we managed to get the page, re-issue the write. We do not
> > + * want to get into a possible infinite loop, which is why we only
> > + * attempt again directly if the page came in. If we couldn't get
> > + * the page here, then we will try again the next time the event is
> > + * enabled/disabled.
> > + */
>
> What case would we not get the page? A bad page mapping? User space doing
> something silly?
>

A user space program unmapping the page is the most common I can think
of. A silly action would be unmapping the page while forgetting to call
the unregister IOCTL. We would then possibly see this if the event was
enabled in perf/ftrace before the process exited (and the mm getting
cleaned up).

> Or something else, for which how can it go into an infinite loop? Can that
> only happen if userspace is doing something mischievous?
>

I'm not sure if changing page permissions on the user side would prevent
write permitted mapping in the kernel, but I wanted to ensure if that
type of thing did occur, we wouldn't loop forever. The code lets the mm
decide if a page is ever coming in via fixup_user_fault() with
FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE set.

My understanding is that fixup_user_fault() will retry to get the page
up until it's decided it's either capable of coming in or not. It will
do this since we pass the unlocked bool as a parameter. I used
fixup_user_fault() since it was created for the futex code to handle
this scenario better.

>From what I gather, the fault in should only fail for these reasons:
#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | \
VM_FAULT_SIGSEGV | VM_FAULT_HWPOISON | \
VM_FAULT_HWPOISON_LARGE | VM_FAULT_FALLBACK)

If these are hit, I don't believe we want to retry as they aren't likely
to ever get corrected.

Thanks,
-Beau

> -- Steve
>
>
> > + clear_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler));
> > +
> > + if (!ret) {
> > + mmap_read_lock(mm->mm);
> > + user_event_enabler_write(mm, enabler, true);
> > + mmap_read_unlock(mm->mm);
> > + }
> > +
> > + mutex_unlock(&event_mutex);
> > +
> > + /* In all cases we no longer need the mm or fault */
> > + user_event_mm_put(mm);
> > + kmem_cache_free(fault_cache, fault);
> > +}
> > +
> > +static bool user_event_enabler_queue_fault(struct user_event_mm *mm,
> > + struct user_event_enabler *enabler)
> > +{
> > + struct user_event_enabler_fault *fault;
> > +
> > + fault = kmem_cache_zalloc(fault_cache, GFP_NOWAIT | __GFP_NOWARN);
> > +
> > + if (!fault)
> > + return false;
> > +
> > + INIT_WORK(&fault->work, user_event_enabler_fault_fixup);
> > + fault->mm = user_event_mm_get(mm);
> > + fault->enabler = enabler;
> > +
> > + /* Don't try to queue in again while we have a pending fault */
> > + set_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler));
> > +
> > + if (!schedule_work(&fault->work)) {
> > + /* Allow another attempt later */
> > + clear_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler));
> > +
> > + user_event_mm_put(mm);
> > + kmem_cache_free(fault_cache, fault);
> > +
> > + return false;
> > + }
> > +
> > + return true;
> > +}
> > +