Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
From: Andy Lutomirski
Date: Sun Feb 24 2019 - 23:50:05 EST
On Sun, Feb 24, 2019 at 6:40 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> On Sun, 24 Feb 2019 09:26:45 -0800
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Sun, Feb 24, 2019 at 7:18 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > >
> > > On Sat, 23 Feb 2019 20:38:03 -0800
> > > Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > >
> > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep
> > > > as far as I know.
> > >
> > > Hmm, which might_sleep() would you pointed? What I talked was a
> > > WARN_ON_ONCE(!in_task()) in access_ok() on x86 (only!), and in_task() just
> > > checks preempt_count.
> >
> > So the in_task() check does kind of make sense. Using "access_ok()"
> > outside of task context is certainly an odd thing, for several
> > reasons. The main one being simply that outside of task context, the
> > whole "which task" question is open, and you don't know if the task is
> > the active one, and so it's not clear if whatever task you interrupt
> > might have done "set_fs()" or not.
>
> Ah I got it. Usual case access_ok() in IRQ handler is strange.
>
> >
> > So PeterZ isn't wrong:
> >
> > > I guess PeterZ assumed that access_ok() is used only with user space access
> > > APIs (e.g. copy_from_user) which can cause page-fault and locks mm (and might
> > > sleep :)), but now we are trying to use access_ok() with new functions which
> > > disables page-fault and just return -EFAULT.
> >
> > .. but in this case, if we do it all *within* code that saves and
> > restores the user access flag with get_fs/set_fs, access_ok() would be
> > ok and it doesn't have the above issue.
> >
> > So access_ok() in _general_ is absolutely not safe to do from
> > interrupts, but within the context of probing user memory from a
> > tracing event it just happens to be ok.
>
> Hmm, but user can specify user-memory access from the tracing event
> which is located in interrupt handler. So I understand that it is safe
> only if we correctly setup access flag with get_fs/set_fs, is that
> correct?
>
> > It would be lovely to have a special macro for this, and keep the
> > warning for the general case, but because this is a "every
> > architecture needs to build their own" it's probably too painful.
>
> Agreed.
This should probably go with whatever effort makes nmi_uaccess_ok()
available on all architectures. That being said, how about just
making copy_from_user_nmi() work on all architectures, even if it just
fails unconditionally on some of them?