Re: [PATCH 1/1] tracing, bpf: Implement function bpf_probe_write
From: Alexei Starovoitov
Date: Tue Jul 19 2016 - 02:14:05 EST
On Mon, Jul 18, 2016 at 03:57:17AM -0700, Sargun Dhillon wrote:
>
>
> On Sun, 17 Jul 2016, Alexei Starovoitov wrote:
>
> >On Sun, Jul 17, 2016 at 03:19:13AM -0700, Sargun Dhillon wrote:
> >>
> >>+static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> >>+{
> >>+ void *to = (void *) (long) r1;
> >>+ void *from = (void *) (long) r2;
> >>+ int size = (int) r3;
> >>+
> >>+ /* check if we're in a user context */
> >>+ if (unlikely(in_interrupt()))
> >>+ return -EINVAL;
> >>+ if (unlikely(!current->pid))
> >>+ return -EINVAL;
> >>+
> >>+ return copy_to_user(to, from, size);
> >>+}
> >
> >thanks for the patch, unfortunately it's not that straightforward.
> >copy_to_user might fault. Try enabling CONFIG_DEBUG_ATOMIC_SLEEP and
> >you'll see the splat since bpf programs are protected by rcu.
> >Also 'current' can be null and I'm not sure what current->pid does.
> >So the writing to user memory either has to be verified to avoid
> >sleeping and faults or we need to use something like task_work_add
> >mechanism. Ideas are certainly welcome.
> >
> >
> From casual inspection, I can't find where current can be null when
> in_interrupt() is false. Although, we can check before dereferencing it.
> When not in a user context, the pid of the task struct returns 0.
>
> As far as preventing sleep, would the following alteration do? Or do we
> actually need something more sophisticated?
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index be89c148..45878f3 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -86,14 +86,19 @@ static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64
> r4, u64 r5)
> void *to = (void *) (long) r1;
> void *from = (void *) (long) r2;
> int size = (int) r3;
> + struct task_struct *task = current;
>
> /* check if we're in a user context */
> if (unlikely(in_interrupt()))
> return -EINVAL;
> - if (unlikely(!current->pid))
> + if (unlikely(!task || !task->pid))
> return -EINVAL;
>
> - return copy_to_user(to, from, size);
> + /* Is this a user address, or a kernel address? */
> + if (!access_ok(VERIFY_WRITE, to, size))
> + return -EINVAL;
> +
> + return probe_kernel_write(to, from, size);
> }
I think it can actually work. The only concern is that comment
in access_ok() says that it may sleep whereas I couldn't find
any arch where that would be the case.
Could you please send an official patch with detailed commit log?