Re: [PATCH v2] perf: Synchronously cleanup child events

From: Alexei Starovoitov
Date: Tue Jan 26 2016 - 18:32:09 EST


On Tue, Jan 26, 2016 at 06:24:25PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 05:16:37PM +0100, Peter Zijlstra wrote:
> > > +struct file *perf_event_get(unsigned int fd)
> > > {
> > > + struct file *file;
> > >
> > > + file = fget_raw(fd);
> >
> > fget_raw() to guarantee the return value isn't NULL? afaict the O_PATH
> > stuff does not apply to perf events, so you'd put any fd for which the
> > distinction matters anyway.

yeah good catch. the following is needed:
file = fget_raw(fd);
+ if (!file)
+ return ERR_PTR(-EBADF);

> >
> > > + if (file->f_op != &perf_fops) {
> > > + fput(file);
> > > + return ERR_PTR(-EBADF);
> > > + }
> > >
> > > + return file;
> > > }
>
> It is not possible for one thread to concurrently call close() while
> this thread tries to fget() ? In which case, we must check the return
> value anyway?

The !file check is definitely needed: the fd passed by the user can be bogus.
The caller of perf_event_get() checks for errors too.
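
To make the two failure paths concrete, here is a minimal userspace sketch
of perf_event_get() with the !file check folded in. It is not the kernel
code: struct file, fget_raw(), fput(), the fd_table array, NR_FDS and
other_fops are simplified stand-ins invented for illustration, and the
ERR_PTR helpers only imitate the ones in include/linux/err.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel's ERR_PTR helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Heavily simplified stand-ins for the kernel structures (hypothetical). */
struct file_operations {
	const char *name;
};

struct file {
	const struct file_operations *f_op;
	int f_count;	/* plain int here; the real count is atomic */
};

static const struct file_operations perf_fops = { "perf" };
static const struct file_operations other_fops = { "other" };

/* Toy fd table: a NULL slot means the fd is not open. */
#define NR_FDS 4
static struct file *fd_table[NR_FDS];

/* Mock fget_raw(): returns NULL for a bogus fd, else takes a reference. */
static struct file *fget_raw(unsigned int fd)
{
	struct file *file;

	if (fd >= NR_FDS)
		return NULL;

	file = fd_table[fd];
	if (file)
		file->f_count++;
	return file;
}

/* Mock fput(): drops the reference taken by fget_raw(). */
static void fput(struct file *file)
{
	file->f_count--;
}

struct file *perf_event_get(unsigned int fd)
{
	struct file *file;

	file = fget_raw(fd);
	if (!file)			/* bogus or already closed fd */
		return ERR_PTR(-EBADF);

	if (file->f_op != &perf_fops) {	/* valid fd, but not a perf event */
		fput(file);
		return ERR_PTR(-EBADF);
	}

	return file;			/* caller now owns one reference */
}
```

In this model a caller checks the result with IS_ERR() and drops the
reference with fput() when done. Both failure paths return -EBADF, and the
non-perf path leaves the reference count unchanged.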

This patch will conflict with kernel/bpf/arraymap.c and
kernel/trace/bpf_trace.c that are planned for net-next,
but the conflicts in kernel/events/core.c are probably harder
to resolve, so yes please take it into tip/perf.
I think your SCM_RIGHTS fixes depend on this patch and together
it's an important bug fix, so it probably makes sense to send
them right now without waiting for the next merge window?
As soon as you get the whole thing into tip, I'll test it
to make sure the bpf side is ok and I hope Wang will test it too.

I'm still a bit concerned about taking a file reference for this,
since bpf programs that use perf_events won't be able to be
'detached'. Meaning there always has to be a user space process
holding the perf_event FDs. On the networking side we
don't have this limitation. Like we can attach bpf to TC,
iproute2 will exit and reattach some time later. So it
kinda sux, but sounds like you want to get rid of
perf_event->refcnt completely, so I don't see any other way.
We can fix it later if it really becomes an issue.