Re: [RFC PATCH 05/17] perf: Introduce detached events
From: Peter Zijlstra
Date: Tue Oct 03 2017 - 10:34:52 EST
On Tue, Sep 05, 2017 at 04:30:14PM +0300, Alexander Shishkin wrote:
> There are usecases where it is desired to have perf events without the
> userspace tool running in the background to keep them alive, but instead
> only collect the data when it is needed, for example when an MCE event
> is triggered.
>
> This patch adds a new flag to the perf_event_open() syscall that allows
> creating such events. Once created, the file descriptor can be closed
> and the event continues to exist on its own. To allow access to this
> event, a file is created in the tracefs, which the user can open.
>
> Finally, when it is no longer needed, it can be destroyed by unlinking
> the file.
>
> @@ -9387,6 +9416,27 @@ static void account_event(struct perf_event *event)
> account_pmu_sb_event(event);
> }
>
> +static int perf_event_detach(struct perf_event *event, struct task_struct *task,
> + struct mm_struct *mm)
> +{
> + char *filename;
> +
> + filename = kasprintf(GFP_KERNEL, "%s:%x.event",
> + task ? "task" : "cpu",
> + hash_64((u64)event, PERF_TRACEFS_HASH_BITS));
> + if (!filename)
> + return -ENOMEM;
> +
> + event->dent = tracefs_create_file(filename, 0600,
> + perf_tracefs_dir,
> + event, &perf_fops);
> + kfree(filename);
> +
> + if (!event->dent)
> + return -ENOMEM;
> +
> + return 0;
> +}
So I'm not opposed to the idea of creating events that live independent
from of file descriptors. And stuffing them in a filesystem makes sense.
However I'm not entire convinced on the details.
The above has a number of problems:
- there's a filesystem race; two concurrent syscalls can try and create
the same file. In that case the error most certainly is not -ENOMEM.
- there's a hash collision, similar issue.
- there's some asymmetry in the create/destroy; that is you create the
file with sys_perf_event_open() and remove it with unlink().
- the actual name is very opaque and hard to use; how would a tool find
the right event to open?
Would it instead make sense to allow the user to creat() their own files
in this filesystem (with whatever descriptive name they need) and then
pass that fd like:
sys_perf_event_open(.group_fd=fd, .flags=PERF_FLAG_FD_DETACH);
or something to associate the file with the event. Of course, that makes
it very hard to create detached cgroup events :/