Re: [PATCH] user_events: Enable user processes to create and write to trace events

From: Beau Belgrave
Date: Mon Oct 11 2021 - 12:25:33 EST


On Fri, Oct 08, 2021 at 06:22:58PM +0900, Masami Hiramatsu wrote:
> > > I'm not sure this point, you mean 1 fd == 1 event model?
> > >
> > Yeah, I like the idea of not having an fd per event.
>
> Ah, OK. I misunderstood the idea.
> per-FD model sounds like having events/user-events/*/marker file.
>
Thanks for the back and forth, I appreciate your time on this.

Yes, in my mind there are two options to avoid kernel memory usage
per-event.

1.
We have an array per file struct that is independently ref-counted.
This is required to meet lifetime requirements and to ensure user code
cannot access other user events that might have been freed outside of
that lifetime, which would cause a kernel crash.

This approach also requires 2 ints to be returned: 1 for the status
page, the other a local index for the write into the above per-file
array.

This is likely the most complex method due to its lifetime and RCU
synchronization requirements. However, it represents the least memory to
both kernel and user space.
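
To make option 1 concrete, here is a rough user-space model of the
per-file array and the two returned ints. It is only a sketch: every
name in it (struct event, struct file_events, file_register,
file_lookup, file_release, MAX_PER_FILE) is hypothetical and not from
the patch, and it deliberately leaves out the RCU and locking that make
this option complex in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of option 1; not kernel code. */
struct event {
	int refcount;      /* one reference per file that registered it */
	int status_index;  /* slot for this event in the status page */
};

#define MAX_PER_FILE 8     /* arbitrary size chosen for the sketch */

struct file_events {
	struct event *slots[MAX_PER_FILE];  /* per-file array, NULL = free */
};

/*
 * Register an event on a file: take a reference and return the local
 * write index. The status page index comes back via *status_index.
 * Returns -1 if the per-file array is full.
 */
static int file_register(struct file_events *f, struct event *e,
			 int *status_index)
{
	for (int i = 0; i < MAX_PER_FILE; i++) {
		if (!f->slots[i]) {
			e->refcount++;
			f->slots[i] = e;
			*status_index = e->status_index;
			return i;
		}
	}
	return -1;
}

/* Write path: only events this file registered are reachable. */
static struct event *file_lookup(const struct file_events *f, int write_index)
{
	if (write_index < 0 || write_index >= MAX_PER_FILE)
		return NULL;
	return f->slots[write_index];
}

/* File release: drop every reference held; returns how many were dropped. */
static int file_release(struct file_events *f)
{
	int dropped = 0;

	for (int i = 0; i < MAX_PER_FILE; i++) {
		if (f->slots[i]) {
			f->slots[i]->refcount--;
			f->slots[i] = NULL;
			dropped++;
		}
	}
	return dropped;
}
```

The point of the model is that the write index is only meaningful
within one file's array, so a stale or guessed index cannot reach an
event that file never registered.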

2.
We have an anon_inode FD that gets installed into the user process and
returned via the ioctl from the user_events tracefs file. The file struct
backing the FD is shared by all user mode processes for that event. It is
like having an inject/marker file per-event in the user_events subsystem.

This approach requires an FD to be returned and either an int for the
status page, or the returned FD could expose the ID via another ioctl
being issued.

This is the simplest method since the FD manages the lifetime: when the
FD is released, so is the shared file struct. Kernel side memory is
reduced to only unique events that are actively being used. There is no
RCU or synchronization beyond the FD lifetime. The user mode process
does incur an FD per-event within its file descriptor table, so the
events charge against its per-process FD limit (not necessarily a bad
thing).
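
As a rough illustration of the FD cost, the sketch below opens one FD
per "event" from user space. eventfd() is used purely as a stand-in,
since the user_events ioctl discussed here is not a settled interface;
the helper names (open_event_fds, close_event_fds, fd_soft_limit) are
made up for the example.

```c
#include <assert.h>
#include <sys/eventfd.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Open up to n stand-in "event" FDs and return how many were opened.
 * Each one occupies a slot in this process's FD table, so the loop
 * stops early (for example with EMFILE) once RLIMIT_NOFILE is reached.
 */
static int open_event_fds(int *fds, int n)
{
	int opened = 0;

	for (int i = 0; i < n; i++) {
		int fd = eventfd(0, EFD_CLOEXEC);

		if (fd < 0)
			break;
		fds[opened++] = fd;
	}
	return opened;
}

/* Closing the FD is what ends the lifetime in the per-FD model. */
static void close_event_fds(const int *fds, int n)
{
	for (int i = 0; i < n; i++)
		close(fds[i]);
}

/* Soft per-process FD limit that the event FDs charge against. */
static rlim_t fd_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return 0;
	return rl.rlim_cur;
}
```

With a typical soft limit of around a thousand FDs, a handful of events
is negligible, while hundreds of events per process start to matter.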

This also seems to follow the pre-existing patterns of tracefs
(trace_marker, inject, format, etc all have a shared file available to
user-processes that have been granted access). For our case, we want
that, but we want the access boundary to be whoever has access to the
user_events_* tracefs files. We don't want to open up all of tracefs
widely.

> > I want to make
> > sure the complexity is worth it. Is the overhead of an FD per event in
> > user space too much?
>
> It depends on the use case, how much events you wants to use with
> the user-events. If there are hundreds of the evets, that will consume
> kernel resources and /proc/*/fd/ will be filled with the event's fds.
> But if there is a few events, I think no problem.
>
In our own use case this will be low due to the way we plan to use the
events. However, I am not sure others will follow that :)

Thanks,
-Beau