Re: [PATCH] user_events: Enable user processes to create and write to trace events

From: Steven Rostedt
Date: Tue Oct 12 2021 - 21:18:59 EST


On Mon, 11 Oct 2021 09:25:23 -0700
Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Yes, in my mind there are two options to avoid kernel memory usage
> per-event.
>
> 1.
> We have a an array per file struct that is independently ref-counted.
> This is required to ensure lifetime requirements and to ensure user code
> cannot access other user events that might have been free'd outside of
> the lifetime and cause a kernel crash.
>
> This approach also requires 2 int's to be returned, 1 for the status
> page the other a local index for the write into the above array per-file
> struct.
>
> This is likely the most complex method due to it's lifetime and RCU
> synchronization requirements. However, it represents the least memory to
> both kernel and user space.

Does it require RCU synchronization as the updates only happen from
user space. But is this for the writing of the event? You want a
separate fd for each event to write to, instead of saying you have
another interface to write and just pass the given id?

>
> 2.
> We have a anon_inode FD that gets installed into the user process and
> returned via the ioctl from user_events tracefs file. The file struct
> backing the FD is shared by all user mode processes for that event. Like
> having an inject/marker file per-event in the user_events subsystem.
>
> This approach requires an FD returned and either an int for the status
> page or the returend FD could expose the ID via another IOCTL being
> issued.
>
> This is the simplest method since the FD manages the lifetime, when FD
> is released so is the shared file struct. Kernel side memory is reduced
> to only unique events that are actively being used. There is no RCU or
> synchronization beyond the FD lifetime. The user mode processes does
> incur an FD per-event within their file description table. So they
> events charge against their FD per-process limit (not necessarily a bad
> thing).
>
> This also seems to follow the pre-existing patterns of tracefs
> (trace_marker, inject, format, etc all have a shared file available to
> user-processes that have been granted access). For our case, we want
> that, but we want it on a access boundary to who all have access to the
> user_events_* tracefs files. We don't want to open up all of tracefs
> widely.
>
> > > I want to make
> > > sure the complexity is worth it. Is the overhead of an FD per event in
> > > user space too much?
> >
> > It depends on the use case, how much events you wants to use with
> > the user-events. If there are hundreds of the evets, that will consume
> > kernel resources and /proc/*/fd/ will be filled with the event's fds.
> > But if there is a few events, I think no problem.
> >
> In our own use case this will be low due to the way we plan to use the
> events. However, I am not sure others will follow that :)

I will say, whenever we say this will only have a "few", if it becomes
useful, it will end up having many.

-- Steve