Re: [PATCH 2/2] perf: Userspace software event and ioctl

From: Frederic Weisbecker
Date: Sat Sep 27 2014 - 13:14:33 EST


2014-09-25 20:33 GMT+02:00 Ingo Molnar <mingo@xxxxxxxxxx>:
>
> * Pawel Moll <pawel.moll@xxxxxxx> wrote:
>
>> On Wed, 2014-09-24 at 08:49 +0100, Ingo Molnar wrote:
>> > * Pawel Moll <pawel.moll@xxxxxxx> wrote:
>> >
>> > > On Thu, 2014-09-18 at 15:34 +0100, Pawel Moll wrote:
>> > > > This patch adds a PERF_COUNT_SW_USERSPACE_EVENT type,
>> > > > which can be generated by user with PERF_EVENT_IOC_ENTRY
>> > > > ioctl command, which injects an event of said type into
>> > > > the perf buffer.
>> > >
>> > > It occurred to me last night that currently perf doesn't handle "write"
>> > > syscall at all, while this seems like the most natural way of
>> > > "injecting" userspace events into perf buffer.
>> > >
>> > > An ioctl would still be needed to set a type of the following events,
>> > > something like:
>> > >
>> > > ioctl(SET_TYPE, 0x42);
>> > > write(perf_fd, binaryblob, size);
>> > > ioctl(SET_TYPE, 0);
>> > > dprintf(perf_fd, "String");
>> > >
>> > > which is fine for use cases when the type doesn't change often,
>> > > but would double the amount of syscalls when every single event
>> > > is of a different type. Perhaps there still should be a
>> > > "generating ioctl" taking both type and data/size in one go?
>> >
>> > Absolutely, there should be a single syscall.
>>
>> Yeah, it's my gut feeling as well. I just wonder if we still want to
>> keep write() handler for operations on perf fds? This seems natural -
>> takes data buffer and its size. The only issue is the type.
>>
>> > I'd even argue it should be a new prctl(): that way we could both
>> > generate user events for specific perf fds, but also into any
>> > currently active context (that allows just generation/injection
>> > of user events). In the latter case we might have no fd to work
>> > off from.
>>
>> When Arnaldo suggested that the "user events" could be used by perf
>> trace, it was exactly my first thought. I just didn't have answer how to
>> present it to the user (an extra syscall didn't seem like a good idea),
>> but prctl seems interesting, something like this?
>>
>> prctl(PR_TRACE_UEVENT, type, size, data, 0);
>
> Exactly!
>
>> How would we select tasks that can write to a given buffer? Maybe an
>> ioctl() on a perf fd? Something like this?
>>
>> ioctl(perf_fd, PERF_EVENT_IOC_ENABLE_UEVENT, pid);
>> ioctl(perf_fd, PERF_EVENT_IOC_DISABLE_UEVENT, pid);
>
> No, I think there's a simpler way: this should be a regular
> perf_attr flag, which defaults to '0' (tasks cannot do this), but
> which can be set to 1 if the profiler explicitly allows such
> event injection.

Maybe we just don't even need any permission at all. Which harm can
that do if this only ever generate events to those interested in the
relevant perf context? It could be a simple tracepoint BTW.

Oh and I really like the fact we don't use a syscall that requires an
fd. The tracee really shouldn't be aware of the tracer.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/