Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches

From: Adrian Hunter
Date: Fri Jun 12 2015 - 07:14:41 EST


On 11/06/15 17:15, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 05:21:10PM +0300, Adrian Hunter wrote:
>> Tracepoints are no good at all for non-privileged users
>> because they need either CAP_SYS_ADMIN or
>> /proc/sys/kernel/perf_event_paranoid <= -1.
>>
>> On the other hand, kernel software events need either
>> CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.
>
> So while I think it makes sense to allow some tracepoint outside of that
> priv level, IOW have a per tracepoint priv level filter thingy, I don't
> think sched_switch() is one of those because it explicitly exposes
> timing information on other tasks.
>
>> This new PERF_RECORD_SWITCH event does not have those problems
>> and it also has a couple of other small advantages. It is
>> easier to use because it is an auxiliary event (like mmap,
>> comm and task events) which can be enabled by setting a single
>> bit. It is smaller than sched:sched_switch and easier to parse.
>
> Right, so the one wee problem I have is that this only provides sched_in
> data, I imagine people might be interested in sched_out as well.

That is not a problem although it would be interesting to know the use-case.
To me it seemed unreasonable to expect to analyze scheduler behaviour
without admin-level privileges since it is inherently an administrative
activity.

>
> Typically the switch event provides prev and next and thereby is
> complete, but since we're limiting it to the one specific task, we'll
> not have the sched_out data.

That makes sense for completeness, but as I wrote, it would be interesting
to know what someone might actually use that for.

>
>> @@ -812,6 +813,18 @@ enum perf_event_type {
>> */
>> PERF_RECORD_ITRACE_START = 12,
>>
>> + /*
>> + *
>> + *
>> + * struct {
>> + * struct perf_event_header header;
>> + * u32 pid, tid;
>> + * u64 time;
>
> all 3 are already part of sample_id.

You have to decide whether you expect to be able to use an event without
sample_id. MMAP and MMAP2 both have pid, tid which are in sample_id, LOST
has id, EXIT and FORK have time, all of the THROTTLE/UNTHROTTLE members are
in sample_id etc. So it currently looks like we expect to be able to use an
event without requiring sample_id.

It doesn't affect my case either way because the perf tools always set
sample_id_all if they can.
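
To make the trade-off concrete, here is a rough sketch of a reader that
takes pid, tid and time from the fixed part of the record as laid out in
the RFC comment quoted here, so it does not depend on sample_id being
present. The names rec_header, REC_SWITCH, switch_info and parse_switch
are made up for this illustration and are not part of the patch or of
the perf tools; the layout and the value 13 are only what this RFC
proposes and may change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in with the same layout as struct perf_event_header. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Record type value proposed for PERF_RECORD_SWITCH in this RFC. */
#define REC_SWITCH 13

/* Hypothetical container for the fixed members of the record. */
struct switch_info {
	uint32_t pid, tid;
	uint64_t time;
};

/*
 * Parse one record laid out as: header, pid, tid, time, then an
 * optional trailing sample_id that is ignored here.
 * Returns 0 on success, -1 if the buffer is not a valid switch record.
 */
static int parse_switch(const void *buf, size_t len, struct switch_info *out)
{
	const unsigned char *p = buf;
	const size_t fixed = sizeof(struct rec_header) +
			     2 * sizeof(uint32_t) + sizeof(uint64_t);
	struct rec_header hdr;

	assert(buf && out);

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, p, sizeof(hdr));

	/* The record must be the right type and fit inside the buffer. */
	if (hdr.type != REC_SWITCH || hdr.size < fixed || hdr.size > len)
		return -1;

	/* memcpy avoids unaligned loads from the raw byte buffer. */
	p += sizeof(hdr);
	memcpy(&out->pid, p, sizeof(out->pid));
	p += sizeof(out->pid);
	memcpy(&out->tid, p, sizeof(out->tid));
	p += sizeof(out->tid);
	memcpy(&out->time, p, sizeof(out->time));
	return 0;
}
```

If the explicit members were dropped in favour of sample_id, a reader
like this would instead have to know sample_type and sample_id_all to
locate the same three values at the end of the record.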

>
>> + * struct sample_id sample_id;
>> + * };
>> + */
>> + PERF_RECORD_SWITCH = 13,
>
>
