Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Arnaldo Carvalho de Melo
Date: Tue Apr 07 2020 - 13:23:46 EST
Em Tue, Apr 07, 2020 at 01:56:43PM -0300, Arnaldo Carvalho de Melo escreveu:
>
> But then, even with that attr.exclude_kernel set to 1 we _still_ get
> kernel samples, which looks like another bug, now trying with strace,
> which leads us to another rabbit hole:
>
> [perf@five ~]$ strace -e perf_event_open -o /tmp/out.put perf top --stdio
> Error:
> You may not have permission to collect system-wide stats.
>
> Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> which controls use of the performance events system by
> unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).
>
> The current value is 2:
>
> -1: Allow use of (almost) all events by all users
> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
> Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
> >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
> >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN
>
> To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
>
> kernel.perf_event_paranoid = -1
>
> [perf@five ~]$
>
> If I remove that strace -e ... from the front, 'perf top' is back
> working as a non-cap_sys_admin user, just with cap_perfmon.
>
So I couldn't figure it out so far why is that exclude_kernel is being
set to 1, as perf-top when no event is passed defaults to this to find
out what to use as a default event:
perf_evlist__add_default(top.evlist)
perf_evsel__new_cycles(true);
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.exclude_kernel = !perf_event_can_profile_kernel(),
};
perf_event_paranoid_check(1);
return perf_cap__capable(CAP_SYS_ADMIN) ||
perf_cap__capable(CAP_PERFMON) ||
perf_event_paranoid() <= max_level;
And then that second condition should hold true, it returns true, and
then .exclude_kernel should be set to !true -> zero.o
Now the wallclock says I need to stop being a programmer and turn into a
daycare provider for Pedro, cya!
- Arnaldo