Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

From: Alexei Starovoitov
Date: Tue Jan 14 2020 - 20:52:27 EST


On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
<alexey.budankov@xxxxxxxxxxxxxxx> wrote:
>
>
> On 14.01.2020 21:06, Alexei Starovoitov wrote:
> > On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> > <alexey.budankov@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >>>> to open it for enabling/disabling kprobes, not for creation.
> >>>>
> >>>> If we can accept that a user who has no admin privilege but has CAP_SYS_PERFMON
> >>>> may shoot themselves in the foot at their own risk, I'm OK to allow it. (Even so,
> >>>> it should check the max number of probes to be created by something like
> >>>> ulimit.)
> >>>> I think nowadays we have fixed all such kernel crash problems on x86,
> >>>> but I'm not sure about other archs, especially on devices I cannot reach.
> >>>> I need more help to stabilize it.
> >>>
> >>> I don't see how enable/disable is any safer than creation.
> >>> If there are kernel bugs in kprobes the kernel will crash anyway.
> >>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> >>> CAP_* is about delegation of root privileges to non-root.
> >>> Delegating some of it is ok, but disallowing creation makes it useless
> >>> for bpf tracing, so we would need to add another CAP later.
> >>> Hence I suggest doing it right away instead of splitting
> >>> sys_perf_event_open() access across two CAPs.
> >>>
> >>
> >> Alexei, Masami,
> >>
> >> Thanks for your meaningful input.
> >> If we know in advance that it can still crash the system in some cases and on
> >> some archs, even though root fully controls delegation through CAP_SYS_PERFMON,
> >> such delegation looks premature until the crashes are avoided. So it looks like
> >> access to eBPF for CAP_SYS_PERFMON privileged processes is a subject for
> >> a separate patch set.
> >
> > perf_event_open is always dangerous. sw cannot guarantee that hw is bug-free.
>
> Sure, software cannot guarantee, but known software bugs could still be fixed,
> that's what I meant.
>
> > imo adding a cap just for pmc is pointless.
> > if you add a new cap it should cover all of sys_perf_event_open syscall.
> > subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> > be the source of ongoing confusion. nack to such cap.
> >
>
> Well, as this patch set already covers the complete perf_event_open functionality,
> and the eBPF-related parts too, could you please review and comment on it?
> Do patches 2/9 and 5/9 already bring all the required extensions?

Yes. The current patches 2 and 5 look good to me.
I would only change patch 1 to what Andy was proposing earlier:

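static inline bool perfmon_capable(void)
{
	/* Probe without auditing; audit only the capability actually used. */
	if (capable_noaudit(CAP_PERFMON))
		return capable(CAP_PERFMON);
	if (capable_noaudit(CAP_SYS_ADMIN))
		return capable(CAP_SYS_ADMIN);

	/* Neither is held: record the denial against CAP_PERFMON. */
	return capable(CAP_PERFMON);
}
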
I think Andy was trying to preserve the order of audit events.

I'm also suggesting dropping SYS from the cap name; it doesn't add any
value.