Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

From: Alexey Budankov
Date: Wed Jan 15 2020 - 00:15:23 EST



On 15.01.2020 4:52, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
> <alexey.budankov@xxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>> <alexey.budankov@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> As we discussed in the RFC series for CAP_SYS_TRACING last year, I only expected
>>>>>> to open it for enabling/disabling kprobes, not for creating them.
>>>>>>
>>>>>> If we can accept that a user who has no admin privilege but has CAP_SYS_PERFMON
>>>>>> may shoot themselves in the foot at their own risk, I'm OK to allow it. (Even so,
>>>>>> it should check the maximum number of probes that can be created, with something
>>>>>> like ulimit.)
>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>> but I'm not sure about other archs, especially on devices I cannot reach.
>>>>>> I need more help to stabilize it.
>>>>>
>>>>> I don't see how enable/disable is any safer than creation.
>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>> Hence I suggest doing it right away instead of splitting
>>>>> sys_perf_event_open() access into two CAPs.
>>>>>
>>>>
>>>> Alexei, Masami,
>>>>
>>>> Thanks for your meaningful input.
>>>> If we know in advance that it can still crash the system in some cases and on
>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>> such delegation looks premature until those crashes are fixed. So it looks like
>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>> a separate patch set.
>>>
>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>
>> Sure, software cannot guarantee that, but known software bugs can still be fixed;
>> that's what I meant.
>>
>>> imo adding a cap just for pmc is pointless.
>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>> be the source of ongoing confusion. nack to such cap.
>>>
>>
>> Well, as this patch set already covers the complete perf_event_open functionality,
>> and the eBPF-related parts too, could you please review and comment on it?
>> Do patches 2/9 and 5/9 already bring all the required extensions?
>
> yes. the current patches 2 and 5 look good to me.

Thanks. I appreciate your cooperation.

> I would only change patch 1 to what Andy was proposing earlier:

Could you please share a link to that proposal so I can get more details?
In this patch set discussion, the only related item was [1], a suggestion
from Andi Kleen for more generic naming of the PERFMON cap.

>
> static inline bool perfmon_capable(void)
> {
> 	if (capable_noaudit(CAP_PERFMON))
> 		return capable(CAP_PERFMON);
> 	if (capable_noaudit(CAP_SYS_ADMIN))
> 		return capable(CAP_SYS_ADMIN);
>
> 	return capable(CAP_PERFMON);
> }

Yes, this makes sense and adds up.

> I think Andy was trying to preserve the order of audit events.
>
> I'm also suggesting to drop SYS from the cap name. It doesn't add any value
> to the name.

Agreed, CAP_PERFMON sounds more generic, as it actually is.

Gratefully,
Alexey

[1] https://lore.kernel.org/lkml/20191211203648.GA862919@xxxxxxxxxxxxxxxxxxxx/