Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

From: Alexey Budankov
Date: Tue Jan 14 2020 - 04:47:54 EST



On 14.01.2020 8:17, Alexei Starovoitov wrote:
> On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>>
>> On Sat, 11 Jan 2020 12:57:18 +0300
>> Alexey Budankov <alexey.budankov@xxxxxxxxxxxxxxx> wrote:
>>
>>>
>>> On 11.01.2020 3:35, arnaldo.melo@xxxxxxxxx wrote:
>>
>>>> Message-ID: <A7F0BF73-9189-44BA-9264-C88F2F51CBF3@xxxxxxxxxx>
>>>>
>>>> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <songliubraving@xxxxxx> wrote:
>>>>>
>>>>>
>>>>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <mhiramat@xxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>>>>> Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>>>>> <peterz@xxxxxxxxxxxxx> wrote:
>>>>>>>>> Again, this only allows attaching to previously created kprobes,
>>>>> it does
>>>>>>>>> not allow creating kprobes, right?
>>>>>>>
>>>>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>>>>> kprobes.
>>>>>>>
>>>>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>>>>> creating kprobes.
>>>>>>>
>>>>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>>>>> interface to
>>>>>>>> define new kprobe events, and those events are treated as
>>>>> completely same as
>>>>>>>> tracepoint events. On the other hand, ebpf tries to define new
>>>>> probe event
>>>>>>>> via perf_event interface. Above one is that interface. IOW, it
>>>>> creates new kprobe.
>>>>>>>
>>>>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>>>>> interface for creating kprobes/uprobes?
>>>>>>
>>>>>> Would you mean perf probe to switch to perf_event_open()?
>>>>>> No, perf probe is for setting up the ftrace probe events. I think we
>>>>> can add an
>>>>>> option to use perf_event_open(). But current kprobe creation from
>>>>> perf_event_open()
>>>>>> is separated from ftrace by design.
>>>>>
>>>>> I guess we can extend event parser to understand kprobe directly.
>>>>> Instead of
>>>>>
>>>>> perf probe kernel_func
>>>>> perf stat/record -e probe:kernel_func ...
>>>>>
>>>>> We can just do
>>>>>
>>>>> perf stat/record -e kprobe:kernel_func ...
>>>>
>>>>
>>>> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
>>>
>>> Arnaldo, Masami, Song,
>>>
>>> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
>>> Could you please also review and comment on patch 5/9 for bpf_trace.c?
>>
>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>> to open it for enabling/disabling kprobes, not for creation.
>>
>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>> it should check the max number of probes to be created by something like
>> ulimit)
>> I think nowadays we have fixed all such kernel crash problems on x86,
>> but not sure for other archs, especially on the devices I can not reach.
>> I need more help to stabilize it.
>
> I don't see how enable/disable is any safer than creation.
> If there are kernel bugs in kprobes the kernel will crash anyway.
> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> CAP_* is about delegation of root privileges to non-root.
> Delegating some of it is ok, but disallowing creation makes it useless
> for bpf tracing, so we would need to add another CAP later.
> Hence I suggest to do it right away instead of breaking
> sys_perf_even_open() access into two CAPs.
>

Alexei, Masami,

Thanks for your meaningful input.
If we know in advance that it still can crash the system in some cases and on
some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
such delegation looks premature until the crashes are avoided. So it looks like
access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
a separate patch set.

Thanks,
Alexey