Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)
From: Alexey Budankov
Date: Thu Oct 04 2018 - 13:14:52 EST
Hi,
On 03.10.2018 20:01, Jann Horn wrote:
> On Mon, Oct 1, 2018 at 10:53 PM Alexey Budankov
> <alexey.budankov@xxxxxxxxxxxxxxx> wrote:
<SNIP>
>> 3. Every time an event for ${PMU} is created over perf_event_open():
>> a) the calling thread's euid is checked to belong to ${PMU}_users group
>> and if it does then the event's fd is allocated;
>> b) then traditional checks against perf_event_paranoid content are applied;
>> c) if the file doesn't exist, access is governed by the global setting
>> at /proc/sys/kernel/perf_event_paranoid;
>
> You'll also have to make sure that this thing in kernel/events/core.c
> doesn't have any bad effect:
>
> /*
> * Special case software events and allow them to be part of
> * any hardware group.
> */
>
> As in, make sure that you can't smuggle in arbitrary software events
> by attaching them to a whitelisted hardware event.
Yes, that makes sense. Please see and comment below.
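To be concrete, the per-PMU check would have to run for every event that
ends up in a group, software events included, not only for the group
leader, so that a software event inherits no access from a whitelisted
hardware leader. A minimal sketch of the a)/b)/c) order above, assuming
hypothetical pmu->paranoid and pmu->users_gid fields, a PMU_PARANOID_UNSET
marker and a perf_paranoid_allowed() helper (in_group_p() and
sysctl_perf_event_paranoid are existing kernel interfaces):

/* Hypothetical sketch, not existing kernel API. */
static int per_pmu_access_check(struct perf_event *event)
{
        struct pmu *pmu = event->pmu;

        /* c) no per-PMU setting exists: the global knob governs access */
        if (pmu->paranoid == PMU_PARANOID_UNSET)
                return perf_paranoid_allowed(sysctl_perf_event_paranoid);

        /* a) the caller's euid must belong to the ${PMU}_users group */
        if (!in_group_p(pmu->users_gid))
                return -EACCES;

        /* b) then the traditional paranoid checks are applied */
        return perf_paranoid_allowed(pmu->paranoid);
}

perf_event_open() would call this both for the new event and when the
event is attached to an existing group.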
>
<SNIP>
>> Security analysis for the uncore IMC, QPI/UPI and PCIe PMUs is still
>> required before they can be enabled for fine-grained control.
>
> And you can't whitelist anything that permits using sampling events
> with arbitrary sample_type.
>
It appears that access control depends on the sensitivity of the data that a PMU
captures for later analysis. Currently the captured data fall into the following
categories (please correct or extend if something is missing from the list below):
1) Monitored process details:
- system information on a process as a container (of threads, memory data, and
IDs (e.g. open fds) from process-specific namespaces, etc.);
- system information on threads as containers (of execution context details);
2) Execution context details:
- memory addresses;
- memory data;
- calculation results;
- calculation state in HW;
3) Monitored process and execution context telemetry data, used for building
various performance metrics, which can come from:
- user mode code and the OS kernel;
- various parts of HW, e.g. core, uncore, peripheral, etc.
Group 2) is the potential source of sensitive process data leakage, so if a PMU,
in some mode, samples execution context details, then the PMU, working in that
mode, is subject to *access* and *scope* control.
On the other hand, if the captured data contain only the monitored process
details and/or the associated execution telemetry, there is probably no
sensitive data leakage through that captured data.
For example, if a cpu PMU samples PC addresses over time, e.g. to provide a
hotspots-by-function profile, then this needs to be controlled from both the
access and the scope perspective, because PC addresses are execution context
details that can contain sensitive data.
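For illustration, such a profile is requested by setting PERF_SAMPLE_IP
in perf_event_attr; a minimal user-space sketch:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a cycles event that samples PC addresses of the calling thread. */
static int open_pc_sampling_event(void)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_period = 100000;
        attr.sample_type = PERF_SAMPLE_IP;   /* PC addresses */
        attr.exclude_kernel = 1;             /* user space only, cf. level 2 */

        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}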
However, if a cpu PMU only counts some metric value, or if a software PMU reads
the value of thread active time from the OS, possibly over time, to later build
some rating profile, or if some HW counter value is read without attribution to
any execution context details, that is probably not as risky as PC address
sampling.
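By contrast, a pure counting configuration has no sample_type at all and
the counter value is simply read(); a minimal sketch:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Count cycles over a workload, with no execution context attribution. */
static long long count_cycles(void (*workload)(void))
{
        struct perf_event_attr attr;
        long long count = -1;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0)
                return -1;

        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        workload();
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        read(fd, &count, sizeof(count));
        close(fd);
        return count;
}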
Uncore PMUs, e.g. memory controller (IMC), interconnect (QPI/UPI) and
peripheral (PCIe) ones, currently only read counter values that are captured
system-wide by HW and provide no attribution to any specific execution context
details and, thus, to sensitive process data.
Based on that,
A) a paranoid knob is required for a PMU if it can capture data from group 2);
B) the paranoid knob limits the *scope* of capturing sensitive data:
-3 - *scope* is defined by some higher level setting
-2 - disabled - no allowed *scope*
-1 - no restrictions - max *scope*
 0 - system wide
 1 - process user and kernel space
 2 - process user space only
C) the paranoid knob has to be checked every time the PMU is about to start
capturing sensitive data, to avoid capturing beyond the allowed scope;
a rough sketch of how B) and C) could combine follows below.
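A rough sketch, assuming a hypothetical pmu->paranoid field; only
sysctl_perf_event_paranoid and READ_ONCE() exist in the kernel today:

/*
 * Hypothetical sketch of rules B) and C), called every time the PMU
 * is about to start capturing potentially sensitive data.
 */
static bool pmu_scope_allowed(struct pmu *pmu, bool system_wide,
                              bool sample_kernel)
{
        int paranoid = READ_ONCE(pmu->paranoid);

        if (paranoid == -3)     /* scope defined by the global setting */
                paranoid = READ_ONCE(sysctl_perf_event_paranoid);
        if (paranoid == -2)     /* disabled: no allowed scope */
                return false;
        if (paranoid == -1)     /* no restrictions: max scope */
                return true;
        if (paranoid >= 1 && system_wide)
                return false;   /* level 0 is needed for system-wide */
        if (paranoid >= 2 && sample_kernel)
                return false;   /* level 2: process user space only */
        return true;
}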
PMU *access* semantics are derived from fs ACLs and could look like this:
r - read PMU architectural and configuration details, read PMU *access* settings
w - modify PMU *access* settings
x - modify PMU configuration and collect data
So the levels of *access* to a PMU could look like this:
root=rwx, ${PMU}_users=r-x, other=r--.
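In sysfs terms the ${PMU}.paranoid knob could be a per-PMU device
attribute whose file mode expresses the r/w part of the ACLs above
(DEVICE_ATTR_RW() yields 0644: w for root, r for group and others); the
x part would still be enforced at perf_event_open() time. A minimal
sketch, assuming the hypothetical pmu->paranoid field:

/* Hypothetical sketch; struct pmu has no paranoid field upstream. */
static ssize_t paranoid_show(struct device *dev,
                             struct device_attribute *attr, char *buf)
{
        struct pmu *pmu = dev_get_drvdata(dev);

        return sprintf(buf, "%d\n", pmu->paranoid);
}

static ssize_t paranoid_store(struct device *dev,
                              struct device_attribute *attr,
                              const char *buf, size_t count)
{
        struct pmu *pmu = dev_get_drvdata(dev);
        int val;

        if (kstrtoint(buf, 0, &val) || val < -3 || val > 2)
                return -EINVAL;

        pmu->paranoid = val;
        return count;
}
static DEVICE_ATTR_RW(paranoid);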
Possible examples of *scope* control settings could look like this:
1) system wide user+kernel mode CPU sampling with context switches
and uncore counting:
/proc/sys/kernel/perf_event_paranoid (-2, 2): 0
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -3
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -3
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -3
2) per-process CPU sampling with context switches and uncore counting:
/proc/sys/kernel/perf_event_paranoid (-2, 2): 1|2
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -1
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -1
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -1
3) per-process user mode CPU sampling allowed to specific ${PMU}_groups only:
/proc/sys/kernel/perf_event_paranoid (-2, 2): -2
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): 2
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): 2
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -3
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -3
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -3
4) uncore HW counters monitoring, possibly overtime:
/proc/sys/kernel/perf_event_paranoid (-2, 2): -2
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -1
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -1
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -1
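From the tooling side a denial under these settings shows up as an error
from perf_event_open() (EACCES or EPERM), so a collector could degrade
gracefully from system-wide to per-process user-space scope; a minimal
sketch:

#include <errno.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Try system-wide monitoring on a cpu first; if the paranoid settings
 * deny it, fall back to per-process, user-space-only scope.
 */
static int open_with_fallback(struct perf_event_attr *attr, int cpu)
{
        /* pid = -1, cpu >= 0: system-wide on that cpu */
        int fd = syscall(__NR_perf_event_open, attr, -1, cpu, -1, 0);

        if (fd < 0 && (errno == EACCES || errno == EPERM)) {
                /* pid = 0, cpu = -1: calling process only, any cpu */
                attr->exclude_kernel = 1;
                fd = syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);
        }
        return fd;
}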
Please share more thoughts so that this could eventually go into
Documentation/admin-guide/perf-security.rst.
Thanks,
Alexey