Re: [PATCH v2 09/18] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep

From: Peter Newman
Date: Fri Mar 10 2023 - 04:31:53 EST


Hi James,

On Thu, Mar 9, 2023 at 6:35 PM James Morse <james.morse@xxxxxxx> wrote:
> On 09/03/2023 13:41, Peter Newman wrote:
> > However, I do want to be sure that MBWU counters will never be silently
> > deallocated because we will never be able to trust the data unless we
> > know that the counter has been watching the group's tasks for the
> > entirety of the measurement window.
>
> Absolutely.
>
> The MPAM driver requires the number of monitors to match the value of
> resctrl_arch_system_num_rmid_idx(), otherwise 'mbm_local' won't be offered via resctrl.
> (see class_has_usable_mbwu() in [0])
>
> If the files exist in resctrl, then a monitor was reserved for this PARTID+PMG, and won't
> get allocated for anything else.
>
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot/v6.2&id=f28d3fefdcf7022a49f62752acbecf180ea7d32f
>
>
> > Unlike on AMD, MPAM allows software to control which PARTID+PMG the
> > monitoring hardware is watching. Could we instead make the user
> > explicitly request the mbm_{total,local}_bytes events be allocated to
> > monitoring groups after creating them? Or even just allocating the
> > events on monitoring group creation only when they're available could
> > also be marginably usable if a single user agent is managing rdtgroups.
>
> Hmmmm, what would that look like to user-space?
>
> I'm against inventing anything new here until there is feature-parity where possible
> upstream. It's a walk, then run kind of thing.
>
> I worry that extra steps to setup the monitoring on MPAM:resctrl will be missing or broken
> in many (all?) software projects if they're not also required on Intel:resctrl.
>
> My plan for hardware with insufficient counters is to make the counters accessible via
> perf, and do that in a way that works on Intel and AMD too.

In the interest of enabling MPAM functionality, I think the low-effort
approach is to only allocate an MBWU monitor to a newly-created MON or
CTRL_MON group if one is available. On Intel and AMD, the resources are
simply always available.

The downside on monitor-poor (or PARTID-rich) hardware is the user gets
maximually-featureful monitoring groups first, whether they want them or
not, but I think it's workable. Perhaps in a later change we can make an
interface to prevent monitors from being allocated to new groups or one
to release them when they're not needed after group creation.

At least in this approach there's still a way to use MBWU with resctrl
when systems have more PARTIDs than monitors.

This also seems like less work than making resctrl able to interface
with the perf subsystem.

-Peter