[RFC PATCH 0/2] Resctrl - rewrite (WIP)

From: Tony Luck
Date: Mon Jun 19 2023 - 23:37:17 EST


Back in April I posted some RFC patches that added a "driver
registration" interface to the core resctrl code so that additional
resource control and monitor features could be added without further
complicating the core code. Link to that discussion:

https://lore.kernel.org/all/20230420220636.53527-1-tony.luck@xxxxxxxxx/

Reinette gave the feedback that it would be better to base the module
registration on the resctrl resource structure. Reinette also pointed
me to work from James Morse, and some additional discussion happened
here:

https://lore.kernel.org/all/ZG%2FMZVrWYrCHm%2Ffr@agluck-desk3/

James provided details on where ARM's MPAM has similarities and
differences from the Intel Resource Director Technology and AMD's
similar implementation. Drew Fustini was also pulled into that
conversation to comment on RISC-V CBQRI.

From those discussions I believe we need a do-over on the core
/sys/fs/resctrl implementation to make it friendlier to architectural
variations. Here's what I have so far.

==========================================================================
| N.B. This is a general direction check. There are many obvious         |
| rough edges (e.g. some careful thought needs to happen on locking      |
| for the files in /sys/fs/resctrl that are "owned" by modules that      |
| can be unloaded). I'm mostly looking for feedback from AMD, ARM and    |
| RISC-V on whether this is a foundation to build on, whether some small |
| tweaks could make it better, or if this is still going to be really    |
| hard for architectures that have radical divergence from the Intel     |
| model.                                                                 |
==========================================================================

The first patch is my attempt at architecture neutral code. All mentions
of "RDT", "CLOSID" and "RMID" have been expunged. When creating a
new group this code calls arch_alloc_resctrl_ids() to allocate an
opaque "resctrl_ids" value.

Q: I made this a "u64" because that neatly allows storage of both an
x86 CLOSID and RMID (in a handy representation that matches the bit
layout of the Intel IA32_PQR_ASSOC model specific register). If other
architectures need something more complex it could be a "typedef
resctrl_id_t" ... there are a couple of places where we would need
a comparison function.
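
For illustration only, here is a minimal user-space sketch of how a
single u64 could carry both x86 identifiers in the IA32_PQR_ASSOC
arrangement (RMID in the low 32 bits, CLOSID in the high 32 bits),
together with the kind of comparison helpers mentioned above. The type
name resctrl_ids_t and every helper name are hypothetical; they are not
taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opaque handle; the cover letter only says it is a u64. */
typedef uint64_t resctrl_ids_t;

static_assert(sizeof(resctrl_ids_t) == 8, "handle must be 64 bits wide");

/* Pack in IA32_PQR_ASSOC order: CLOSID in bits 63:32, RMID in the low half. */
static inline resctrl_ids_t ids_pack(uint32_t closid, uint32_t rmid)
{
	return ((uint64_t)closid << 32) | rmid;
}

static inline uint32_t ids_closid(resctrl_ids_t ids)
{
	return (uint32_t)(ids >> 32);
}

static inline uint32_t ids_rmid(resctrl_ids_t ids)
{
	return (uint32_t)ids;
}

/* True when both handles name the same control group (same CLOSID). */
static inline bool ids_same_ctrl(resctrl_ids_t a, resctrl_ids_t b)
{
	return ids_closid(a) == ids_closid(b);
}

/* True when control and monitor identifiers both match. */
static inline bool ids_equal(resctrl_ids_t a, resctrl_ids_t b)
{
	return a == b;
}
```

With this layout ids_pack(3, 17) yields 0x0000000300000011. An
architecture whose identifiers do not fit this shape would swap the
typedef and supply its own versions of the two comparison helpers.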

I broke the code into several source files that handle different
sub-functions of core code to make it easier to navigate. Much of
the code here should look familiar as I did a lot of
s/rdtgroup/resctrl_group/ on functions from the original resctrl
code.

By itself the core code is useless. It cannot even be built, as the
controlling Kconfig option "CONFIG_RESCTRL2_FS" must be enabled by
a "select" from architecture specific code that provides
the necessary "arch_*()" functions to make everything work.

Module registration is handled in fs/resctrl2/resources.c and
can be done before or after mounting /sys/fs/resctrl. Current
code won't let you make any new resource groups until a module
implementing a control function is loaded to supply the information
on how many groups the architecture supports.

Second patch is all the Intel X86 code (with some of the AMD bits
included, but by no means all of them).

I've implemented modules for most of the legacy Intel control
and monitor functions. Many of these share common code (by means
of a symlinked source file ... I couldn't figure out how to make
Kbuild build both rdt_l3_cat.ko and rdt_l3_cdp.ko from the same
source file with a different set of $(CFLAGS)).

Users can pick which features they want by loading modules that
implement the bits they want. E.g. CDP is enabled by loading
that rdt_l3_cdp.ko module instead of rdt_l3_cat.ko (there's some
code to prevent both being loaded together).

I started on the hooks for the "mba_MBps" feedback from the MBM driver,
but in this code drop I just have a simple module that reports the
bandwidth for each group instead of the byte count. I just need to
create a module that has both MBA control and MBM monitoring resources
with a periodic comparison of actual bandwidth with desired, that
then tweaks the MBA controls up/down as needed.
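
The periodic comparison described above could be sketched roughly as
follows. This is a simplified, user-space illustration under stated
assumptions, not the planned module: struct mbps_ctl, mbps_adjust() and
all field names are hypothetical, and the fixed-step policy is a
stand-in for whatever adjustment rule the real code ends up using.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-group controller state. */
struct mbps_ctl {
	uint32_t target_mbps;	/* bandwidth the user asked for */
	uint32_t throttle_pct;	/* current MBA setting, in percent */
	uint32_t min_pct;	/* lowest setting allowed */
	uint32_t max_pct;	/* highest setting allowed */
	uint32_t step_pct;	/* adjustment applied per period */
};

/*
 * One periodic tick: compare measured bandwidth with the target and
 * move the throttle one step toward it, clamped to [min_pct, max_pct].
 * Returns the new throttle value.
 */
static uint32_t mbps_adjust(struct mbps_ctl *c, uint32_t measured_mbps)
{
	assert(c->step_pct > 0 && c->min_pct <= c->max_pct);

	if (measured_mbps > c->target_mbps && c->throttle_pct > c->min_pct) {
		/* Over budget: tighten, but never below the minimum. */
		uint32_t room = c->throttle_pct - c->min_pct;

		c->throttle_pct -= room < c->step_pct ? room : c->step_pct;
	} else if (measured_mbps < c->target_mbps &&
		   c->throttle_pct < c->max_pct) {
		/* Under budget: relax, but never above the maximum. */
		uint32_t room = c->max_pct - c->throttle_pct;

		c->throttle_pct += room < c->step_pct ? room : c->step_pct;
	}
	return c->throttle_pct;
}
```

A caller would invoke mbps_adjust() once per sampling period with the
latest MBM-derived rate for the group. A real controller would probably
also want some hysteresis so the setting does not oscillate around the
target.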

I haven't ventured to read all the pseudo-locking code, but it looks
as though providing the driver with a way to tell core code that a
group is exclusive instead of shared (which tells core code not to
allow assignment of tasks or CPUs to the group) may be all the
surgery needed in core code. The x86 module will be more complex
than the toys I've produced so far, but should be able to leverage
much from the existing resctrl implementation.


Tony Luck (2):
resctrl2: Add all the generic code
resctrl2: Arch x86 modules for most of the legacy control/monitor
functions

include/linux/resctrl.h | 107 +++++
include/linux/sched.h | 3 +
arch/x86/include/asm/resctrl.h | 38 ++
fs/resctrl2/arch/x86/rdt.h | 22 +
fs/resctrl2/internal.h | 110 +++++
arch/x86/kernel/cpu/amd.c | 3 +
arch/x86/kernel/cpu/intel.c | 3 +
arch/x86/kernel/process_32.c | 1 +
arch/x86/kernel/process_64.c | 3 +
fs/resctrl2/arch/x86/alloc.c | 119 +++++
fs/resctrl2/arch/x86/rdt_l2_cat.c | 1 +
fs/resctrl2/arch/x86/rdt_l2_cdp.c | 1 +
fs/resctrl2/arch/x86/rdt_l3_cat.c | 349 +++++++++++++++
fs/resctrl2/arch/x86/rdt_l3_cdp.c | 1 +
fs/resctrl2/arch/x86/rdt_l3_mba.c | 251 +++++++++++
fs/resctrl2/arch/x86/rdt_llc_occupancy.c | 100 +++++
fs/resctrl2/arch/x86/rdt_mbm_adjust.c | 91 ++++
fs/resctrl2/arch/x86/rdt_mbm_local_bytes.c | 1 +
fs/resctrl2/arch/x86/rdt_mbm_local_rate.c | 1 +
fs/resctrl2/arch/x86/rdt_mbm_total_bytes.c | 1 +
fs/resctrl2/arch/x86/rdt_mbm_total_rate.c | 1 +
fs/resctrl2/arch/x86/rdt_monitor.c | 491 +++++++++++++++++++++
fs/resctrl2/cpu.c | 315 +++++++++++++
fs/resctrl2/directory.c | 295 +++++++++++++
fs/resctrl2/domain.c | 99 +++++
fs/resctrl2/info.c | 99 +++++
fs/resctrl2/kernfs.c | 58 +++
fs/resctrl2/locking.c | 52 +++
fs/resctrl2/resources.c | 85 ++++
fs/resctrl2/root.c | 173 ++++++++
fs/resctrl2/schemata.c | 110 +++++
fs/resctrl2/tasks.c | 193 ++++++++
arch/x86/Kconfig | 81 +++-
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/resctrl2/Kconfig | 5 +
fs/resctrl2/Makefile | 14 +
fs/resctrl2/arch/x86/Makefile | 29 ++
38 files changed, 3306 insertions(+), 2 deletions(-)
create mode 100644 fs/resctrl2/arch/x86/rdt.h
create mode 100644 fs/resctrl2/internal.h
create mode 100644 fs/resctrl2/arch/x86/alloc.c
create mode 120000 fs/resctrl2/arch/x86/rdt_l2_cat.c
create mode 120000 fs/resctrl2/arch/x86/rdt_l2_cdp.c
create mode 100644 fs/resctrl2/arch/x86/rdt_l3_cat.c
create mode 120000 fs/resctrl2/arch/x86/rdt_l3_cdp.c
create mode 100644 fs/resctrl2/arch/x86/rdt_l3_mba.c
create mode 100644 fs/resctrl2/arch/x86/rdt_llc_occupancy.c
create mode 100644 fs/resctrl2/arch/x86/rdt_mbm_adjust.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_local_bytes.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_local_rate.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_total_bytes.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_total_rate.c
create mode 100644 fs/resctrl2/arch/x86/rdt_monitor.c
create mode 100644 fs/resctrl2/cpu.c
create mode 100644 fs/resctrl2/directory.c
create mode 100644 fs/resctrl2/domain.c
create mode 100644 fs/resctrl2/info.c
create mode 100644 fs/resctrl2/kernfs.c
create mode 100644 fs/resctrl2/locking.c
create mode 100644 fs/resctrl2/resources.c
create mode 100644 fs/resctrl2/root.c
create mode 100644 fs/resctrl2/schemata.c
create mode 100644 fs/resctrl2/tasks.c
create mode 100644 fs/resctrl2/Kconfig
create mode 100644 fs/resctrl2/Makefile
create mode 100644 fs/resctrl2/arch/x86/Makefile


base-commit: 45a3e24f65e90a047bef86f927ebdc4c710edaa1
--
2.40.1