[PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support
From: Drew Fustini
Date: Sun May 24 2026 - 19:58:21 EST
This RFC series adds RISC-V QoS support: the Ssqosid extension [1]
(srmcfg CSR), the CBQRI controller interface [2] integrated with
resctrl [3], and ACPI RQSC [4] for controller discovery. DT support
is possible but no platform drivers are included. The series is
also available as a branch [5].
QEMU support for Ssqosid and CBQRI lives in [6], with ACPI RQSC as
a follow-on series [7]. There is also a combined branch [8].
Series organization
-------------------
  01     DT binding for Ssqosid extension
  02-03  Ssqosid ISA support (detection, srmcfg CSR, switch_to)
  04-06  fs/resctrl helpers and resource type additions
  07-10  CBQRI device ops (cbqri_devices.c): capacity probe +
         allocation, capacity monitoring, bandwidth probe +
         allocation, bandwidth monitoring
  11-15  CBQRI resctrl integration (cbqri_resctrl.c): cache
         allocation, L3 cache occupancy monitoring, MB_MIN
         bandwidth allocation, MB_WGHT bandwidth allocation,
         mbm_total_bytes monitoring
  16-17  ACPI RQSC parser and init
  18     Enable resctrl filesystem for Ssqosid (Kconfig)
Refer to the v3 cover letter [9] for the test setup including the
reference SoC layout and the corresponding QEMU command line.
[1] https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
[2] https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
[3] https://docs.kernel.org/filesystems/resctrl.html
[4] https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
[5] https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux.git/log/?h=b4/ssqosid-cbqri-rqsc
[6] https://lore.kernel.org/qemu-devel/20260105-riscv-ssqosid-cbqri-v4-0-9ad7671dde78@xxxxxxxxxx/
[7] https://lore.kernel.org/qemu-devel/20260202-riscv-rqsc-v1-0-dcf448a3ed73@xxxxxxxxxx/
[8] https://github.com/tt-fustini/qemu/tree/b4/riscv-rqsc
[9] https://lore.kernel.org/r/20260414-ssqosid-cbqri-rqsc-v7-0-v3-0-b3b2e7e9847a@xxxxxxxxxx
Key design decisions
--------------------
- Create new resource types as RDT_RESOURCE_MBA cannot represent the
  semantics of the CBQRI bandwidth controllers:
  - RDT_RESOURCE_MB_MIN matches CBQRI Rbwb (reserved bandwidth
    blocks). The sum of Rbwb across all control groups must be
    <= MRBWB (maximum number of reserved bandwidth blocks).
  - RDT_RESOURCE_MB_WGHT matches CBQRI Mweight, the weighted share of
    the remaining bandwidth blocks. Values are in [0, 255]: 0 disables
    work-conserving sharing for the group, 1..255 compete for the
    leftover pool.
- mbm_total_bytes is supported only when the platform exposes exactly
one mon-capable bandwidth controller and exactly one L3 domain.
Pairing a single BC across multiple L3 domains would let standard
userspace tools overcount system bandwidth by summing the same
counter across domains.
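The MB_MIN and MB_WGHT constraints above can be illustrated with a small
userspace sketch. This is not code from the series: struct bc_model and
the three helpers are hypothetical names, and only the two rules stated
above (sum of Rbwb <= MRBWB, Mweight in [0, 255] with 0 opting out of the
leftover pool) are taken from this letter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one bandwidth controller, not the series' struct. */
struct bc_model {
	uint32_t mrbwb;       /* maximum number of reserved bandwidth blocks */
	size_t nr_groups;     /* number of control groups */
	const uint32_t *rbwb; /* current Rbwb reservation per control group */
};

/*
 * Return true if setting group 'idx' to 'new_rbwb' keeps the sum of Rbwb
 * across all control groups <= MRBWB.
 */
bool mb_min_fits(const struct bc_model *bc, size_t idx, uint32_t new_rbwb)
{
	uint64_t sum = 0; /* 64-bit accumulator so the sum cannot wrap */

	if (idx >= bc->nr_groups)
		return false;

	for (size_t i = 0; i < bc->nr_groups; i++)
		sum += (i == idx) ? new_rbwb : bc->rbwb[i];

	return sum <= bc->mrbwb;
}

/* Mweight values are limited to [0, 255]. */
bool mb_wght_valid(unsigned int mweight)
{
	return mweight <= 255;
}

/* 0 disables work-conserving sharing; 1..255 compete for the leftover pool. */
bool mb_wght_shares_leftover(unsigned int mweight)
{
	return mweight >= 1 && mweight <= 255;
}
```

With reservations of 10, 20 and 30 blocks and MRBWB = 100, raising the
first group to 50 is accepted and raising it to 51 is rejected.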
Open issues
-----------
- RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT are intended to drive
discussion, not to be the final solution. I plan to rebase onto
Reinette's proof of concept once it is posted.
- resctrl monitoring scope limitations:
  - monitor-only L3 capacity controllers are not supported.
  - CBQRI capacity controllers can monitor any cache level, but resctrl
    only supports occupancy on L3.
  - resctrl needs to gain a non-CPU scope level for mbm_total_bytes
    to be supported on platforms with multiple bandwidth controllers
    or multiple L3 domains.
- When a control group is freed, rbwb_cache[closid] is not reset,
so the MB_MIN sum check can count the stale reservation against
MRBWB. Fixing this requires a new resctrl_arch_* callback in
fs/resctrl invoked on group destroy, which is out of scope for
this arch-driver series.
- cc_cunits is not supported. cc_block_mask maps well onto resctrl's
existing CBM schema, but there is no existing equivalent for
capacity units.
- RQSC structs live in drivers/acpi/riscv/rqsc.h until the spec is
ratified and the ACPICA upstream submission lands. They will then move
to include/acpi/actbl2.h. The spec is in the final phase
before ratification.
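The rbwb_cache issue listed above can be reproduced in a few lines of
userspace C. This is only a model: NR_CLOSID, MRBWB, set_rbwb() and
on_group_destroy() are hypothetical, and resetting the freed entry to 0
is an assumption about what a future group-destroy callback might do, not
a decision made in this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CLOSID 4   /* hypothetical number of control groups */
#define MRBWB     100 /* hypothetical controller limit */

static uint32_t rbwb_cache[NR_CLOSID]; /* software copy of Rbwb per closid */

/* Sum check: every cached entry counts, whether its group is live or not. */
bool set_rbwb(unsigned int closid, uint32_t val)
{
	uint64_t sum = 0;

	if (closid >= NR_CLOSID)
		return false;

	for (unsigned int i = 0; i < NR_CLOSID; i++)
		sum += (i == closid) ? val : rbwb_cache[i];

	if (sum > MRBWB)
		return false;

	rbwb_cache[closid] = val;
	return true;
}

/* Hypothetical stand-in for the missing group-destroy callback. */
void on_group_destroy(unsigned int closid)
{
	if (closid < NR_CLOSID)
		rbwb_cache[closid] = 0; /* assumed reset value */
}
```

If closid 1 reserves 60 blocks and its group is then freed without
on_group_destroy(), a request for 50 blocks on closid 2 fails even though
no live group holds the 60. After the reset the same request succeeds.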
Changes in v5:
--------------
The changes in this revision are based on the feedback in the Sashiko
review of the series.
Ssqosid:
- Seed cpu_srmcfg to U32_MAX in DEFINE_PER_CPU so early-boot context
switches always write the CSR rather than matching a zero-initialised
cache before riscv_srmcfg_init() runs.
- __switch_to_srmcfg() evaluates RCID and MCID against
cpu_srmcfg_default independently. A task in the default RCID group
with a specific MCID previously bypassed the CPU default.
- Register a CPU PM notifier that invalidates cpu_srmcfg on
CPU_PM_EXIT / CPU_PM_ENTER_FAILED so resume-from-suspend on the boot
CPU writes the CSR.
- Drop the for_each_online_cpu pre-seed loop in riscv_srmcfg_init().
cpuhp_setup_state() already runs the startup callback on CPUs that
are online at registration time.
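The Ssqosid changes above can be modelled in userspace as follows. This is
a sketch, not __switch_to_srmcfg() itself: struct cpu_model and all
function names are hypothetical, the srmcfg field positions (RCID in bits
11:0, MCID in bits 27:16) are my reading of the Ssqosid spec, and treating
a task ID of 0 as "use the CPU default" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed srmcfg layout: RCID in bits 11:0, MCID in bits 27:16. */
#define SRMCFG_RCID_MASK  0x00000fffu
#define SRMCFG_MCID_SHIFT 16
#define SRMCFG_MCID_MASK  0x00000fffu

struct cpu_model {
	uint32_t srmcfg_cache;   /* last value written, UINT32_MAX = invalid */
	uint32_t default_rcid;   /* CPU default control group */
	uint32_t default_mcid;   /* CPU default monitoring group */
	uint32_t csr;            /* stand-in for the real CSR */
	unsigned int csr_writes; /* number of modelled CSR writes */
};

/* Invalidate the cache, as on boot or CPU_PM_EXIT. */
void cpu_model_invalidate(struct cpu_model *cpu)
{
	/* Reserved bits are set, so no encoded value can ever match. */
	cpu->srmcfg_cache = UINT32_MAX;
}

void cpu_model_init(struct cpu_model *cpu, uint32_t rcid, uint32_t mcid)
{
	cpu->default_rcid = rcid;
	cpu->default_mcid = mcid;
	cpu->csr = 0;
	cpu->csr_writes = 0;
	cpu_model_invalidate(cpu);
}

static uint32_t srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	return (rcid & SRMCFG_RCID_MASK) |
	       ((mcid & SRMCFG_MCID_MASK) << SRMCFG_MCID_SHIFT);
}

/* Returns true if the modelled CSR was written. */
bool switch_to_srmcfg(struct cpu_model *cpu, uint32_t task_rcid,
		      uint32_t task_mcid)
{
	/* Each field falls back to the CPU default on its own. */
	uint32_t rcid = task_rcid ? task_rcid : cpu->default_rcid;
	uint32_t mcid = task_mcid ? task_mcid : cpu->default_mcid;
	uint32_t val = srmcfg_encode(rcid, mcid);

	if (val == cpu->srmcfg_cache)
		return false; /* fast path: CSR already holds this value */

	cpu->csr = val;
	cpu->srmcfg_cache = val;
	cpu->csr_writes++;
	return true;
}
```

With the cache seeded to UINT32_MAX, the first switch writes the CSR even
when the encoded value is 0. A task with RCID 0 and MCID 7 on a CPU whose
default RCID is 5 ends up with RCID 5 and MCID 7.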
CBQRI:
- Add mweight_cache. cbqri_apply_bc_field() seeds both fields of
bc_bw_alloc from the software caches, so that stale data cannot leak
into the unmodified field.
- Seed mweight_cache to FIELD_MAX(MWEIGHT_MASK) at probe so the first
MB_MIN domain init does not commit Mweight=0 to every RCID. A weight
of 0 is a hard cap on opportunistic bandwidth, which would starve
every RCID until the subsequent MB_WGHT domain init catches up.
- cbqri_apply_mweight_config() rejects mweight > MWEIGHT_MASK at entry
rather than letting it truncate and trigger a verify mismatch.
- cbqri_apply_bc_field() updates per-RCID cache only after verifying.
- cbqri_controller_destroy() now iounmaps and releases the mem region
from rollback paths, gated on ctrl->base.
- cbqri_probe_feature() clears OP, AT, RCID and EVT_ID on every write,
so the probe never writes stale bits into the register.
- cbqri_apply_cache_config() clears cc_block_mask before the initial
READ_LIMIT that captures saved_cbm.
- Drop the ctrl->faulted early return from controller ops.
- Reject a second bandwidth controller when sharing a proximity domain.
- Reject ctrl->rcid_count > SRMCFG_RCID_MASK so the schedule-in
fast path cannot silently truncate the RCID.
- Widen CBQRI_MON_CTL_OP/MCID/EVT_ID masks to GENMASK_ULL so
FIELD_MODIFY on a u64 register stays safe if RV32 support is added.
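The cache-seeding, range-check and verify-before-cache rules in the CBQRI
list above can be sketched in userspace like this. It is a model, not
cbqri_apply_bc_field(): struct bc_model, bc_apply_field() and the
drop_writes fault-injection flag are hypothetical, and the bc_bw_alloc
field positions (Rbwb in bits 15:0, Mweight in bits 27:20) are my reading
of the CBQRI spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bc_bw_alloc layout: Rbwb in bits 15:0, Mweight in bits 27:20. */
#define RBWB_MAX      0xffffu
#define MWEIGHT_SHIFT 20
#define MWEIGHT_MAX   0xffu
#define NR_RCID       4 /* hypothetical RCID count */

enum bc_field { BC_RBWB, BC_MWEIGHT };

struct bc_model {
	uint32_t hw[NR_RCID];           /* stand-in for per-RCID hardware state */
	uint16_t rbwb_cache[NR_RCID];
	uint8_t mweight_cache[NR_RCID];
	bool drop_writes;               /* fault injection: writes are ignored */
};

void bc_model_init(struct bc_model *bc)
{
	for (unsigned int i = 0; i < NR_RCID; i++) {
		bc->hw[i] = 0;
		bc->rbwb_cache[i] = 0;
		/* Seed to the maximum so an Rbwb-only update never commits 0. */
		bc->mweight_cache[i] = MWEIGHT_MAX;
	}
	bc->drop_writes = false;
}

static uint32_t bc_encode(uint32_t rbwb, uint32_t mweight)
{
	return (rbwb & RBWB_MAX) | ((mweight & MWEIGHT_MAX) << MWEIGHT_SHIFT);
}

/* Returns 0 on success, -1 on bad input or verify mismatch. */
int bc_apply_field(struct bc_model *bc, unsigned int rcid,
		   enum bc_field field, uint32_t val)
{
	uint32_t rbwb, mweight, want;

	if (rcid >= NR_RCID)
		return -1;
	/* Reject out-of-range values instead of letting them truncate. */
	if (field == BC_MWEIGHT && val > MWEIGHT_MAX)
		return -1;
	if (field == BC_RBWB && val > RBWB_MAX)
		return -1;

	/* Build both fields from the caches, overriding only the target. */
	rbwb = (field == BC_RBWB) ? val : bc->rbwb_cache[rcid];
	mweight = (field == BC_MWEIGHT) ? val : bc->mweight_cache[rcid];
	want = bc_encode(rbwb, mweight);

	if (!bc->drop_writes)
		bc->hw[rcid] = want;

	/* Verify by reading back; the caches stay untouched on mismatch. */
	if (bc->hw[rcid] != want)
		return -1;

	bc->rbwb_cache[rcid] = (uint16_t)rbwb;
	bc->mweight_cache[rcid] = (uint8_t)mweight;
	return 0;
}
```

After bc_model_init(), setting only Rbwb for an RCID commits Mweight = 255
rather than 0, and a failed verify leaves mweight_cache at its previous
value.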
resctrl:
- Switch the L3 mon_domain teardown paths from cancel_delayed_work_sync
to cancel_delayed_work to avoid potential deadlock.
- Guard the mbm_over cancel on QOS_L3_MBM_TOTAL_EVENT_ID, so a system
without a paired BC does not cancel a zeroed work struct.
- cbqri_attach_cpu_to_cap_ctrl() rolls back cpumask_set_cpu and any
freshly created ctrl_domain when cbqri_attach_cpu_to_l3_mon() fails.
- Restrict mbm_total_bytes to platforms with exactly one L3 domain.
- Pair the L3 mon domain with its BC and initialise the BC's
per-MCID accumulators before resctrl_online_mon_domain() exposes
the domain, so a concurrent mbm_total_bytes read cannot race with
paired_bc init.
- Hold cbqri_domain_list_lock across the MMIO paths in
resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() so a
concurrent CPU hotplug detach cannot free hw_dom mid-read.
- cbqri_resctrl_setup() rolls back exposed_alloc_capable /
exposed_mon_capable on resctrl_init() failure so
resctrl_arch_*_capable() does not report stale state to callers.
- Drop the cacheinfo_ready wait queue in cbqri_resctrl_setup() and
the RCU annotations on the ctrl_domain list. cacheinfo runs at
device_initcall_sync, strictly before late_initcall, and the list
is mutated only from cpuhp callbacks under cbqri_domain_list_lock.
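The attach rollback described above for cbqri_attach_cpu_to_cap_ctrl()
follows a common pattern, sketched here in userspace. Everything in the
block is hypothetical (struct demo_ctrl, the 64-bit CPU mask, the
mon_attach_fails fault-injection flag); it only illustrates undoing the
mask bit and freeing a domain that this call itself created.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ctrl_domain and controller. */
struct demo_domain {
	uint64_t cpu_mask;
};

struct demo_ctrl {
	struct demo_domain *dom; /* NULL until the first CPU attaches */
	bool mon_attach_fails;   /* fault injection for the second step */
};

/* Models the monitoring attach step that may fail. */
static int demo_attach_mon(struct demo_ctrl *ctrl, unsigned int cpu)
{
	(void)cpu;
	return ctrl->mon_attach_fails ? -1 : 0;
}

/* Returns 0 on success, -1 on failure with all side effects undone. */
int demo_attach_cpu(struct demo_ctrl *ctrl, unsigned int cpu)
{
	bool created = false;

	if (cpu >= 64)
		return -1;

	if (!ctrl->dom) {
		ctrl->dom = calloc(1, sizeof(*ctrl->dom));
		if (!ctrl->dom)
			return -1;
		created = true;
	}
	ctrl->dom->cpu_mask |= UINT64_C(1) << cpu;

	if (demo_attach_mon(ctrl, cpu)) {
		/* Undo the mask bit, and the domain only if we created it. */
		ctrl->dom->cpu_mask &= ~(UINT64_C(1) << cpu);
		if (created) {
			free(ctrl->dom);
			ctrl->dom = NULL;
		}
		return -1;
	}
	return 0;
}
```

A failure on the first CPU leaves no domain behind, while a failure on a
later CPU keeps the existing domain and only clears that CPU's bit.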
Kconfig:
- RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER unconditionally. resctrl
is gated separately by the silent RISCV_CBQRI_RESCTRL_FS option.
ACPI:
- acpi_parse_rqsc() rejects tables with the wrong header.revision,
validates res0->type and res0->id_type, and checks that node->length
does not overrun the table end.
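The revision and length checks above amount to a bounds-checked walk over
variable-length nodes. The sketch below shows that pattern in userspace.
The struct layouts, DEMO_REVISION and demo_count_nodes() are hypothetical
and do not reflect the RQSC table format; the type and id_type checks are
left out because their valid values are not given in this letter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_REVISION 1 /* hypothetical expected revision */

/* Hypothetical layouts, not the RQSC spec's. */
struct demo_hdr {
	uint32_t length;  /* total table length in bytes */
	uint8_t revision;
	uint8_t pad[3];
};

struct demo_node {
	uint16_t type;
	uint16_t length;  /* node length in bytes, including this header */
};

/* Returns the number of nodes, or -1 if the table is malformed. */
int demo_count_nodes(const uint8_t *buf, size_t buf_len)
{
	struct demo_hdr hdr;
	size_t off, end;
	int count = 0;

	if (buf_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.revision != DEMO_REVISION)
		return -1;
	if (hdr.length < sizeof(hdr) || hdr.length > buf_len)
		return -1;

	end = hdr.length;
	for (off = sizeof(hdr); off < end; ) {
		struct demo_node node;

		if (end - off < sizeof(node))
			return -1; /* truncated node header */
		memcpy(&node, buf + off, sizeof(node));

		/* Too short would loop forever; too long overruns the table. */
		if (node.length < sizeof(node) || node.length > end - off)
			return -1;

		off += node.length;
		count++;
	}
	return count;
}
```

Comparing node.length against the bytes remaining (end - off) rather than
computing off + node.length avoids any chance of the addition wrapping.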
Sashiko review:
https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683%40kernel.org
Link to v4:
https://lore.kernel.org/all/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@xxxxxxxxxx/
Changes in v4:
--------------
resctrl:
- Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
- Add default_to_min to resctrl_membw so MB_MIN defaults to min_bw
- Add L3 cache occupancy monitoring for L3-scoped capacity controllers
- Add mbm_total_bytes bandwidth monitoring when there is a single
bandwidth controller
- Move domain creation into cpuhp callbacks so that cpu_mask reflects
only online CPUs
- resctrl_arch_reset_rmid() returns early when called with IRQs
disabled.
CBQRI:
- Replace per-controller spinlock with mutex. Each CBQRI op is a
write-then-poll-busy cycle of up to 1 ms. A sleeping mutex paired
with readq_poll_timeout() keeps preemption enabled across the
busy-wait. All resctrl-arch entry points run in process context.
- Replace struct cbqri_config with direct params in helper functions.
- max_rmid = min(max_rmid, ctrl->mcid_count) is now gated on
ctrl->mon_capable.
- Validate that the sum of Rbwb does not exceed MRBWB.
- Move CDP enable state from file-scope globals to per-resource
cdp_enabled / cdp_capable.
- Configure both AT_CODE and AT_DATA limits when CDP is supported but
not enabled.
Ssqosid:
- __switch_to_srmcfg() emits RISCV_FENCE(rw, o) before and (o, rw)
after csrw to drain old-task stores and order new-task loads.
- Invalidate per-cpu cpu_srmcfg on hart online via CPUHP_AP_ONLINE_DYN.
Also seed already-online CPUs synchronously at init.
ACPI:
- Drop the PPTT helper patch and resolve cache_size via cacheinfo at
cbqri_resctrl_setup() time.
- ACPI driver now calls riscv_cbqri_register_controller() and the
cbqri_controller internals stay in cbqri_internal.h.
Refer to v3 for previous change logs:
https://lore.kernel.org/r/20260414-ssqosid-cbqri-rqsc-v7-0-v3-0-b3b2e7e9847a@xxxxxxxxxx
---
Drew Fustini (18):
dt-bindings: riscv: Add Ssqosid extension description
riscv: detect the Ssqosid extension
riscv: add support for srmcfg CSR from Ssqosid extension
fs/resctrl: Add resctrl_is_membw() helper
fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
fs/resctrl: Let bandwidth resources default to min_bw at reset
riscv_cbqri: Add capacity controller probe and allocation device ops
riscv_cbqri: Add capacity controller monitoring device ops
riscv_cbqri: Add bandwidth controller probe and allocation device ops
riscv_cbqri: Add bandwidth controller monitoring device ops
riscv_cbqri: resctrl: Add cache allocation via capacity block mask
riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC)
riscv: enable resctrl filesystem for Ssqosid
.../devicetree/bindings/riscv/extensions.yaml | 6 +
MAINTAINERS | 15 +
arch/riscv/Kconfig | 20 +
arch/riscv/include/asm/acpi.h | 10 +
arch/riscv/include/asm/csr.h | 5 +
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/processor.h | 3 +
arch/riscv/include/asm/qos.h | 87 ++
arch/riscv/include/asm/resctrl.h | 152 ++
arch/riscv/include/asm/switch_to.h | 3 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/qos.c | 84 ++
drivers/acpi/riscv/Makefile | 1 +
drivers/acpi/riscv/init.c | 21 +
drivers/acpi/riscv/rqsc.c | 194 +++
drivers/acpi/riscv/rqsc.h | 63 +
drivers/resctrl/Kconfig | 32 +
drivers/resctrl/Makefile | 6 +
drivers/resctrl/cbqri_devices.c | 1100 +++++++++++++++
drivers/resctrl/cbqri_internal.h | 246 ++++
drivers/resctrl/cbqri_resctrl.c | 1458 ++++++++++++++++++++
fs/resctrl/ctrlmondata.c | 3 +-
fs/resctrl/internal.h | 2 +
fs/resctrl/rdtgroup.c | 16 +-
include/linux/resctrl.h | 13 +-
include/linux/riscv_cbqri.h | 66 +
27 files changed, 3601 insertions(+), 9 deletions(-)
---
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
change-id: 20260329-ssqosid-cbqri-rqsc-v7-0-b0c788bab48a
Best regards,
--
Drew Fustini <fustini@xxxxxxxxxx>