Re: [PATCH v7 37/49] x86/resctrl: Expand the width of dom_id by replacing mon_data_bits
From: James Morse
Date: Wed Mar 12 2025 - 14:04:58 EST
Hi Amit,
On 07/03/2025 10:17, Amit Singh Tomar wrote:
>> MPAM platforms retrieve the cache-id property from the ACPI PPTT table.
>> The cache-id field is 32 bits wide. Under resctrl, the cache-id becomes
>> the domain-id, and is packed into the mon_data_bits union bitfield.
>> The width of cache-id in this field is 14 bits.
>>
>> Expanding the union would break 32bit x86 platforms as this union is
>> stored as the kernfs kn->priv pointer. This saved allocating memory
>> for the priv data storage.
>>
>> The firmware on MPAM platforms have used the PPTT cache-id field to
>> expose the interconnect's id for the cache, which is sparse and uses
>> more than 14 bits. Use of this id is to enable PCIe direct cache
>> injection hints. Using this feature with VFIO means the value provided
>> by the ACPI table should be exposed to user-space.
>>
>> To support cache-id values greater than 14 bits, convert the
>> mon_data_bits union to a structure. This is allocated for the default
>> control group when the kernfs event files are created, and free'd when
>> the monitor directory is rmdir'd when the domain goes offline.
>> All other control and monitor groups lookup the struct mon_data allocated
>> for the default control group, and use this.
>> This simplifies the lifecycle of this structure as the default control
>> group cannot be rmdir()d by user-space, so only needs to consider
>> domain-offline, which removes all the event files corresponding to a
>> domain while holding rdtgroup_mutex - which prevents concurrent
>> readers. mkdir_mondata_subdir_allrdtgrp() must special case the default
>> control group to ensure it is created first.
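For anyone following along, the 14-bit limit can be sketched in user-space roughly as below. This is only an illustration: struct packed_mon_bits and the helper names are hypothetical, and only the 14-bit domain id width comes from the commit message. The other field widths are assumptions chosen to fill 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the packed layout. Only the 14-bit domid
 * width is taken from the commit message; the rest is assumed.
 */
struct packed_mon_bits {
	unsigned int rid   : 10;
	unsigned int evtid : 7;
	unsigned int sum   : 1;
	unsigned int domid : 14;
};

/* Return the domain id as it reads back after being packed. */
static uint32_t pack_and_read_domid(uint32_t cache_id)
{
	struct packed_mon_bits bits = { 0 };

	bits.domid = cache_id;	/* truncated to the low 14 bits */
	return bits.domid;
}

/* Return 1 if cache_id survives the round trip, 0 if it is truncated. */
static int domid_fits(uint32_t cache_id)
{
	return pack_and_read_domid(cache_id) == cache_id;
}

/* Usage: ids up to 0x3fff fit, anything wider loses its upper bits. */
static void domid_width_demo(void)
{
	assert(domid_fits(0x3fff));
	assert(!domid_fits(0x4000));
}
```

A sparse 32-bit interconnect id would alias with other ids once truncated like this, which is why the priv data moves to an allocated structure.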
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index aecd3fa734cd..443635d195f0 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3114,6 +3114,110 @@ static struct file_system_type rdt_fs_type = {
>> .kill_sb = rdt_kill_sb,
>> };
>> +/**
>> + * mon_get_default_kn_priv() - Get the mon_data priv data for this event from
>> + * the default control group.
>> + * Called when monitor event files are created for a domain.
>> + * When called with the default control group, the structure will be allocated.
>> + * This happens at mount time, before other control or monitor groups are
>> + * created.
>> + * This simplifies the lifetime management for rmdir() versus domain-offline
>> + * as the default control group lives forever, and only one group needs to be
>> + * special cased.
>> + *
>> + * @r: The resource for the event type being created.
>> + * @d: The domain for the event type being created.
>> + * @mevt: The event type being created.
>> + * @rdtgrp: The rdtgroup for which the monitor file is being created,
>> + * used to determine if this is the default control group.
>> + * @do_sum: Whether the SNC sub-numa node monitors are being created.
>> + */
>> +static struct mon_data *mon_get_default_kn_priv(struct rdt_resource *r,
>> + struct rdt_mon_domain *d,
>> + struct mon_evt *mevt,
>> + struct rdtgroup *rdtgrp,
>> + bool do_sum)
>> +{
>> + struct kernfs_node *kn_dom, *kn_evt;
>> + struct mon_data *priv;
>> + bool snc_mode;
>> + char name[32];
>> +
>> + lockdep_assert_held(&rdtgroup_mutex);
>> +
>> + snc_mode = r->mon_scope == RESCTRL_L3_NODE;
>> + if (!do_sum)
>> + sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);
> This change triggered a minor report during compilation.
>
> fs/resctrl/rdtgroup.c: In function ‘mon_get_default_kn_priv’:
> fs/resctrl/rdtgroup.c:2931:28: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long unsigned int’ [-Wformat=]
> 2931 |   sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);
>      |                         ~~~^            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      |                            |                     |
>      |                            int                   long unsigned int
>      |                         %02ld
Heh, not yet it's not! You must have rebased the MPAM tree on top, it's a patch in there
that causes this.
This is because the device-tree folk want to make cache-id an unsigned long so they can
use the arm CPU's affinity id as a cache-id. That patch already has to clean up this
pattern elsewhere in resctrl, I need to add this one to it.
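Roughly, the cleanup is to make the format specifier match the wider type. A minimal user-space sketch, assuming the id really does end up as an unsigned long: format_mon_name() and its parameters are made up for illustration, and the use of snprintf() is my choice here, not what the patch does. Note that gcc suggests %02ld, but for an unsigned value %02lu is the matching specifier.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a "mon_<resource>_<id>" name from an id
 * that is an unsigned long. "%02lu" matches the argument type, where
 * "%02d" would trigger -Wformat.
 */
static int format_mon_name(char *buf, size_t len, const char *res_name,
			   unsigned long id)
{
	assert(buf && res_name);

	/* snprintf() bounds the write to len bytes including the NUL. */
	return snprintf(buf, len, "mon_%s_%02lu", res_name, id);
}
```

With a 32 byte buffer, format_mon_name(name, sizeof(name), "L3", 1) gives "mon_L3_01", the same as before for small ids.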
That thing is a discussion for the DT folk to drive ... I think they could just as easily
use the CPU number - only it wouldn't be a hardware-derived value. (the upshot is
cache-ids could change over a firmware update - which I think is fine)
Thanks,
James