Re: [PATCH v7 37/49] x86/resctrl: Expand the width of dom_id by replacing mon_data_bits

From: James Morse
Date: Wed Mar 12 2025 - 14:04:58 EST


Hi Amit,

On 07/03/2025 10:17, Amit Singh Tomar wrote:
>> MPAM platforms retrieve the cache-id property from the ACPI PPTT table.
>> The cache-id field is 32 bits wide. Under resctrl, the cache-id becomes
>> the domain-id, and is packed into the mon_data_bits union bitfield.
>> The width of cache-id in this field is 14 bits.
>>
>> Expanding the union would break 32bit x86 platforms as this union is
>> stored as the kernfs kn->priv pointer. This saved allocating memory
>> for the priv data storage.
>>
>> The firmware on MPAM platforms have used the PPTT cache-id field to
>> expose the interconnect's id for the cache, which is sparse and uses
>> more than 14 bits. Use of this id is to enable PCIe direct cache
>> injection hints. Using this feature with VFIO means the value provided
>> by the ACPI table should be exposed to user-space.
>>
>> To support cache-id values greater than 14 bits, convert the
>> mon_data_bits union to a structure. This is allocated for the default
>> control group when the kernfs event files are created, and free'd when
>> the monitor directory is rmdir'd when the domain goes offline.
>> All other control and monitor groups lookup the struct mon_data allocated
>> for the default control group, and use this.
>> This simplifies the lifecycle of this structure as the default control
>> group cannot be rmdir()d by user-space, so only needs to consider
>> domain-offline, which removes all the event files corresponding to a
>> domain while holding rdtgroup_mutex - which prevents concurrent
>> readers. mkdir_mondata_subdir_allrdtgrp() must special case the default
>> control group to ensure it is created first.


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/
>> rdtgroup.c
>> index aecd3fa734cd..443635d195f0 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3114,6 +3114,110 @@ static struct file_system_type rdt_fs_type = {
>>       .kill_sb        = rdt_kill_sb,
>>   };
>>   +/**
>> + * mon_get_default_kn_priv() - Get the mon_data priv data for this event from
>> + *                             the default control group.
>> + * Called when monitor event files are created for a domain.
>> + * When called with the default control group, the structure will be allocated.
>> + * This happens at mount time, before other control or monitor groups are
>> + * created.
>> + * This simplifies the lifetime management for rmdir() versus domain-offline
>> + * as the default control group lives forever, and only one group needs to be
>> + * special cased.
>> + *
>> + * @r:      The resource for the event type being created.
>> + * @d:        The domain for the event type being created.
>> + * @mevt:   The event type being created.
>> + * @rdtgrp: The rdtgroup for which the monitor file is being created,
>> + *          used to determine if this is the default control group.
>> + * @do_sum: Whether the SNC sub-numa node monitors are being created.
>> + */
>> +static struct mon_data *mon_get_default_kn_priv(struct rdt_resource *r,
>> +                        struct rdt_mon_domain *d,
>> +                        struct mon_evt *mevt,
>> +                        struct rdtgroup *rdtgrp,
>> +                        bool do_sum)
>> +{
>> +    struct kernfs_node *kn_dom, *kn_evt;
>> +    struct mon_data *priv;
>> +    bool snc_mode;
>> +    char name[32];
>> +
>> +    lockdep_assert_held(&rdtgroup_mutex);
>> +
>> +    snc_mode = r->mon_scope == RESCTRL_L3_NODE;
>> +    if (!do_sum)
>> +        sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);

> This change triggered a minor report during compilation.
>
> fs/resctrl/rdtgroup.c: In function ‘mon_get_default_kn_priv’:
> fs/resctrl/rdtgroup.c:2931:28: warning: format ‘%d’ expects argument of type ‘int’, but
> argument 4 has type ‘long unsigned int’ [-Wformat=]
>  2931 |   sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci->id : d->hdr.id);
>       |                         ~~~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>       | |                                 |
>       | int                               long unsigned int
>       |                         %02ld

Heh, not yet its not! You must have rebased the MPAM tree on-top, its a patch in there
that causes this:
This is because of the device-tree folk want to make cache-id an unsigned long so they can
use the arm CPU's affinity id as a cache-id. That patch already has to cleanup this
pattern elsewhere in resctrl, I need to add this one to it.

That thing is a discussion for the DT folk to drive ... I think they could just as easily
use the CPU number - only it wouldn't be a hardware-derived value. (the upshot is
cache-ids could change over a firmware update - which I think is fine)


Thanks,

James