Re: [PATCH v11 22/23] x86/resctrl: Introduce interface to list assignment states of all the groups

From: Moger, Babu
Date: Thu Feb 20 2025 - 16:31:21 EST


Hi Dave,

On 2/20/25 09:44, Dave Martin wrote:
> Hi,
>
> On Wed, Feb 19, 2025 at 03:09:51PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/19/25 07:53, Dave Martin wrote:
>>> On Wed, Jan 22, 2025 at 02:20:30PM -0600, Babu Moger wrote:
>>>> Provide the interface to list the assignment states of all the resctrl
>>>> groups in mbm_cntr_assign mode.
>
> [...]
>
>>>> +static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>>> +					       struct seq_file *s, void *v)
>>>> +{
>
> [...]
>
>>> Unlike the other resctrl files, it looks like the total size of this
>>> data will scale up with the number of existing monitoring groups
>>> and the lengths of the group names (in addition to the number of
>>> monitoring domains).
>>>
>>> So, this can easily be more than a page, overflowing internal limits
>>> in the seq_file and kernfs code.
>>>
>>> Do we need to track some state between read() calls? This can be done
>>> by overriding the kernfs .open() and .release() methods and hanging
>>> some state data (or an rdtgroup_file pointer) on of->priv.
>>>
>>> Also, if we allow the data to be read out in chunks, then we would
>>> either have to snapshot all the data in one go and stash the unread
>>> tail in the kernel, or we would need to move over to RCU-based
>>> enumeration or similar -- otherwise releasing rdtgroup_mutex in the
>>> middle of the enumeration in order to return data to userspace is going
>>> to be a problem...
>>
>> Good catch.
>>
>> I see a similar buffer overflow being handled by calling seq_buf_clear()
>> (see process_durations() or show_user_instructions()).
>>
>> How about handling this by calling rdt_last_cmd_clear() before printing
>> each group?
>
> Does this work?
>
> Once seq_buf_has_overflowed() returns nonzero, data has been lost, no?
> So far as I can see, show_user_instructions() just gives up on printing
> the affected line, while process_durations() tries to anticipate
> overflow and prints out the accumulated text to dmesg before clearing
> the buffer.

Yes, agreed.

>
> In our case, we cannot send more data to userspace than was requested
> in the read() call, so we might have nowhere to drain the seq_buf
> contents to in order to free up space.
>
> sysfs "expects" userspace to do a big enough read() that this problem
> doesn't happen. In practice this is OK because people usually read
> through a buffered I/O layer like stdio, and in realistic
> implementations the user-side I/O buffer is large enough to hide this
> issue.
>
> But mbm_assign_control data is dynamically generated and potentially
> much bigger than a typical sysfs file.

I am not sure how to handle this case; we may have to live with this
limitation. Please let me know if you have any ideas.
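
The closest I can get is your earlier .open()/.release() idea: take
rdtgroup_mutex once at open() time, format the whole text into a
snapshot hung off of->priv, and serve every read() from that snapshot.
A rough, untested sketch (the mbm_assign_snapshot struct and the
function names below are made up for illustration):

struct mbm_assign_snapshot {
	char	*buf;	/* full text, generated once at open() */
	size_t	size;
};

static int mbm_assign_control_open(struct kernfs_open_file *of)
{
	struct mbm_assign_snapshot *snap;

	snap = kzalloc(sizeof(*snap), GFP_KERNEL);
	if (!snap)
		return -ENOMEM;

	mutex_lock(&rdtgroup_mutex);
	/*
	 * Walk rdt_all_groups here and format the whole assignment
	 * state into a kvmalloc()'ed snap->buf, setting snap->size.
	 */
	mutex_unlock(&rdtgroup_mutex);

	of->priv = snap;
	return 0;
}

static ssize_t mbm_assign_control_read(struct kernfs_open_file *of,
				       char *buf, size_t bytes, loff_t off)
{
	struct mbm_assign_snapshot *snap = of->priv;

	/* kernfs hands us a kernel buffer; copy out the requested chunk */
	if (off >= snap->size)
		return 0;
	bytes = min_t(size_t, bytes, snap->size - off);
	memcpy(buf, snap->buf + off, bytes);
	return bytes;
}

static void mbm_assign_control_release(struct kernfs_open_file *of)
{
	struct mbm_assign_snapshot *snap = of->priv;

	kvfree(snap->buf);
	kfree(snap);
}

That would need a dedicated struct kernfs_ops for this one file, but it
keeps all of the enumeration under rdtgroup_mutex in a single place and
sidesteps the consistency problem across read() calls. Would something
like that be acceptable?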

>
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 484d6009869f..1828f59eb723 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1026,6 +1026,7 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>  	}
>>
>>  	list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +		rdt_last_cmd_clear();
>>  		seq_printf(s, "%s//", rdtg->kn->name);
>>
>>  		sep = false;
>> @@ -1041,6 +1042,7 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>  		seq_putc(s, '\n');
>>
>>  		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list, mon.crdtgrp_list) {
>> +			rdt_last_cmd_clear();
>
> I don't see how this helps.
>
> Surely last_cmd_status has nothing to do with s?

Correct. Clearly, I misunderstood the problem.
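
For reference, if I am reading rdtgroup.c right, rdt_last_cmd_clear()
only resets the seq_buf behind info/last_cmd_status:

	void rdt_last_cmd_clear(void)
	{
		lockdep_assert_held(&rdtgroup_mutex);
		seq_buf_clear(&last_cmd_status);
	}

so it never touches the seq_file passed to the show() handler.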

>
> [...]
>
> Cheers
> ---Dave
>

--
Thanks
Babu Moger