Re: resctrl2 - status
From: Reinette Chatre
Date: Mon Aug 28 2023 - 10:51:05 EST

On 8/25/2023 6:11 PM, Tony Luck wrote:
> On Fri, Aug 25, 2023 at 04:08:21PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 8/25/2023 1:54 PM, Tony Luck wrote:
>>> On Fri, Aug 25, 2023 at 01:20:22PM -0700, Reinette Chatre wrote:
>>>> On 8/25/2023 12:44 PM, Luck, Tony wrote:
>>>>>>>> Alternatively, can user space just take a "load all resctrl modules
>>>>>>>> and see what sticks" (even modules of different architectures since
>>>>>>>> a user space may want to be generic) approach?
>>>>>>>
>>>>>>> This mostly works. Except for the cases where different modules access
>>>>>>> the same underlying hardware, so can't be loaded together.
>>>>>>>
>>>>>>> Examples:
>>>>>>>
>>>>>>> rdt_l3_cat vs. rdt_l3_cdp - user needs to decide whether they want CDP or not.
>>>>>>> But this is already true ... they have to decide whether to pass the "-o cdp" option
>>>>>>> to mount.
>>>>>>>
>>>>>>> rdt_l3_mba vs. rdt_l3_mba_MBps - does the user want to control memory bandwidth
>>>>>>> with percentages, or with MB/sec values. Again the user already has to make this
>>>>>>> decision when choosing mount options.
>>>>>>>
>>>>>>>
>>>>>>> Maybe the "What resctrl options does this machine support?" question would be
>>>>>>> best answered with a small utility?
>>>>>>
>>>>>> A user space utility or a kernel provided utility? If it is a user space utility
>>>>>> I think it would end up needing to duplicate what the kernel is required to do
>>>>>> to know if a particular feature is supported. It seems appropriate that this
>>>>>> could be a kernel utility that can share this existing information with user
>>>>>> space. resctrl already supports the interface for this via /sys/fs/resctrl/info.
>>>>>
>>>>> I was imagining a user space utility. Even though /proc/cpuinfo doesn't show
>>>>> all features, a utility has access to all the CPUID leaves that contain the
>>>>> details of each feature enumeration.
>>>>
>>>> For x86 that may work (in some scenarios, see later) for now but as I understand
>>>> Arm would need a different solution where I believe the information is obtained
>>>> via ACPI. I think it is unnecessary to require user space to have parsers for
>>>> CPUID and ACPI if that same information needs to be parsed by the kernel and
>>>> there already exists an interface with which the information is communicated
>>>> from kernel to user space. Also, just because CPUID shows that a feature
>>>> is supported by the hardware does not mean that the kernel has support for that
>>>> feature. This could be because of a feature mismatch between user space and
>>>> kernel, or because some features are disabled via a kernel parameter, for
>>>> example "rdt=!l3cat".
>>>
>>> Agreed this is complex, and my initial resctrl2 proposal lacks
>>> functionality in this area.
>>
>> Why is there a need to reinvent these parts?
>
> Perhaps there isn't ... see below.
>
>>
>>>>>> fyi ... as with previous attempts to discuss this work I find it difficult
>>>>>> to discuss this work when you are selective about what you want to discuss/answer
>>>>>> and just wipe the rest. Through this I understand that I am not your target
>>>>>> audience.
>>>>>
>>>>> Not my intent. I value your input highly. I'm maybe too avid a follower of the
>>>>> "trim your replies" school of e-mail etiquette. I thought I'd covered the gist
>>>>> of your message.
>>>>>
>>>>> I'll try to be more thorough in responding in the future.
>>>>
>>>> Two items from my previous email remain open:
>>>>
>>>> First, why does making the code modular require everything to be loadable
>>>> modules?
>>>> I think that it is great that the code is modular. Ideally it will help to
>>>> support the other architectures. As you explain this modular design also
>>>> has the benefit that "modules" can be loaded and unloaded after resctrl mount.
>>>> Considering your example of MBA and MBA_MBps support ... if I understand
>>>> correctly with code being modular it enables changes from one to the other
>>>> after resctrl mount. User can start with MBA and then switch to MBA_MBps
>>>> without needing to unmount resctrl. What I do not understand is why does
>>>> the code being modular require everything to be modules? Why, for example,
>>>> could a user not interact with a resctrl file that enables the user to make
>>>> this switch from, for example, MBA to MBA_MBps? With this the existing
>>>> interfaces can remain to be respected, the existing mount parameters need
>>>> to remain anyway, while enabling future "more modular" usages.
>>>
>>> Lots of advantages to modules:
>>> 1) Only load what you need.
>>>    - saves memory
>>>    - reduces potential attack surface
>>>    - may avoid periodic timers (e.g. for MBM overflow and
>>>      for LLC occupancy "limbo" mode).
>>> 2) If there is a security fix, can be deployed without a reboot.
>>> 3) Isolation between different features.
>>>    - Makes development and testing simpler
>>>
>>
>> From what I understand (1) and (3) are accomplished through things
>> being modular. To transition smoothly it may be required for all
>> currently supported features to be loaded by default, with the
>> option to unload afterwards by user space that understands new
>> modular interfaces.
>>
>> (2) does not need a module for each resource and feature supported
>> by resctrl. A single resctrl module would accomplish this and I
>> would expect it to be something everybody would like. James also
>> mentioned it being on his significant to-do list.
>>
>>> Sure some things like switching MBA to MBA_MBps mode by writing to
>>> a control file are theoretically possible. But they would be far more
>>> complex implementations with many possible opportunities for bugs.
>>> I think Vikas made a good choice to make this a mount option rather
>>> than selectable at run time.
>>>
>>>> Second, copied from my previous email, what is the plan to deal with current
>>>> users that just mount resctrl and expect to learn from it what features are
>>>> supported?
>>>
>>> Do such users exist? Resctrl is a sophisticated system management tool.
>>> I'd expect system administrators deploying it are well aware of the
>>> capabilities of the different types of systems in their data center.
>>>
>>> But if I'm wrong, then I have to go back to figure out a way to
>>> expose this information in a better way than randomly running "modprobe"
>>> to see what sticks.
>>
>> I always start with intel-cmt-cat but I believe that the burden would be
>> on you to convince all that existing user space would not be impacted
>> by this change. If I understand correctly this implementation would
>> result in mounting resctrl to have an empty schemata and no resources
>> in the info directory. Without knowledge about how to enable resources
>> the user space could be expected to interpret that as no resources enabled
>> on the system.
>
> Reinette,
>
> The basic issue is that my module based system has become less user
> friendly, requiring extra steps to get basic things working.
>
> Luckily there is a simple solution. Make the modules for the basic
> functions autoload. E.g. for MBA:
>
> +static const struct x86_cpu_id mba_feature[] = {
> +	X86_MATCH_FEATURE(X86_FEATURE_MBA, 0),
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(x86cpu, mba_feature);
>
> Then immediately after booting the system looks like this:
>
> $ lsmod | grep rdt
> rdt_l3_mba 16384 0
> rdt_mbm_local_bytes 12288 0
> rdt_mbm_total_bytes 12288 0
> rdt_llc_occupancy 12288 0
> rdt_l3_cat 16384 0
>
> And mounting resctrl:
>
> $ sudo mount -t resctrl resctrl /sys/fs/resctrl
> $ tree /sys/fs/resctrl/info
> /sys/fs/resctrl/info
> ├── L3
> │   ├── bit_usage
> │   ├── cbm_mask
> │   ├── min_cbm_bits
> │   ├── num_closids
> │   └── shareable_bits
> ├── L3_MON
> │   ├── max_threshold_occupancy
> │   ├── mbm_poll_threshold
> │   ├── mon_features
> │   └── num_rmids
> ├── last_cmd_status
> └── MB
>     ├── bandwidth_gran
>     ├── delay_linear
>     ├── min_bandwidth
>     └── num_closids
>
> 3 directories, 14 files
> $ cat /sys/fs/resctrl/schemata
> MB: 0=0;1=0
> L3: 0=fff;1=fff
>
> Thanks for pushing me to search for this solution to make things
> more compatible.

heh ... that sounds familiar: "To transition smoothly it may be required
for all currently supported features to be loaded by default". It is not
obvious to me how this also closes the other open items.

Reinette
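
For illustration only, and not part of the original message: a minimal
sketch of how a user space tool might learn which resources are enabled
from a mounted resctrl, which is the behavior the compatibility concern
above is about. The function names parse_schemata and list_info_resources
are hypothetical, the mount point /sys/fs/resctrl is the one used in the
thread, and the sample text is the schemata output quoted above. An empty
schemata parses to no resources, which is how such a tool could conclude
that nothing is enabled on the system.

```python
from pathlib import Path


def parse_schemata(text):
    """Parse schemata text into {resource: {domain_id: value_string}}.

    Hypothetical helper: values are kept as strings because their meaning
    differs per resource (bitmask for L3, bandwidth value for MB).
    """
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, domains = line.partition(":")
        entries = {}
        for item in domains.strip().split(";"):
            if not item:
                continue
            dom, _, value = item.partition("=")
            entries[int(dom)] = value.strip()
        resources[name.strip()] = entries
    return resources


def list_info_resources(root="/sys/fs/resctrl"):
    """Return the sorted subdirectory names under <root>/info.

    Returns an empty list when resctrl is not mounted at root.
    """
    info = Path(root) / "info"
    if not info.is_dir():
        return []
    return sorted(p.name for p in info.iterdir() if p.is_dir())


# Sample taken from the schemata output quoted in the thread.
sample = "MB: 0=0;1=0\nL3: 0=fff;1=fff\n"

if __name__ == "__main__":
    print(parse_schemata(sample))
    print(list_info_resources())
```

With the autoloaded modules shown above, list_info_resources() would be
expected to report L3, L3_MON and MB; without them, a tool written this
way would see an empty list and an empty schemata.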