Re: [PATCH v5] x86/resctrl: Add event choices for mba_MBps

From: Reinette Chatre
Date: Mon Dec 04 2023 - 17:15:23 EST


Hi Tony,

On 12/4/2023 1:08 PM, Tony Luck wrote:
> On Mon, Dec 04, 2023 at 12:03:23PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 12/4/2023 11:45 AM, Luck, Tony wrote:
>>>> Yes. I saw the thread. Even then I feel having two similar options can
>>>> cause confusion. I feel it is enough just to solve the original problem.
>>>> Giving more options to a corner cases is a overkill in my opinion.
>>>
>>> The "original" problem was systems without "local" bandwidth event. I
>>> wanted to give a way for users of mba_MBps to still have some way to
>>> use it (assuming that "total" bandwidth event was present).
>>>
>>> Reinette suggested that some people might want to use "total", even
>>> on systems that support "local". I firmly agree with that. It is easy to
>>> construct scenarios where most bandwidth is to a remote node. using
>>> "local" event will do nothing to throttle in these case. I'm not at all sure
>>> why "local" event was picked. There's nothing in the LKML threads to
>>> give clues.
>>>
>>> I proposed a mount option "total" as a modifier to be used in conjunction
>>> with "mba_MBps". Reinette said it was too generic. Her suggestion was
>>> to add "mba_MBps_total" to be used instead of "mba_MBps".
>>
>> No, it cannot be used instead of "mba_MBps". My intention was for it to be
>> in addition to existing "mba_MBps" since taking "mba_MBps" away would be
>> considered breaking user space ABI.
>
> I was unclear. The mba_MBps option must remain as legacy ABI. My
> "instead of" was intended to convey that a user wanting total bandwidth
> would use:
>
> # mount -t resctrl -o mba_MBps_total resctrl /sys/fs/resctrl
>
> rather than the new option being a modifier and requiring both
> the legacy option and the modifier like this:
>
> # mount -t resctrl -o mba_MBps,mba_MBps_total resctrl /sys/fs/resctrl
>
> which seems overly verbose.
>

I misunderstood your comment. Thank you for clarifying.

>>
>> Even so ...
>>
>>>
>>> Is that where I should have gone, instead of "mba_MBps={local|total}"?
>>
>> While I did propose "mba_MBps_total" (in addition to "mba_MBps") I do
>> recognize your comment that a new key of mba_MBps_event does give more
>> flexibility if different events become available in future. Emphasis is
>> on "different" since I do not believe the parsing can support multiple
>> events and thus mba_MBps_event cannot be treated as a general bucket for all
>> mba_sc options, just different events guiding the feedback loop.
>>
>> "mba_MBps" must be kept and having it continue to use local bw as default,
>> but total bw on systems that do not support local bw seems appropriate,
>> (which is what this patch does).
>
> So we defintely have:
>
> "mba_MBps" - defaults to local, on systems without local may switch to
> total if that is available. Should this switch get a pr_info()? Or just happen
> silently (as I've done in patches so far).

I do not think a message is required ... I cannot see how it provides information
that user space does not already know. It surely does no harm and I would not
object if it is added. Even so, I do not think a kernel message should be what
is relied on to share with user space what guides the feedback loop. To that end
I think the mount options combined with the system capabilities (learned via
resctrl) provide that information.

Please do note that if the solution of this version is maintained then
rdtgroup_show_options() needs to be updated. With that done, user space should
be able to determine at any time during runtime (no matter if kernel log has been
cleared) which event is being used.

> and need to come to agreement on which of these to implement:
>
> A) "mba_MBps_total" - forces use of total. Fails the mount if total is not
> available.

The "cons" of this is that (a) user is not able to prevent the failover,
and (b) harder to support future events (which are unknown and difficult
to prepare for anyway).

>
> B) "mba_MBps={local|total)" forces use of chosen event, fails if event
> is unavailable.

I assume you mean "mba_MBps_event={local|total}". This is my preference but
I would like to learn more about Babu's "overkill" argument. I do believe that
(B) is well motivated in [1] and has no impact on AMD.
My uncertainty here (considering one motivation is to future proof it against
adding more events) is the generic "local" and "total" names. Even so, all
I could come up with are "local_bw" and "total_bw", which I do not think are
improvements.

>
> C) Something else.
>
> D) Don't provide any way to force use of total event.

(D) is what would be needed at minimum.

In support of Babu's comment I can see how having mba_MBps as well as
mba_MBps_event can cause confusion. To help with this the documentation could
be expanded with more detailed content and also examples.

Reinette

[1] https://lore.kernel.org/lkml/SJ1PR11MB60839F92B1C15A659CD32478FC86A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/