RE: [PATCH v2 2/2] x86/resctrl: Don't workqueue local event counter reads
From: Luck, Tony
Date: Thu Nov 07 2024 - 18:32:11 EST
> > E.g. read a counter:
> >
> > $ cat mbm_local_bytes
> > 123456789
> >
> > H/w counter for this event/group assigned elsewhere.
> >
> > H/w counter assigned back to this event/group
> >
> > $ cat mbm_local_bytes
> > 23456
> >
> > Bandwidth calculation sees traffic amount:
> > (23456 - 123456789) = -123433333
> > Oops. Negative!
>
> As I understand it, this is already an issue today on AMD systems without assignable counters
> that may run out of counters. On these systems, any RMID that is no longer being tracked will
> be reset to zero. [1]

My understanding too.

> Support for assignable counters gives user space control over this unexpected reset of
> counters.
>
> The scenario you present seems to demonstrate how two independent user space systems
> can trample on each other when interacting with the same resources. Is this something you expect
> resctrl should protect against? I would expect that there would be a single user space system
> doing something like above and it would reset history after unassigning a counter.

As we are discussing adding a new interface, I thought it worth considering adding
a way for user space to be aware of the re-assignment of counters. IMHO it would be
a nice-to-have feature. It is not required if all users of resctrl are aware of each other's
actions.
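
For illustration only, here is a minimal user-space sketch of the "reset
history" behaviour discussed above. It is not part of resctrl or of the patch
series; the BandwidthTracker class and its update() method are hypothetical
names. It assumes the only signal available to the reader is the counter value
itself, so a reading lower than the previous one is treated as a reset or
re-assignment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BandwidthTracker:
    """Hypothetical helper tracking one mbm_local_bytes-style counter."""

    prev: Optional[int] = None  # last raw counter value seen, if any

    def update(self, current: int) -> Optional[int]:
        """Return bytes counted since the last sample.

        Returns None when there is no usable history: either this is the
        first sample, or the counter went backwards, which this sketch
        assumes means the h/w counter was reset or re-assigned.
        """
        prev, self.prev = self.prev, current
        if prev is None:
            return None  # first sample, nothing to subtract from
        if current < prev:
            return None  # counter went backwards: drop the old history
        return current - prev


tracker = BandwidthTracker()
first = tracker.update(123456789)   # no history yet
second = tracker.update(23456)      # value from the example in this thread
third = tracker.update(30000)       # normal delta against the new baseline
```

This heuristic cannot catch a re-assignment where the new reading happens to
be at or above the old one, which is why an explicit notification from the
kernel would be nicer than guessing.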
> This does indeed highlight that if resctrl does start to dynamically assign counters (which
> has only been speculated about in this thread and is not part of the current design [1]) then it
> may cause problems on the user space side.

Agreed. Dynamic assignment would break the "user knows what is happening" assumption.
Seems like a bad idea.

> Reinette
>
> [1] https://lore.kernel.org/all/cover.1730244116.git.babu.moger@xxxxxxx/