RE: [PATCH v2 2/2] x86/resctrl: Don't workqueue local event counter reads

From: Luck, Tony
Date: Thu Nov 07 2024 - 17:15:22 EST


> I think maybe the issue you are trying to address is a user assigning a counter
> and then reading the cached data and getting cached data from a previous
> configuration? Please note that in the current implementation the cached
> data is reset directly on counter assignment [1]. If a user assigns a new
> counter and then immediately read cached data then the cached data will
> reflect the assignment even if the overflow worker thread did not get a chance
> to run since the assignment.

The issue is that AMD's ABMC implementation resets counts when reassigning
h/w counters to events in resctrl groups. If the process reading the counters
is not fully aware of h/w counter reassignment, insanity will occur.

E.g. read a counter:

$ cat mbm_local_bytes
123456789

H/w counter for this event/group is assigned elsewhere.

H/w counter is assigned back to this event/group, which resets the count.

$ cat mbm_local_bytes
23456

Bandwidth calculation sees traffic amount:
(23456 - 123456789) = -123433333
Oops. Negative!
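
To spell out the arithmetic, here is a minimal Python sketch of what a naive
userspace reader would compute from the two reads above. The function names
are hypothetical, made up for illustration, and are not part of resctrl or
any existing tool. The decrease check is only one possible guard, not
something this patch proposes.

```python
def traffic_delta(prev_bytes: int, curr_bytes: int) -> int:
    """Naive traffic amount between two successive reads of the event file."""
    return curr_bytes - prev_bytes


def counter_went_backwards(prev_bytes: int, curr_bytes: int) -> bool:
    """Hypothetical guard: a decrease suggests the h/w counter was reset."""
    return curr_bytes < prev_bytes


# Values taken from the two `cat mbm_local_bytes` reads above.
first_read = 123456789
second_read = 23456

delta = traffic_delta(first_read, second_read)
if counter_went_backwards(first_read, second_read):
    print(f"counter went backwards, delta {delta} is not usable")
else:
    print(f"traffic: {delta} bytes")
```

Checking for a decrease is only a heuristic. If the reassigned counter has
already counted past the previous value by the time of the second read, the
reset goes unnoticed and the computed traffic is silently wrong.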

-Tony