Re: [PATCH] x86/resctrl: Fix event counts regression in reused RMIDs

From: Reinette Chatre
Date: Thu Dec 08 2022 - 13:33:01 EST


Hi Peter,

On 12/8/2022 2:04 AM, Peter Newman wrote:
> Hi Reinette,
>
> On Wed, Dec 7, 2022 at 8:48 PM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>>
>> To get back to the original behavior before the refactoring it also seems
>> that __mon_event_count() needs to return right after calling
>> resctrl_arch_reset_rmid(). The only caller with rr->first set is when
>> the mon directory is created and the returned values are not used,
>> it is just run to get prev_msr set. This also avoids unnecessarily reading
>> the counters twice.
>>
>> So, how about:
>>
>> static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>> {
>>
>> 	...
>> 	if (rr->first) {
>> 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
>> 		return 0;
>> 	}
>> 	...
>>
>> }
>
> Avoiding the double-read sounds good, but...
>
>>
>> Also ... there appears to be a leftover related snippet in __mon_event_count()
>> that does not belong anymore and may still cause incorrect behavior:
>>
>> static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>> {
>> 	...
>> 	if (rr->first) {
>> 		memset(m, 0, sizeof(struct mbm_state));
>> 		return 0;
>> 	}
>> 	...
>> }
>
> I'm less sure about removing (or skipping) this. mbm_state::mbm_local
> still seems to be used by the mba_sc code. That might be why James
> left this code in.
>
> I was sort of confused about the new role of mbm_state following the
> refactoring when reviewing Babu's change. (which reminds me that I
> should have CC'ed him on this patch)


I think this can be cleaned up to make the code clearer. Notice the
duplication of the following snippet in __mon_event_count():

	rr->val += tval;
	return 0;

I do not see any need to check the event ID before doing the above. With
that check gone, the bulk of the switch is needed only for the rr->first
handling, and that can be moved to resctrl_arch_reset_rmid().

Something like:

void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, ...
{
	...
	struct arch_mbm_state *am;
	struct mbm_state *m;
	u64 val = 0;
	int ret;

	m = get_mbm_state(d, rmid, eventid); /* get_mbm_state() to be created */
	if (m)
		memset(m, 0, sizeof(*m));

	am = get_arch_mbm_state(hw_dom, rmid, eventid);
	if (am) {
		memset(am, 0, sizeof(*am));
		/* Record any initial, non-zero count value. */
		ret = __rmid_read(rmid, eventid, &val);
		if (!ret)
			am->prev_msr = val;
	}

}

Having this would be helpful as a reference for Babu's usage.
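
To illustrate why the reset path records the current count instead of
leaving prev_msr at zero, here is a minimal user-space sketch. The struct
and function names (arch_mbm_state_model, model_reset_rmid(),
model_read_rmid()) and the 24-bit counter width are assumptions made for
the example, not the kernel's definitions. The delta helper uses the usual
shift-based wraparound handling, and the scaling of chunks to bytes is
left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed counter width for this sketch; real hardware reports its own. */
#define MBM_CNTR_WIDTH 24

/* Hypothetical stand-in for the per-RMID, per-event arch state. */
struct arch_mbm_state_model {
	uint64_t chunks;	/* chunks accumulated since the last reset */
	uint64_t prev_msr;	/* raw counter value seen at the last read */
};

/* Delta between two raw counter values, tolerating a single wraparound. */
uint64_t mbm_overflow_count(uint64_t prev_msr, uint64_t cur_msr,
			    unsigned int width)
{
	uint64_t shift;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;

	return ((cur_msr << shift) - (prev_msr << shift)) >> shift;
}

/* Clear the state and remember whatever the counter already holds. */
void model_reset_rmid(struct arch_mbm_state_model *am, uint64_t hw_count)
{
	memset(am, 0, sizeof(*am));
	am->prev_msr = hw_count;
}

/* Accumulate only the traffic counted since the previous read or reset. */
uint64_t model_read_rmid(struct arch_mbm_state_model *am, uint64_t hw_count)
{
	am->chunks += mbm_overflow_count(am->prev_msr, hw_count,
					 MBM_CNTR_WIDTH);
	am->prev_msr = hw_count;

	return am->chunks;
}
```

With a reused RMID whose hardware counter already holds 1000, a reset
followed by a read at 1500 reports 500 in this model. If prev_msr were left
at zero, the same read would report 1500, attributing the previous owner's
traffic to the new group.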

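Also please note that I changed __rmid_read() in the
resctrl_arch_reset_rmid() sketch above: it now returns an error code and
passes the count back through a pointer. There is no need to require each
__rmid_read() caller to test the MSR bits for validity; that can be
contained within __rmid_read().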
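
A minimal user-space sketch of that idea, not the actual patch: the
validity bits are tested in one place and callers see only an error code
or a usable count. Bits 63 and 62 are the Error and Unavailable bits of
IA32_QM_CTR. The names rmid_read_checked(), read_qm_ctr() and fake_qm_ctr
are hypothetical stand-ins for the MSR access, and the particular error
codes are an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Status bits reported in the upper bits of IA32_QM_CTR. */
#define RMID_VAL_ERROR		(1ULL << 63)
#define RMID_VAL_UNAVAIL	(1ULL << 62)

/* Hypothetical stand-in for the counter MSR; a test sets it directly. */
uint64_t fake_qm_ctr;

/* Hypothetical replacement for the wrmsr()/rdmsrl() pair. */
uint64_t read_qm_ctr(uint32_t rmid, uint32_t eventid)
{
	(void)rmid;
	(void)eventid;

	return fake_qm_ctr;
}

/*
 * Read a counter and validate it here, so that no caller has to look at
 * the raw status bits. *val is written only on success.
 */
int rmid_read_checked(uint32_t rmid, uint32_t eventid, uint64_t *val)
{
	uint64_t msr_val;

	assert(val != NULL);

	msr_val = read_qm_ctr(rmid, eventid);
	if (msr_val & RMID_VAL_ERROR)
		return -EIO;		/* assumed error code */
	if (msr_val & RMID_VAL_UNAVAIL)
		return -EINVAL;		/* assumed error code */

	*val = msr_val;
	return 0;
}
```

A caller then reduces to the pattern used in resctrl_arch_reset_rmid():
check the return value and use the count only when it is zero.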

Something like below remains:

static int __mon_event_count(u32 rmid, struct rmid_read *rr)
{

	...

	if (rr->first) {
		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
		return 0;
	}

	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
	if (rr->err)
		return rr->err;

	rr->val += tval;
	return 0;

}
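
The resulting control flow can be modelled in user space to check the
three paths: a first call only establishes the baseline, a later call adds
its delta to rr->val, and a failed read propagates the error without
touching the sum. All model_* names, the fixed array of four RMIDs and the
fake counters are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_RMIDS 4

/* Hypothetical per-RMID hardware counters and recorded baselines. */
uint64_t model_hw_count[MODEL_NUM_RMIDS];
uint64_t model_prev[MODEL_NUM_RMIDS];

struct rmid_read_model {
	bool first;	/* set only when the mon directory is created */
	int err;
	uint64_t val;	/* running sum over the RMIDs that were read */
};

/* Record the current count as the baseline for later reads. */
void model_arch_reset_rmid(uint32_t rmid)
{
	if (rmid < MODEL_NUM_RMIDS)
		model_prev[rmid] = model_hw_count[rmid];
}

/* Return the count relative to the baseline, or an error. */
int model_arch_rmid_read(uint32_t rmid, uint64_t *val)
{
	if (rmid >= MODEL_NUM_RMIDS)
		return -EINVAL;

	*val = model_hw_count[rmid] - model_prev[rmid];
	return 0;
}

int model_mon_event_count(uint32_t rmid, struct rmid_read_model *rr)
{
	uint64_t tval = 0;

	assert(rr != NULL);

	if (rr->first) {
		/* Only set up the baseline; no value is returned. */
		model_arch_reset_rmid(rmid);
		return 0;
	}

	rr->err = model_arch_rmid_read(rmid, &tval);
	if (rr->err)
		return rr->err;

	rr->val += tval;
	return 0;
}
```

Because no event ID check remains on the accumulate path, the same few
lines serve every event type in this model.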

What do you think?

Reinette