Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()

From: Babu Moger
Date: Wed Oct 20 2021 - 15:22:11 EST


Hi Reinette,

On 10/20/21 1:15 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/19/2021 4:20 PM, Babu Moger wrote:
>> Hi James,
>>
>> On 10/1/21 11:02 AM, James Morse wrote:
>>> __rmid_read() selects the specified eventid and returns the counter
>>> value from the msr. The error handling is architecture specific, and
>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>
>>> Error handling should be handled by architecture specific code, as
>>> a different architecture may have different requirements. MPAM's
>>> counters can report that they are 'not ready', requiring a second
>>> read after a short delay. This should be hidden from resctrl.
>>>
>>> Make __rmid_read() the architecture specific function for reading
>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>> handling into it.
>>>
>>> Signed-off-by: James Morse <james.morse@xxxxxxx>
>>> ---
>>> Changes since v1:
>>>   * Return EINVAL from the impossible case in __mon_event_count() instead
>>>     of an x86 hardware specific value.
>>> ---
>>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>>>   arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>>>   arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>>>   include/linux/resctrl.h                   |  1 +
>>>   4 files changed, 31 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> index 25baacd331e0..c8ca7184c6d9 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>>> *arg)
>>>         mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>   -    if (rr.val & RMID_VAL_ERROR)
>>> +    if (rr.err == -EIO)
>>>           seq_puts(m, "Error\n");
>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>> +    else if (rr.err == -EINVAL)
>>>           seq_puts(m, "Unavailable\n");
>>>       else
>>>           seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>
>> This patch breaks the earlier fix
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.15-rc6%26id%3D064855a69003c24bd6b473b367d364e418c57625&amp;data=04%7C01%7Cbabu.moger%40amd.com%7C85219a5827114935cdaa08d993f59fa0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703505420472920%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yP8awDgGGZ%2BWj5ZItdTNJItTVuK828yGnibwq%2BrVaf0%3D&amp;reserved=0
>>
>>
>> When the user reads the events on the default monitoring group with
>> multiple subgroups, the events on all subgroups are consolidated
>> together. In case if the last rmid read was resulted in error then whole
>> group will be reported as error. The err field needs to be cleared.
>>
>> Please add this patch to clear the error.
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 14bc843043da..0e4addf237ec 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>>          /* Report error if none of rmid_reads are successful */
>>          if (ret_val)
>>                  rr->val = ret_val;
>> +       else
>> +               rr->err  = 0;
>>   }
>>
>>   /*
>>
>
> Good catch, thank you.
>
> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
> was taken into account by this patch and needs a bigger rework than the
> above fixup. For example, if I understand correctly ret_val is the error
> and rr->val no longer expected to contain the error after this patch. So
> keeping that assignment to rr->val is not correct.

Yes. You are right. rr->val is not expected to contain the error.
Hopefully, this should help.

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 14bc843043da..105d972cc511 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -441,9 +441,9 @@ void mon_event_count(void *info)
}
}

- /* Report error if none of rmid_reads are successful */
- if (ret_val)
- rr->val = ret_val;
+ /* Clear the error if at least one of the rmid reads succeed */
+ if (ret_val == 0)
+ rr->err = 0;
}

/*