Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()

From: James Morse
Date: Wed Oct 27 2021 - 12:50:37 EST


Hi Reinette, Babu,

On 20/10/2021 21:28, Reinette Chatre wrote:
> On 10/20/2021 12:22 PM, Babu Moger wrote:
>> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>>> __rmid_read() selects the specified eventid and returns the counter
>>>>> value from the msr. The error handling is architecture specific, and
>>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>>
>>>>> Error handling should be handled by architecture specific code, as
>>>>> a different architecture may have different requirements. MPAM's
>>>>> counters can report that they are 'not ready', requiring a second
>>>>> read after a short delay. This should be hidden from resctrl.
>>>>>
>>>>> Make __rmid_read() the architecture specific function for reading
>>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>>> handling into it.


>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>>          mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>> -    if (rr.val & RMID_VAL_ERROR)
>>>>> +    if (rr.err == -EIO)
>>>>>            seq_puts(m, "Error\n");
>>>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>>>> +    else if (rr.err == -EINVAL)
>>>>>            seq_puts(m, "Unavailable\n");
>>>>>        else
>>>>>            seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>>
>>>> This patch breaks the earlier fix
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625

Aha!


>>>> When the user reads the events on the default monitoring group with
>>>> multiple subgroups, the events on all subgroups are consolidated
>>>> together. If the last rmid read resulted in an error, the whole
>>>> group is reported as an error. The err field needs to be cleared.
>>>>
>>>> Please add this patch to clear the error.

>>> Good catch, thank you.
>>>
>>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>>> was taken into account by this patch, and it needs a bigger rework than the
>>> above fixup. For example, if I understand correctly, ret_val is the error
>>> and rr->val is no longer expected to contain the error after this patch. So
>>> keeping that assignment to rr->val is not correct.
>>
>> Yes. You are right. rr->val is not expected to contain the error.
>> Hopefully, this should help.

> Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails
> then the data still needs to be reported so the error code needs to be fixed up afterwards
> and cannot be done inside __mon_event_count(). Thank you very much.

Thanks both! I should have worked this out when splitting msr_val into two values, which
end up getting set the same.

I think the 'Unavailable' issue is subtle enough that it deserves a block comment.
I've replaced the rr->val chunk with:
| /*
|  * __mon_event_count() calls for newly created monitor groups may
|  * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
|  * If the first call for the control group succeeds, discard any error
|  * set by reads of monitor groups.
|  */
| if (ret_val == 0)
| 	rr->err = 0;
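
To make the effect concrete, here is a rough userspace model of the aggregation. It is a sketch, not the resctrl code: the model_* names and struct layouts are hypothetical, and it assumes only the control group's own read decides whether the error is discarded, as the comment above describes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the result accumulated across reads. */
struct model_rmid_read {
	uint64_t val;	/* running sum of successful counter reads */
	int err;	/* error left behind by the most recent failed read */
};

/* One group's read outcome: status is 0 or a negative errno. */
struct model_group {
	int status;
	uint64_t count;
};

/* Models one __mon_event_count() call: add the count or record the error. */
static int model_event_count(const struct model_group *g,
			     struct model_rmid_read *rr)
{
	if (g->status) {
		rr->err = g->status;
		return g->status;
	}
	rr->val += g->count;
	return 0;
}

/* Models the control group read followed by its monitor groups. */
static void model_mon_event_count(const struct model_group *ctrl,
				  const struct model_group *mon, size_t n,
				  struct model_rmid_read *rr)
{
	int ret_val;
	size_t i;

	assert(ctrl != NULL && rr != NULL);

	ret_val = model_event_count(ctrl, rr);
	for (i = 0; i < n; i++)
		model_event_count(&mon[i], rr);

	/* The control group's data is valid: discard monitor group errors. */
	if (ret_val == 0)
		rr->err = 0;
}
```

If the control group's read fails, the error is left in place even when a monitor group read succeeds.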


Thanks.

James