Re: [PATCH v2 08/18] x86/resctrl: Queue mon_event_read() instead of sending an IPI
From: James Morse
Date: Wed Mar 08 2023 - 11:10:51 EST
Hi Reinette,
On 06/03/2023 11:33, James Morse wrote:
> On 02/02/2023 23:47, Reinette Chatre wrote:
>> On 1/13/2023 9:54 AM, James Morse wrote:
>>> x86 is blessed with an abundance of monitors, one per RMID, that can be
>>> read from any CPU in the domain. MPAM's monitors reside in the MMIO MSC, and
>>> the number implemented is up to the manufacturer. This means when there are
>>> fewer monitors than needed, they need to be allocated and freed.
>>>
>>> Worse, the domain may be broken up into slices, and the MMIO accesses
>>> for each slice may need performing from different CPUs.
>>>
>>> These two details mean MPAM's monitor code needs to be able to sleep, and
>>> IPI another CPU in the domain to read from a resource that has been sliced.
>>>
>>> mon_event_read() already invokes mon_event_count() via IPI, which means
>>> this isn't possible.
>>>
>>> Change mon_event_read() to schedule mon_event_count() on a remote CPU and
>>> wait, instead of sending an IPI. This function is only used in response to
>>> a user-space filesystem request (not the timing sensitive overflow code).
>>>
>>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>>> the monitor-allocation in monitor.c.
>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> index 1df0e3262bca..4ee3da6dced7 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> @@ -542,7 +545,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>> rr->val = 0;
>>> rr->first = first;
>>>
>>> - smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>>> + smp_call_on_cpu(cpumask_any(&d->cpu_mask), mon_event_count, rr, false);
>
>> This would be problematic for the use cases where single tasks are run on
>> adaptive-tick CPUs. If an adaptive-tick CPU is chosen to run the function then
>> it may never run. Real-time environments are target usage of resctrl (with examples
>> in the documentation).
>
> Interesting. I can't find an IPI wakeup under smp_call_on_cpu() ... I wonder what else
> this breaks!
>
> Resctrl doesn't consider the nohz-cpus when doing any of this work, or when setting up the
> limbo or overflow timer work.
>
> I think the right thing to do here is add some cpumask_any_housekeeping() helper to avoid
> nohz-full CPUs where possible, and fall back to an IPI if all the CPUs in a domain are
> nohz-full.
>
> Ideally cpumask_any() would do this but it isn't possible without allocating memory.
> If I can reproduce this problem, ...
... I haven't been able to reproduce this.

With "nohz_full=1 isolcpus=nohz,domain,1" on the kernel command line I can still
smp_call_on_cpu() on CPU 1 even when it is running a SCHED_FIFO task that spins in
user-space as much as possible.

This looks to be down to "sched: RT throttling activated", which appears to exist to
prevent RT CPU hogs from blocking kernel work. From Peter's comments at [0], it looks
like running tasks 100% in user-space isn't a realistic use-case.

Given that, I think resctrl should use smp_call_on_cpu() to avoid interrupting
nohz_full CPUs, and the limbo/overflow code should equally avoid these CPUs. If work
does get scheduled on those CPUs, it is expected to run eventually.
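
To make the CPU selection concrete, here is a minimal userspace sketch of the
preference order discussed above: pick a CPU in the domain that is not nohz_full,
and only fall back to a nohz_full CPU when the domain has nothing else. This is a
model, not kernel code: the 64-bit bitmap type and the names first_cpu() and
pick_housekeeping_cpu() are invented for illustration, and a real helper would
operate on struct cpumask.

```c
#include <stdint.h>

/* Userspace model: bit n set means CPU n is in the mask (up to 64 CPUs). */
typedef uint64_t cpu_bitmap_t;

/* Return the index of the lowest set bit, or -1 if the mask is empty. */
int first_cpu(cpu_bitmap_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & ((cpu_bitmap_t)1 << cpu))
			return cpu;
	return -1;
}

/*
 * Hypothetical selection logic: prefer a CPU in 'domain' that is not in
 * 'nohz_full'. If every CPU in the domain is nohz_full, return one of them
 * and set *fell_back so the caller can decide how to reach it.
 * Returns -1 (with *fell_back clear) if the domain is empty.
 */
int pick_housekeeping_cpu(cpu_bitmap_t domain, cpu_bitmap_t nohz_full,
			  int *fell_back)
{
	cpu_bitmap_t housekeeping = domain & ~nohz_full;

	*fell_back = 0;
	if (housekeeping)
		return first_cpu(housekeeping);

	/* Every CPU in the domain (if any) is nohz_full. */
	*fell_back = (domain != 0);
	return first_cpu(domain);
}
```

For a domain of CPUs 1-2 with CPU 1 nohz_full, the sketch picks CPU 2. If the
domain were only CPU 1, it returns CPU 1 with *fell_back set, and the caller can
then choose between queueing the work anyway and sending an IPI.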
Thanks,
James
[0] https://lore.kernel.org/all/20130823110254.GU31370@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/