Re: [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu

From: James Morse
Date: Wed Oct 25 2023 - 13:57:31 EST


Hi Reinette,

On 03/10/2023 22:22, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index c54fa86e4ef9..bd7f60bf49fe 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -60,11 +60,15 @@
>> * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>> * aren't marked nohz_full
>> * @mask: The mask to pick a CPU from.
>> + * @exclude_cpu:The CPU to avoid picking.
>> *
>> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
>> - * nohz_full, these are preferred.
>> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
>> + * CPUs that don't use nohz_full, these are preferred. Pass
>> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
>> + * Returns >= nr_cpu_ids if no CPUs are available.

> It may be helpful to add that the function can only fail if exclude_cpu is
> *not* RESCTRL_PICK_ANY_CPU. That helps to understand the sparse error checking.

Assuming you don't give it an empty mask, that should be true ... but I've missed a
difference between the two helpers use of cpumask_any() when combining them....

It now looks like this:
|/**
| * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
| * aren't marked nohz_full
| * @mask: The mask to pick a CPU from.
| * @exclude_cpu:The CPU to avoid picking.
| *
| * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
| * CPUs that don't use nohz_full, these are preferred. Pass
| * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
| *
| * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
| */
|static inline unsigned int
|cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
|{
| unsigned int cpu, hk_cpu;
|
| if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
| cpu = cpumask_any(mask);
| else
| cpu = cpumask_any_but(mask, exclude_cpu);
|
| /* If the CPU picked isn't marked nohz_full, we're done */
| if (cpu <= nr_cpu_ids && !tick_nohz_full_cpu(cpu))
| return cpu;
|
| /* Try to find a CPU that isn't nohz_full to use in preference */
| hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
| if (hk_cpu == exclude_cpu)
| hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
|
| if (hk_cpu < nr_cpu_ids)
| cpu = hk_cpu;
|
| return cpu;
|}

This also has to check cpu is in range before passing it to tick_nohz_full_cpu().


>> @@ -73,6 +77,9 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>> return cpu;
>>
>
> It is not obvious from this hunk but I cannot see how this would work
> on a system without any nohz_full CPUs.
>
> At this point the function looks like:
>
> cpu = cpumask_any(mask);
> if (!tick_nohz_full_cpu(cpu))
> return cpu;
>
> I expected exclude_cpu to be taken into account. If I understand correctly
> exclude_cpu can be picked by cpumask_any() and as long as it is not
> a nohz_full CPU it would be returned.

Yup, I missed this when combining the functions, fixed as above...



>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 9c6d4b0970e2..208e46ba7368 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -807,22 +808,31 @@ void cqm_handle_limbo(struct work_struct *work)
>> __check_limbo(d, false);
>>
>> if (has_busy_rmid(d)) {
>> - cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> + cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
>> schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
>> }
>>
>> mutex_unlock(&rdtgroup_mutex);
>> }
>>
>> -void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
>> +/**
>> + * cqm_setup_limbo_handler() - Schedule the limbo handler to run for this
>> + * domain.
>> + * @delay_ms: How far in the future the handler should run.
>> + * @exclude_cpu: Which CPU the handler should not run on,
>> + * RESCTRL_PICK_ANY_CPU to pick any CPU.
>> + */

> arch/x86/kernel/cpu/resctrl/monitor.c:824: info: Scanning doc for function cqm_setup_limbo_handler
> arch/x86/kernel/cpu/resctrl/monitor.c:832: warning: Function parameter or member 'dom' not described in 'cqm_setup_limbo_handler'

What tool outputs this? I've run 'make ARCH=x86 htmldocs', which outputs a tonne of stuff,
but I've never found lines about resctrl in there:
| morse@eglon:~/kernel/mpam/build_x86_64$ make ARCH=x86 htmldocs &> tee output.log
| morse@eglon:~/kernel/mpam/build_x86_64$ cat output.log | grep resctrl
| morse@eglon:~/kernel/mpam/build_x86_64$


Thanks,

James