Re: [PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs according to change of housekeeping cpumask

From: Thomas Gleixner
Date: Fri May 17 2024 - 21:17:39 EST

Next message: Namhyung Kim: "Re: [PATCH v1 0/3] Use BPF filters for a "perf top -u" workaround"
Previous message: kernel test robot: "Re: [PATCH v3 37/40] fsi: core: Add different types of CFAM"
In reply to: Costa Shulyupin: "[PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs according to change of housekeeping cpumask"
Next in thread: Thomas Gleixner: "Re: [PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs according to change of housekeeping cpumask"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, May 16 2024 at 22:04, Costa Shulyupin wrote:
> irq_affinity_adjust() is prototyped from irq_affinity_online_cpu()
> and irq_restore_affinity_of_irq().

I'm used to this prototyped phrase by now. It still does not justify to
expose me to this POC hackery.

My previous comments about change logs still apply.

> +static int irq_affinity_adjust(cpumask_var_t disable_mask)
> +{
> + unsigned int irq;
> + cpumask_var_t mask;
> +
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + irq_lock_sparse();
> + for_each_active_irq(irq) {
> + struct irq_desc *desc = irq_to_desc(irq);
> +
> + raw_spin_lock_irq(&desc->lock);

That's simply broken. This is not CPU hotplug on an outgoing CPU. Why
are you assuming that your isolation change code can rely on the
implicit guarantees of CPU hot(un)plug?

Also there is a reason why interrupt related code is in kernel/irq/* and
not in some random other location. Even if C allows you to fiddle with
everything that does not mean that hiding random hacks in other places
is correct in any way.

> + struct irq_data *data = irq_desc_get_irq_data(desc);
> +
> + if (irqd_affinity_is_managed(data) && cpumask_weight_and(disable_mask,
> + irq_data_get_affinity_mask(data))) {

Interrupt target isolation is only relevant for managed interrupts and
non-managed interrupts clearly are going to migrate themself away
magically, right?

> +
> + cpumask_and(mask, cpu_online_mask, irq_default_affinity);
> + cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ));

There are clearly a lot of comments explaining what this is doing and
why it is correct as there is a guarantee that these masks overlap by
definition.

> + irq_set_affinity_locked(data, mask, true);

Plus the extensive explanation why using 'force=true' is even remotely
correct here.

I conceed that the documentation of that function and its arguments is
close to non-existant, but if you follow the call chain of that function
there are enough hints down the road, no?

> + WARN_ON(cpumask_weight_and(irq_data_get_effective_affinity_mask(data),
> + disable_mask));
> + WARN_ON(!cpumask_subset(irq_data_get_effective_affinity_mask(data),
> + cpu_online_mask));
> + WARN_ON(!cpumask_subset(irq_data_get_effective_affinity_mask(data),
> + housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)));

These warnings are required and useful within the spinlock held and
interrupt disabled section because of what?

- Because the resulting stack trace provides a well known call chain

- Because the resulting warnings do not tell anything about the
affected interrupt number

- Because the resulting warnings do not tell anything about the CPU
masks which cause the problem

- Because the aggregate information of the above is utterly useless

Impressive...

Thanks,

tglx

Next message: Namhyung Kim: "Re: [PATCH v1 0/3] Use BPF filters for a "perf top -u" workaround"
Previous message: kernel test robot: "Re: [PATCH v3 37/40] fsi: core: Add different types of CFAM"
In reply to: Costa Shulyupin: "[PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs according to change of housekeeping cpumask"
Next in thread: Thomas Gleixner: "Re: [PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs according to change of housekeeping cpumask"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]