Re: "Plug non-maskable MSI affinity race" triggers a warning with CPU hotplugs

From: Qian Cai
Date: Wed Feb 12 2020 - 19:44:48 EST




> On Feb 12, 2020, at 6:19 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Qian,
>
> Qian Cai <cai@xxxxxx> writes:
>
>> The linux-next commit 6f1a4891a592 (âx86/apic/msi: Plug non-maskable
>> MSI affinity raceâ) Introduced a bug which is always triggered during
>> the CPU hotplugs on this server. See the trace and line numbers below.
>
> Thanks for the report.
>
>> WARNING: CPU: 0 PID: 2794 at arch/x86/kernel/apic/msi.c:103 msi_set_affinity+0x278/0x330
>> CPU: 0 PID: 2794 Comm: irqbalance Tainted: G L 5.6.0-rc1-next-20200211 #1
>> irq_do_set_affinity at kernel/irq/manage.c:259
>> irq_setup_affinity at kernel/irq/manage.c:474
>> irq_select_affinity_usr at kernel/irq/manage.c:496
>> write_irq_affinity.isra.0+0x137/0x1e0
>> irq_affinity_proc_write+0x19/0x20
> ...
>
> I'm glad I added this WARN_ON(). This unearthed another long standing
> bug. If user space writes a bogus affinity mask, i.e. no online CPUs
> then it calls irq_select_affinity_usr().
>
> This was introduced for ALPHA in
>
> eee45269b0f5 ("[PATCH] Alpha: convert to generic irq framework (generic part)")
>
> and subsequently made available for all architectures in
>
> 18404756765c ("genirq: Expose default irq affinity mask (take 3)")
>
> which already introduced the circumvention of the affinity setting
> restrictions for interrupts which cannot be moved in process context.
>
> The whole exercise is bogus in various aspects:
>
> 1) If the interrupt is already started up then there is absolutely
> no point to honour a bogus interrupt affinity setting from user
> space. The interrupt is already assigned to an online CPU and it
> does not make any sense to reassign it to some other randomly
> chosen online CPU.
>
> 2) If the interupt is not yet started up then there is no point
> either. A subsequent startup of the interrupt will invoke
> irq_setup_affinity() anyway which will chose a valid target CPU.
>
> So the right thing to do is to just return -EINVAL in case user space
> wrote an affinity mask which does not contain any online CPUs, except for
> ALPHA which has it's own magic sauce for this.
>
> Can you please test the patch below?

Yes, it works fine.

>
> Thanks,
>
> tglx
>
> 8<---------------
> diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
> index 3924fbe829d4..c9d8eb7f5c02 100644
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -128,8 +128,6 @@ static inline void unregister_handler_proc(unsigned int irq,
>
> extern bool irq_can_set_affinity_usr(unsigned int irq);
>
> -extern int irq_select_affinity_usr(unsigned int irq);
> -
> extern void irq_set_thread_affinity(struct irq_desc *desc);
>
> extern int irq_do_set_affinity(struct irq_data *data,
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 3089a60ea8f9..7eee98c38f25 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -481,23 +481,9 @@ int irq_setup_affinity(struct irq_desc *desc)
> {
> return irq_select_affinity(irq_desc_get_irq(desc));
> }
> -#endif
> +#endif /* CONFIG_AUTO_IRQ_AFFINITY */
> +#endif /* CONFIG_SMP */
>
> -/*
> - * Called when a bogus affinity is set via /proc/irq
> - */
> -int irq_select_affinity_usr(unsigned int irq)
> -{
> - struct irq_desc *desc = irq_to_desc(irq);
> - unsigned long flags;
> - int ret;
> -
> - raw_spin_lock_irqsave(&desc->lock, flags);
> - ret = irq_setup_affinity(desc);
> - raw_spin_unlock_irqrestore(&desc->lock, flags);
> - return ret;
> -}
> -#endif
>
> /**
> * irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
> diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
> index 9e5783d98033..af488b037808 100644
> --- a/kernel/irq/proc.c
> +++ b/kernel/irq/proc.c
> @@ -111,6 +111,28 @@ static int irq_affinity_list_proc_show(struct seq_file *m, void *v)
> return show_irq_affinity(AFFINITY_LIST, m);
> }
>
> +#ifndef CONFIG_AUTO_IRQ_AFFINITY
> +static inline int irq_select_affinity_usr(unsigned int irq)
> +{
> + /*
> + * If the interrupt is started up already then this fails. The
> + * interrupt is assigned to an online CPU already. There is no
> + * point to move it around randomly. Tell user space that the
> + * selected mask is bogus.
> + *
> + * If not then any change to the affinity is pointless because the
> + * startup code invokes irq_setup_affinity() which will select
> + * a online CPU anyway.
> + */
> + return -EINVAL;
> +}
> +#else
> +/* ALPHA magic affinity auto selector. Keep it for historical reasons. */
> +static inline int irq_select_affinity_usr(unsigned int irq)
> +{
> + return irq_select_affinity(irq);
> +}
> +#endif
>
> static ssize_t write_irq_affinity(int type, struct file *file,
> const char __user *buffer, size_t count, loff_t *pos)