Re: [PATCH v2] x86/apic/vector: Move pr_warn() out of vector_lock

From: Thomas Gleixner
Date: Mon Mar 29 2021 - 08:43:30 EST


Waiman,

On Sun, Mar 28 2021 at 20:52, Waiman Long wrote:
> It was found that the following circular locking dependency warning
> could happen in some systems:
>
> [ 218.097878] ======================================================
> [ 218.097879] WARNING: possible circular locking dependency detected
> [ 218.097880] 4.18.0-228.el8.x86_64+debug #1 Not tainted

Reports have to be against latest mainline and not against the random
distro frankenkernel of the day. That's nothing new.

Plus I was asking you to provide a full splat to look at so this can be
discussed _upfront_. Oh well...

> [ 218.097914] -> #2 (&irq_desc_lock_class){-.-.}:
> [ 218.097917] _raw_spin_lock_irqsave+0x48/0x81
> [ 218.097918] __irq_get_desc_lock+0xcf/0x140
> [ 218.097919] __dble_irq_nosync+0x6e/0x110

This function does not even exist in mainline and never existed...

> [ 218.097967]
> [ 218.097967] Chain exists of:
> [ 218.097968] console_oc_lock_class --> vector_lock
> [ 218.097972]
> [ 218.097973] Possible unsafe locking scenario:
> [ 218.097973]
> [ 218.097974]        CPU0                    CPU1
> [ 218.097975]        ----                    ----
> [ 218.097975]   lock(vector_lock);
> [ 218.097977]                                lock(&irq_desc_lock_class);
> [ 218.097980]                                lock(vector_lock);
> [ 218.097981]   lock(console_owner);
> [ 218.097983]
> [ 218.097984] *** DEADLOCK ***
> [ 218.097984]
> [ 218.097985] 6 locks held by systemd/1:
> [ 218.097986] #0: ffff88822b5cc1e8 (&tty->legacy_mutex){+.+.}, at: tty_init_dev+0x79/0x440
> [ 218.097989] #1: ffff88832ee00770 (&port->mutex){+.+.}, at: tty_port_open+0x85/0x190
> [ 218.097993] #2: ffff88813be85a88 (&desc->request_mutex){+.+.}, at: __setup_irq+0x249/0x1e60
> [ 218.097996] #3: ffff88813be858c0 (&irq_desc_lock_class){-.-.}, at: __setup_irq+0x2d9/0x1e60
> [ 218.098000] #4: ffffffff84afca78 (vector_lock){-.-.}, at: x86_vector_activate+0xca/0xab0
> [ 218.098003] #5: ffffffff84c27e20 (console_lock){+.+.}, at: vprintk_emit+0x13a/0x450

This is a more fundamental problem than just vector_lock. The same
problem exists with any other printk over serial which is nested in the
interrupt activation chain, and not only on x86.
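
To make the shape of that dependency concrete, here is a minimal userspace
sketch of a lock-order graph with a cycle check. It is not how lockdep is
implemented. The enum names and helper functions are made up for
illustration, and the console_owner -> irq_desc_lock edge is an assumption
read off the #2 entry of the quoted splat, which is incomplete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers, named after the classes in the splat. */
enum lock_id { CONSOLE_OWNER, IRQ_DESC_LOCK, VECTOR_LOCK, NR_LOCKS };

/* edge[a][b] is true when lock b was acquired while lock a was held. */
static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(enum lock_id from, enum lock_id to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable((enum lock_id)next, to, seen))
			return true;
	return false;
}

/*
 * Record "held -> wanted". Returns false, without recording the edge,
 * when 'held' is already reachable from 'wanted', i.e. when the new
 * edge would close a cycle.
 */
static bool record_dependency(enum lock_id held, enum lock_id wanted)
{
	bool seen[NR_LOCKS] = { false };

	assert(held < NR_LOCKS && wanted < NR_LOCKS);
	if (reachable(wanted, held, seen))
		return false;
	edge[held][wanted] = true;
	return true;
}

/* Replays the assumed chain; true when only the last edge is rejected. */
static bool replay_reported_chain(void)
{
	/* Assumed: serial console code takes the irq descriptor lock. */
	bool ok1 = record_dependency(CONSOLE_OWNER, IRQ_DESC_LOCK);
	/* Held locks #3 and #4 in the splat: irq setup takes vector_lock. */
	bool ok2 = record_dependency(IRQ_DESC_LOCK, VECTOR_LOCK);
	/* printk under vector_lock wants the console. */
	bool ok3 = record_dependency(VECTOR_LOCK, CONSOLE_OWNER);

	return ok1 && ok2 && !ok3;
}
```

In this model any print issued with irq_desc_lock or vector_lock held
closes the same cycle, which is why moving a single pr_warn() only removes
one instance of it.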

> -static int activate_reserved(struct irq_data *irqd)
> +static int activate_reserved(struct irq_data *irqd, char *wbuf, size_t wsize)
> {

...

>  	if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
>  			    irq_data_get_affinity_mask(irqd))) {
> -		pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n",
> -			irqd->irq);
> +		snprintf(wbuf, wsize, KERN_WARNING
> +			 "irq %u: Affinity broken due to vector space exhaustion.\n",
> +			 irqd->irq);

This is not really any more tasteful than the previous one and it does
not fix the fundamental underlying problem.
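
For reference, the pattern the hunk above is going for, reduced to a
userspace sketch: format the message while the lock is held, emit it after
the lock has been dropped. This is an illustration only, not the patch.
The pthread mutex stands in for vector_lock, fputs() stands in for printk,
and every function and variable name below is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for vector_lock; a userspace mutex, not a raw spinlock. */
static pthread_mutex_t vector_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Runs with the lock held: only formats into wbuf, never prints. */
static void activate_reserved_model(unsigned int irq, bool affinity_broken,
				    char *wbuf, size_t wsize)
{
	if (affinity_broken)
		snprintf(wbuf, wsize,
			 "irq %u: Affinity broken due to vector space exhaustion.\n",
			 irq);
}

/*
 * Takes the lock, lets the callee fill the buffer, drops the lock and
 * only then emits the message. Returns the length of the message, or 0
 * when nothing was deferred.
 */
static size_t vector_activate_model(unsigned int irq, bool affinity_broken,
				    char *wbuf, size_t wsize)
{
	if (wsize == 0)
		return 0;
	wbuf[0] = '\0';

	pthread_mutex_lock(&vector_lock_model);
	activate_reserved_model(irq, affinity_broken, wbuf, wsize);
	pthread_mutex_unlock(&vector_lock_model);

	/* The print happens outside the lock. */
	if (wbuf[0] != '\0')
		fputs(wbuf, stderr);
	return strlen(wbuf);
}
```

The cost shows up in the signatures: every function between the lock
holder and the message site has to pass the buffer along, and any other
print in the same chain still needs the same treatment.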

But, because I'm curious and printk is a constant source of trouble, I
just added unconditional pr_warns into those functions under vector_lock
on 5.12-rc5.

Still waiting for the lockdep splat to show up while enjoying the
trickle of printks over serial.

If you really think this is an upstream problem then please provide a
corresponding lockdep splat on plain 5.12-rc5 along with a .config and
the scenario which triggers this. Not less, not more.

Thanks,

tglx