On Thu, 25 Aug 2022 16:23:48 +0100,
Pierre Gondois <pierre.gondois@xxxxxxx> wrote:
In KvmTool, on a ThunderX2, when running a preemp_rt kernel based on
How is this specific to kvmtool? Or to running as a guest?
v5.19-rc3-rt4, the following happens:
[ 4.070739] [ BUG: Invalid wait context ]
[ 4.070740] 5.19.0-rc3-rt4-00001-g1a6597c0bdcf #153 Not tainted
[ 4.070742] -----------------------------
[ 4.070743] swapper/0/1 is trying to lock:
[ 4.070744] ffff0000ab7405d8 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc (mm/slub.c:2954)
[ 4.070757] other info that might help us debug this:
[ 4.070758] context-{5:5}
[ 4.070759] 5 locks held by swapper/0/1:
[ 4.070760] #0: ffff0000811491e0 (&dev->mutex){....}-{4:4}, at: __device_driver_lock (drivers/base/dd.c:1055)
[ 4.070769] #1: ffff0000846c5670 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq (kernel/irq/internals.h:147)
[ 4.070778] #2: ffff0000846c54c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq (kernel/irq/manage.c:1612)
[ 4.070784] #3: ffff80000b23ea78 (mask_lock){....}-{2:2}, at: irq_setup_affinity (./include/linux/irq.h:381)
[ 4.070791] #4: ffff80000b23ea38 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity (./include/linux/irq.h:381)
[ 4.070797] stack backtrace:
[ 4.070801] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-rt4-00001-g1a6597c0bdcf #153
[ 4.070805] Call trace:
[ 4.070806] dump_backtrace (arch/arm64/kernel/stacktrace.c:200)
[ 4.070811] show_stack (arch/arm64/kernel/stacktrace.c:207)
[ 4.070813] dump_stack_lvl (lib/dump_stack.c:107)
[ 4.070818] dump_stack (lib/dump_stack.c:114)
[ 4.070820] __lock_acquire (kernel/locking/lockdep.c:4707)
[ 4.070823] lock_acquire (kernel/locking/lockdep.c:466)
[ 4.070825] rt_spin_lock (./arch/arm64/include/asm/current.h:19 (discriminator 4))
[ 4.070830] ___slab_alloc (mm/slub.c:2954)
[ 4.070832] __slab_alloc.isra.0 (mm/slub.c:3116)
[ 4.070835] __kmalloc_node (mm/slub.c:3207)
[ 4.070837] alloc_cpumask_var_node (lib/cpumask.c:115)
[ 4.070843] alloc_cpumask_var (lib/cpumask.c:147)
[ 4.070846] its_select_cpu (drivers/irqchip/irq-gic-v3-its.c:1580)
[ 4.070850] its_set_affinity (drivers/irqchip/irq-gic-v3-its.c:1659)
[ 4.070853] msi_domain_set_affinity (kernel/irq/msi.c:501)
[ 4.070857] irq_do_set_affinity (kernel/irq/manage.c:276)
[ 4.070860] irq_setup_affinity (kernel/irq/manage.c:633)
[ 4.070863] irq_startup (kernel/irq/chip.c:280)
[ 4.070865] __setup_irq (kernel/irq/manage.c:1777)
[ 4.070869] request_threaded_irq (kernel/irq/manage.c:2206)
[ 4.070872] vp_find_vqs_msix (./include/linux/interrupt.h:168)
[ 4.070876] vp_find_vqs (drivers/virtio/virtio_pci_common.c:400)
[ 4.070878] vp_modern_find_vqs (drivers/virtio/virtio_pci_modern.c:259)
[ 4.070880] init_vq (./include/linux/virtio_config.h:213)
[ 4.070885] virtblk_probe (drivers/block/virtio_blk.c:936)
[ 4.070887] virtio_dev_probe (drivers/virtio/virtio.c:303)
[ 4.070892] really_probe (drivers/base/dd.c:555)
[ 4.070895] __driver_probe_device (drivers/base/dd.c:764)
[ 4.070897] driver_probe_device (drivers/base/dd.c:794)
[ 4.070899] __driver_attach (drivers/base/dd.c:1164)
[ 4.070901] bus_for_each_dev (drivers/base/bus.c:301)
[ 4.070904] driver_attach (drivers/base/dd.c:1181)
[ 4.070906] bus_add_driver (drivers/base/bus.c:618)
[ 4.070908] driver_register (drivers/base/driver.c:240)
[ 4.070910] register_virtio_driver (drivers/virtio/virtio.c:356 (discriminator 4))
[ 4.070913] virtio_blk_init (drivers/block/virtio_blk.c:1218)
[ 4.070918] do_one_initcall (init/main.c:1295)
[ 4.070921] kernel_init_freeable (init/main.c:1367)
[ 4.070924] kernel_init (init/main.c:1503)
[ 4.070927] ret_from_fork (arch/arm64/kernel/entry.S:868)
Consider trimming the trace to the useful part.
commit cba4235e6031e ("genirq: Remove mask argument from
setup_affinity()")
and
commit 11ea68f553e24 ("genirq, sched/isolation: Isolate from handling
managed interrupts")
overcome this issue by defining a static struct cpumask and protecting
it by a raw spinlock. The code in these commits is executed with IRQs
disabled.
its_select_cpu() can be executed with IRQs enabled or disabled. Thus
disabling IRQs is necesserary to avoid deadlocking.
Only if this code can be preempted by an interrupt and subsequently
called from interrupt context. Can you describe this code path?
This patch:
Please refer to the documentation and the section about avoiding
things like "this patch" in a commit message.
- makes tmpmask a 'static struct cpumask'. This prevents storing it on
the stack and having to dynamically allocate it
- protects tmpmask with a raw spinlock
- disables IRQs around the spinlock for the case its_select_cpu() is
called with IRQs enabled
Signed-off-by: Pierre Gondois <pierre.gondois@xxxxxxx>
---
drivers/irqchip/irq-gic-v3-its.c | 34 +++++++++++++++++---------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5ff09de6c48f..3cf89e59b036 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1574,14 +1574,15 @@ static int its_select_cpu(struct irq_data *d,
const struct cpumask *aff_mask)
{
struct its_device *its_dev = irq_data_get_irq_chip_data(d);
- cpumask_var_t tmpmask;
+ static DEFINE_RAW_SPINLOCK(tmpmask_lock);
+ static struct cpumask tmpmask;
+ unsigned long flags;
int cpu, node;
-
- if (!alloc_cpumask_var(&tmpmask, GFP_ATOMIC))
- return -ENOMEM;
-
node = its_dev->its->numa_node;
+ local_irq_save(flags);
+ raw_spin_lock(&tmpmask_lock);
Why is this done in two steps? What is wrong with raw_spin_lock_irqsave()?
More importantly, a number of call paths reaching its_set_affinity()
already execute under a raw spinlock, with interrupts disabled. Is it
worth not hitting this lock at all times?
+
if (!irqd_affinity_is_managed(d)) {
/* First try the NUMA node */
if (node != NUMA_NO_NODE) {
@@ -1589,8 +1590,8 @@ static int its_select_cpu(struct irq_data *d,
* Try the intersection of the affinity mask and the
* node mask (and the online mask, just to be safe).
*/
- cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
- cpumask_and(tmpmask, tmpmask, cpu_online_mask);
+ cpumask_and(&tmpmask, cpumask_of_node(node), aff_mask);
+ cpumask_and(&tmpmask, &tmpmask, cpu_online_mask);
Consider reducing the churn by writing something like:
static struct cpumask __tmpmask;
struct cpumask *tmpmask = &__tmpmask;
which is strictly equivalent, and makes the patch much smaller.
Thanks,
M.