Re: [PATCH] vfio/pci: when irq_bypass_register_producer() return fails, we need to clean it up and return -EINVAL. instead of return true.
From: Alex Williamson
Date: Tue Sep 29 2020 - 16:54:50 EST
On Sun, 27 Sep 2020 20:11:08 +0800
guomin_chen@xxxxxxxx wrote:
> From: guomin chen <guomin_chen@xxxxxxxx>
>
> Since eventfd "fds" is passed as a parameter by the upper-level
> application,when "fds" has multiple identical 'fd', it causes
> multiple different vfio_pci_irq_ctx->trigger and producer->token
> pointing to the same eventfd file. Although all but the first one
> can register successfully,all others fail to register.
>
> So when others producer released later, the list_del(&producer->node)
> will be called due to the different producer->token pointing to the
> same eventfd file, then triggering the BUG():
>
> vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> vfio-pci 0000:db:00.0: irq bypass producer (token 0000000060c8cda5) registration fails: -16
> list_del corruption, ffff8f7fb8ba0828->next is LIST_POISON1 (dead000000000100)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:47!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 29 PID: 3914 Comm: qemu-kvm Kdump: loaded Tainted: G E
> -------- - -4.18.0-193.6.3.el8.x86_64 #1
> Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-,
> BIOS -[IVE636Z-2.13]- 07/18/2019
> RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
> Code: ce ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 40 85 4d 88 e8 8c bc
> ce ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 d0 85 4d 88 e8 78 bc
> ce ff <0f> 0b 48 c7 c7 80 86 4d 88 e8 6a bc ce ff 0f 0b 48
> 89 f2 48 89 fe
> RSP: 0018:ffffaa9d60197d20 EFLAGS: 00010246
> RAX: 000000000000004e RBX: ffff8f7fb8ba0828 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff8f7fbf4d6a08 RDI: ffff8f7fbf4d6a08
> RBP: 0000000000000000 R08: 000000000000084b R09: 000000000000005d
> R10: 0000000000000000 R11: ffffaa9d60197bd0 R12: ffff8f4fbe863000
> R13: 00000000000000c2 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007f7cb97fa700(0000) GS:ffff8f7fbf4c0000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fcf31da4000 CR3: 0000005f6d404001 CR4: 00000000007626e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> irq_bypass_unregister_producer+0x9b/0xf0 [irqbypass]
> vfio_msi_set_vector_signal+0x8c/0x290 [vfio_pci]
> ? load_fixmap_gdt+0x22/0x30
> vfio_msi_set_block+0x6e/0xd0 [vfio_pci]
> vfio_pci_ioctl+0x218/0xbe0 [vfio_pci]
> ? kvm_vcpu_ioctl+0xf2/0x5f0 [kvm]
> do_vfs_ioctl+0xa4/0x630
> ? syscall_trace_enter+0x1d3/0x2c0
> ksys_ioctl+0x60/0x90
> __x64_sys_ioctl+0x16/0x20
> do_syscall_64+0x5b/0x1a0
> entry_SYSCALL_64_after_hwframe+0x65/0xca
>
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Cornelia Huck <cohuck@xxxxxxxxxx>
> Cc: Jiang Yi <giangyi@xxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: guomin chen <guomin_chen@xxxxxxxx>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1d9fb25..dd3a495 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -352,10 +352,21 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> vdev->ctx[vector].producer.token = trigger;
> vdev->ctx[vector].producer.irq = irq;
> ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> - if (unlikely(ret))
> - dev_info(&pdev->dev,
> + if (unlikely(ret)) {
> + dev_err(&pdev->dev,
> "irq bypass producer (token %p) registration fails: %d\n",
> vdev->ctx[vector].producer.token, ret);
> +
> + kfree(vdev->ctx[vector].name);
> + eventfd_ctx_put(trigger);
> +
> + cmd = vfio_pci_memory_lock_and_enable(vdev);
> + free_irq(irq, trigger);
> + vfio_pci_memory_unlock_and_restore(vdev, cmd);
> +
> + vdev->ctx[vector].trigger = NULL;
> + return -EINVAL;
> + }
>
> vdev->ctx[vector].trigger = trigger;
>
This is not the correct solution. Registering an IRQ bypass is an
accelerator, not a requirement. Failure should never cause the ioctl
to fail. The scenario you describe is a valid user configuration, the
issue is that the de-registration passes a bogus producer object that
was never successfully registered, causing a false match. Therefore I
believe the solution is to simply clear the token on registration
failure to prevent that bogus match. That should result in all the
additional producer objects with the same trigger harmlessly falling
out of the unregister function. Can you validate and post such a
patch? Thanks,
Alex