Re: [PATCH] virtio: Remove virtio devices on device_shutdown()
From: Kirill A. Shutemov
Date: Thu Aug 08 2024 - 09:15:34 EST
On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 08, 2024 at 10:51:41AM +0300, Kirill A. Shutemov wrote:
> > Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory
> > accesses during the hang.
> >
> > Invalid read at addr 0x102877002, size 2, region '(null)', reason: rejected
> > Invalid write at addr 0x102877A44, size 2, region '(null)', reason: rejected
> > ...
> >
> > It was traced down to virtio-console. Kexec works fine if virtio-console
> > is not in use.
>
> virtio is not doing a lot of 16 bit reads.
> Are these the reads:
>
> virtio_cread(vdev, struct virtio_console_config, cols, &cols);
> virtio_cread(vdev, struct virtio_console_config, rows, &rows);
>
> ?
>
> write is a bit puzzling too. This one?
>
> bool vp_notify(struct virtqueue *vq)
> {
> /* we write the queue's selector into the notification register to
> * signal the other end */
> iowrite16(vq->index, (void __iomem *)vq->priv);
> return true;
> }
Given that we are talking about console issue, any suggestion on how to
check?
> >
> > Looks like virtio-console continues to write to the MMIO even after
> > underlying virtio-pci device is removed.
>
> You mention both MMIO and pci, I am confused.
By MMIO, I mean accesses to PCI BARs. But it is only my *guess* on the
situation, I have limited knowledge of the area. I am not drivers guy.
> Removed by what? In what sense?
So device_shutdown() iterates over all device and we hit the problem when
we get to virtio-pci devices and call pci_device_shutdown on them.
I *think* PCI BAR (or something else?) becomes unavailable after that but
it is still accessed.
> >
> > The problem can be mitigated by removing all virtio devices on virtio
> > bus shutdown.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Reported-by: Hongyu Ning <hongyu.ning@xxxxxxxxxxxxxxx>
>
> A bit worried about doing so much activity on shutdown,
> and for all devices, too. I'd like to understand what
> is going on a bit better - could be a symptom of
> a bigger problem (e.g. missing handling for suprise
> removal?).
I probably should have marked the patch as RFC. The patch was intended to
start conversation. I am not sure it is correct. This patch just happened
to work in our setup.
--
Kiryl Shutsemau / Kirill A. Shutemov