Re: [PATCH 1/1] virtio/s390: fix vritio-ccw device teardown

From: Vineeth Vijayan
Date: Tue Sep 21 2021 - 14:25:24 EST


On Tue, 2021-09-21 at 18:52 +0200, Halil Pasic wrote:
> > > > lock already,
> > >
> > > I believe we have a misunderstanding here. I believe that Vineeth
> > > is
> > > trying to tell us, that online_store_handle_offline() and
> > > online_store_handle_offline() are called under the a device lock
> > > of
> > > the ccw device. Right, Vineeth?
> > Yes. I wanted to bring-out both the scenario.The
> > set_offline/_online()
> > calls and the unconditional-remove call.
>
> I don't understand the paragraph above. I can't map the terms
> set_offline/_online() and unconditional-remove call to chunks of
> code.
> :(
online_store() function can be used to set_online/set_offline manually
from the sysfs entry.
And an unconditional-remove call, for CIO, starts with a CRW which
indicates there is a subchannel_event which needs to be taken care.
This sch_event() (in device.c) then try to find the reason for this CRW
and act accordingly. This would lead to device_del and end up calling
the remove function of the driver.

> > For the set_online The virtio_ccw_online() also invoked with
> > ccwlock
> > held. (ref: ccw_device_set_online)
>
> I don't think virtio_ccw_online() is invoked with the ccwlock held. I
> think we call virtio_ccw_online() in this line:
> https://elixir.bootlin.com/linux/v5.15-rc2/source/drivers/s390/cio/device.c#L394
> and we have released the cdev->ccwlock literally 2 lines above.
My bad. I overlooked it!
>
>
> > > Conny, I believe, by online/offline callbacks, you mean
> > > virtio_ccw_online() and virtio_ccw_offline(), right?
> > >
> > > But the thing is that virtio_ccw_online() may get called (and is
> > > typically called, AFAICT) with no locks held via:
> > > virtio_ccw_probe() --> async_schedule(virtio_ccw_auto_online,
> > > cdev)
> > > -*-> virtio_ccw_auto_online(cdev) --> ccw_device_set_online(cdev)
> > > -->
> > > virtio_ccw_online()
> > >
> > > Furthermore after a closer look, I believe because we don't take
> > > a reference to the cdev in probe, we may get
> > > virtio_ccw_auto_online()
> > > called with an invalid pointer (the pointer is guaranteed to be
> > > valid
> > > in probe, but because of async we have no guarantee that it will
> > > be
> > > called in the context of probe).
> > >
> > > Shouldn't we take a reference to the cdev in probe?
> > We just had a quick look at the virtio_ccw_probe() function.
> > Did you mean to have a get_device() during the probe() and
> > put_device()
> > just after the virtio_ccw_auto_online() ?
>
> Yes, that would ensure that cdev pointer is still valid when
> virtio_ccw_auto_online() is executed, and that things are cleaned up
> properly, I guess. But I'm not 100% sure about all the interactions.
> AFAIR ccw_device_set_online(cdev) would bail out if !drv. But then
> we have the case where we already assigned it to a new driver (e.g.
> vfio for dasd).
>
> BTW I believe if we have a problem here, the dasd driver has the same
> problem as well. The code looks very, very similar.
You are right about that. I am trying to recreate that issue with DASD
now. And working on the patch as well.
>
> And shouldn't this auto-online be common CIO functionality? What is
> the
> reason the char devices don't seem to have it?
I am not sure about that. I dont understand why it should be a CIO
functionality.
>
> Regards,
> Halil