Re: [PATCH v2 2/4] pciehp: Use link change notifications for hot-plug and removal

From: Bjorn Helgaas
Date: Sun Dec 15 2013 - 19:18:53 EST


On Sun, Dec 15, 2013 at 4:24 PM, Rajat Jain <rajatjain@xxxxxxxxxxx> wrote:
>> > >
>> > >> Once again: the way I interpret this is:
>> > >> * Always enable Link events.
>> > >> * Disable presence events if attention button is present.
>> > >
>> > > That sounds like a good plan to me.
>> >
>> > How about Diag_Reset from MPT2SAS and others?
>> > The link could go down and come back up.
>> >
>
> I am assuming you are referring to
>
> static int
> _base_diag_reset(struct MPT2SAS_ADAPTER *ioc, int sleep_flag)
>
> Which as far as I could understand would cause link to go down and come up
> again without the kernel knowing anything about it?
> ...

> In general, I guess the question is when a link goes down and back up, whether
> or not we want to treat it as a hot unplug followed by a hotplug. I think there
> may be cases such as AER (or the one Yinghai mentions) where we don't want to
> treat it as a hotplug (see note below). And there may be cases that we
> definitely want to treat it as hotplug (need link events!). Situation gets more
> complex since there may be pciehp slots downstream of a link getting reset.
>
> It seems to me that we are saying a mechanism is needed so that a voluntary
> link flap is temporarily NOT treated as a hotplug?
> ...

> Notes:
> * It may not be OK if the kernel thinks the device is accessible when it really is not.
> What if someone (say, userspace) tries to access the device during this downtime?
> * How do we know, after the link comes back up, that the device is really the same?
> What if the device changed its "character" during the reset, say to a different class?
> I think a rescan should be mandated after every such event.
> * Do we need to unload and reload the driver after the link reset, since it can be a different device?

I am quite dubious about the idea of a voluntary link flap. If the
link goes down and comes back up, I don't see how we can make any
assumptions about what device is there.

I let Alex talk me into pciehp_reset_slot(), which disables presence
detect interrupts while resetting a device, so we already have a bit
of precedent for the idea. But even in that case, the device could
easily come out of reset as a different device, e.g., if the reset
caused it to load updated firmware.

I would feel much better if we treated link down as a remove and did a
rescan on the link up.
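
To make that concrete, here is a rough user-space sketch of the policy,
not a patch. Every name in it (model_slot, model_handle_link_event,
MODEL_NO_DEVICE) is made up for illustration and none of it is pciehp
code; the probed ID stands in for whatever a real rescan would read
from config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: these are NOT pciehp structures or APIs. */
enum link_event { LINK_DOWN, LINK_UP };

/* All ones is what a config read returns when nothing responds. */
#define MODEL_NO_DEVICE 0xffffffffu

struct model_slot {
	bool     dev_present;	/* does the model think a device is there? */
	uint32_t dev_id;	/* vendor/device ID found by the last scan */
	unsigned removes;	/* how many times we tore the device down */
	unsigned rescans;	/* how many times we re-enumerated */
};

/*
 * Link down is always a remove; link up is always a rescan.
 * probed_id is what enumeration found after the link came up
 * (ignored for LINK_DOWN).
 */
void model_handle_link_event(struct model_slot *slot,
			     enum link_event ev, uint32_t probed_id)
{
	assert(slot != NULL);

	switch (ev) {
	case LINK_DOWN:
		if (slot->dev_present) {
			/* Forget everything we knew about the old device. */
			slot->dev_present = false;
			slot->dev_id = 0;
			slot->removes++;
		}
		break;
	case LINK_UP:
		/* Never assume the old device is back: look again. */
		slot->rescans++;
		if (probed_id == MODEL_NO_DEVICE) {
			slot->dev_present = false;
			slot->dev_id = 0;
		} else {
			slot->dev_present = true;
			slot->dev_id = probed_id;
		}
		break;
	}
}
```

The point of the sketch is that nothing survives a link down: even if
the same ID shows up again on link up, it is found by a fresh scan
rather than assumed.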

Bjorn