Re: [PATCH v2 2/4] pciehp: Use link change notifications for hot-plugand removal

From: Bjorn Helgaas
Date: Fri Dec 13 2013 - 16:14:34 EST


On Fri, Dec 13, 2013 at 12:04 PM, Rajat Jain <rajatjain@xxxxxxxxxxx> wrote:
>> On Thu, Dec 12, 2013 at 11:26 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>> > On Thu, Dec 12, 2013 at 2:44 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx>

> Also, I think the device removal should _always_ be initiated (if not done already) whenever the Link goes down for any reason (irrespective of whether the attention button is implemented or not, or whether the surprise events are supported or not). I think this is logical as it makes no sense for the software to think the device is accessible when in reality it is not. In fact I think we should also remove the checks in pciehp_disable_slot() that ensure that the adapter should be populated and latch should be closed.

I agree.

>> What does the "other OS" do about this repeater? Did you verify that it
>> disables the link when powering off the slot? If we were smarter about
>> presence detect, I wonder if that would be enough.
>>
>> > Also we can get rid of annoying aer during power off slot.
>>
>> I don't know the details of this issue. It may be possible to avoid the
>> AER in some way other than turning off the link.
>
> Sorry, I could not understand what we are talking about here. I'd appreciate if you could elaborate if you see this as a problem related to the patch?

I assume Yinghai means that when we power off the slot, if AER is
still enabled, it may report errors caused by the link going down.
PCIe r3.0 sec. 3.2.1 says some of these cases must not be considered
errors if the link is disabled, the port is associated with a
hot-pluggable slot, etc. Maybe his platform doesn't follow those
rules, or maybe there's some other AER error that's not covered by
them.

It's related to your patch because you are removing the link disable,
and Yinghai says that if we leave the link enabled, he gets unwanted
AER errors when powering off the slot. Sorry if this was all obvious
to you. I don't know any more details. It'd be nice to know the
exact AER errors and the PCIe capability info. Then we might be able
to figure out a way to leave the link enabled while still suppressing
the AER errors.

> Once again: the way I interpret this is:
> * Always enable Link events.
> * Disable presence events if attention button is present.

That sounds like a good plan to me.

I'm also dubious about this use of presence detect:

pciehp_power_thread
pciehp_enable_slot
pciehp_get_adapter_status
pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status)
*status = !!(slot_status & PCI_EXP_SLTSTA_PDS)
board_added
pciehp_power_on_slot

because we apparently look at PCI_EXP_SLTSTA_PDS when the slot is
powered off. Only in-band presence detection is required by spec, and
in-band detection only works when power is applied. So I think this
pciehp_get_adapter_status() call should probably just be removed.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/