Re: [PATCH] pciehp: Add pciehp_surprise module option

From: Bjorn Helgaas
Date: Wed Apr 10 2013 - 13:20:16 EST


On Wed, Apr 10, 2013 at 10:34 AM, Takashi Iwai <tiwai@xxxxxxx> wrote:
> Hi Bjorn,
>
> sorry for the late follow up as I was on vacation and has been busy
> for other tasks. Since this topic went to nirvana, I try to whip
> again...
>
> At Mon, 25 Mar 2013 10:58:40 -0600,
> Bjorn Helgaas wrote:
>>
>> On Wed, Mar 20, 2013 at 8:02 AM, Takashi Iwai <tiwai@xxxxxxx> wrote:
>> > We encountered a problem that on some HP machines the Realtek PCI-e
>> > card reader device appears only when you inserted a card before the
>> > cold boot. While debugging, it turned out that the device is actually
>> > handled via PCI-e hotplug in some level. The device sends a presence
>> > change notification, and pciehp receives it, but it's ignored because
>> > of lack of the hotplug surprise (PCI_EXP_SLTCAP_HPS) capability bit.
>> > Once when this check passes, everything starts working -- the device
>> > appears upon plugging the card properly.
>> >
>> > There are a few other bug reports indicating the similar problems
>> > (e.g. on recent Dell laptops), and I guess the culprit is same.
>>
>> Can you point us at these bug reports, e.g., with URLs? Hopefully
>> they will contain complete dmesg logs and "lspci -vv" outputs so we
>> can debug this a bit more.
>
> The machine isn't in market yet, so we cannot expose all things, but I
> attach the lspci snippet of the relevant parts, pci-e ports and the
> card reader, at least. If you need anything else, let me know.
>
> As Oliver and Michal already replied, Windows (both 7/8) identifies
> the device without modification. This implies that Windows handles
> the hotplug no matter whether the surprise bit is set or not, either
> globally or device-specifically. But, since this is pretty new
> hardware, we highly doubt it's done in a white-list basis.
>
>> I'm strongly opposed to adding a module option to work around this
>> issue because the user experience is unacceptable. We can't expect
>> users to debug the problem and discover the option.
>>
>> I'm also opposed to a DMI quirk system because I think it's very
>> likely that this device works correctly under Windows, and I doubt
>> very much that Windows has to include the equivalent of DMI quirks.
>> So we should, at least in principle, be able to figure out how to make
>> it work, too.
>
> In order to get things a bit straight, let me list up the things we
> found again:
>
> - The Realtek card reader devices doesn't appear in lspci at the
> fresh boot in multiple kernel versions from 3.0 to 3.9.
>
> - Once when the card is inserted, it issues the hotplug IRQ event.
>
> - pciehp receives and handles the event but it doesn't add/remove the
> device actually because the corresponding controller has no surprise
> bit.
>
> When forcibly enabling the hotplug device addition by my patch, it
> starts working. The device is added at card insert. The removal
> doesn't trigger on our system, but the event itself seems
> generated.
>
> - The surprise bit can't be changed as it's supposed to be read-only
> register bits. Thus no PCI quirk seems possible, and it has to be
> fixed in pciehp.
>
> - Another way to detect the PCI card reader device is to perform
> echo 1 > /sys/bus/pci/rescan
> with a memory card inserted. It doesn't work without the card,
> and it is less sophisticated than pciehp, of course.
>
> Right now, we applied a patch for pciehp to ignore the surprise bit
> per basis of DMI string match. This works, but doesn't scale; if the
> same problem happens on a similar model, the driver must be compiled
> again. A module option would be really convenient for that, although
> I understand your concern, too.
>
> Of course, an alternative (and more radical) solution is to remove the
> surprise bit check completely from pciehp, as Matthew suggested in the
> thread. What risk would it bring?

I think we need to ignore the surprise bit as Matthew suggested.

Alex raised an issue with this (secondary bus reset causes a device
remove followed by device add), so we'd have to work through that
somehow. I think doing the remove/add would actually be more correct
because we would be doing the whole device initialization after the
reset. We currently save/restore some device state around the reset,
but that's a piecemeal approach that is certain to miss internal
hidden state. But I don't know how to deal with the KVM implications
of remove/add.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/