Re: [PATCH] pciehp: Handle interrupts that happen during initialization.

From: Eric W. Biederman
Date: Fri Feb 13 2009 - 23:06:57 EST


Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> writes:

> Any update here, Eric? Sounds like you're using hotplug in real environments
> with complex topologies (based on your earlier messages), so we're interested
> in what you're seeing here...

Yes.

Currently I have a test system that is a subset of what I'm worried
about and will shortly have the real hardware, so my immediate goal is
to get things working well enough so my internal users won't get
blocked by bugs. Currently I only have the pcie hotplug and pcie
hotplug surprise cases. My basic topology is 16 hotplug slots into
which I will be plugging in pci express switches with a couple of
additional hotplug slots. As for the firmware, I will have it reserving
bus numbers and mmio space on each of the first 16 slots and the rest
is going to be up to the Linux kernel. This is an embedded design, so
there is no ACPI; it appears to be more pain than it is worth to implement.

I am also looking at the case of pcie switches with two upstream
ports, and switching which cpu they are connected to at runtime. So
in some cases I will have devices whose presence is detected but will
not get link for hours or days, as opposed to the 20ms time limit in
the pci express specification. Call it a necessary extension.

I need to revisit the pciehp driver, but on my first pass through it
every corner case I looked at appeared to get something wrong. So I
have written myself a little 430 line replacement that handles the case
that I currently care about. Part of what I was seeing before is that
we don't clear pending events in the pciehp driver before we enable
interrupts. So if booting the system has left some events pending and
you have CONFIG_DEBUG_SHIRQ enabled, you get a nice oops because p_slot
has not been initialized and so the interrupts can't be handled.
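
The ordering problem can be sketched in a small user-space model. This
is not the pciehp code: struct model_slot and the model_* functions are
hypothetical names made up for illustration, the ctx pointer stands in
for p_slot, and CONFIG_DEBUG_SHIRQ is modeled as a single handler call
right after the interrupt is enabled. Only the register bit values are
taken from the PCI Express slot registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slot Status event bits (write 1 to clear) and the Slot Control
 * hot-plug interrupt enable bit, as laid out by PCI Express. */
#define SLTSTA_ABP    0x0001  /* attention button pressed */
#define SLTSTA_PFD    0x0002  /* power fault detected */
#define SLTSTA_MRLSC  0x0004  /* MRL sensor changed */
#define SLTSTA_PDC    0x0008  /* presence detect changed */
#define SLTSTA_CC     0x0010  /* command completed */
#define SLTSTA_EVENTS (SLTSTA_ABP | SLTSTA_PFD | SLTSTA_MRLSC | \
                       SLTSTA_PDC | SLTSTA_CC)
#define SLTCTL_HPIE   0x0020  /* hot-plug interrupt enable */

/* Hypothetical model of one slot; not a kernel structure. */
struct model_slot {
    uint16_t sltsta;  /* modeled Slot Status register */
    uint16_t sltctl;  /* modeled Slot Control register */
    void *ctx;        /* stands in for p_slot; NULL until set up */
    int handled;      /* number of times the handler serviced events */
};

/* Event bits are write-1-to-clear; other status bits are unaffected. */
static void model_write_sltsta(struct model_slot *s, uint16_t val)
{
    s->sltsta &= (uint16_t)~(val & SLTSTA_EVENTS);
}

/* Returns 0 when there was nothing to do or the events were serviced,
 * -1 when events were pending but the per-slot state did not exist yet
 * (the situation that oopses in the real driver). */
static int model_isr(struct model_slot *s)
{
    uint16_t events = s->sltsta & SLTSTA_EVENTS;

    if (!events)
        return 0;
    if (!s->ctx)
        return -1;
    model_write_sltsta(s, events);
    s->handled++;
    return 0;
}

/* Assumed ordering: the interrupt is enabled before ctx is assigned.
 * With clear_first set, stale events are acknowledged beforehand. */
static int model_init_slot(struct model_slot *s, void *ctx, int clear_first)
{
    int rc;

    assert(s != NULL && ctx != NULL);
    s->ctx = NULL;
    if (clear_first)
        model_write_sltsta(s, SLTSTA_EVENTS);  /* drop boot-time events */
    s->sltctl |= SLTCTL_HPIE;
    rc = model_isr(s);  /* the extra call the debug option makes */
    s->ctx = ctx;
    return rc;
}
```

Calling model_init_slot() with clear_first set to 0 on a slot that
starts with SLTSTA_PDC pending returns -1, while the same call with
clear_first set to 1 returns 0 and leaves the non-event status bits
alone.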

My little driver is at least good enough to let me start looking at
other things. Just yesterday I found that if you mmap a resource
in sysfs, hot-remove doesn't complete. Sysfs issues seem to be the
bane of my existence and I am currently working on a patch for that.

Currently the pcie port driver calls the remove methods of child
drivers twice if it is removed (say because you have hot unplugged a
bridge). That is one of the truly nasty bugs I saw when I was trying
to bring up my test system, as things start accessing freed memory and
all kinds of silly things happen.
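
To illustrate why a second remove call is so destructive, and one
generic way to make teardown run at most once, here is a small
user-space sketch. It is not the port driver's code and not a proposed
fix for it: struct model_child and the model_* functions are
hypothetical, and the "bound" flag is just an assumed way of tracking
whether a driver is still attached.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a child device and its driver state. */
struct model_child {
    int *state;        /* driver-private allocation */
    int bound;         /* nonzero while a driver is attached */
    int remove_calls;  /* how many times the remove method ran */
};

/* Attach: allocate private state. Returns 0 on success, -1 on failure. */
static int model_bind(struct model_child *c)
{
    c->state = malloc(sizeof(*c->state));
    if (!c->state)
        return -1;
    *c->state = 0;
    c->bound = 1;
    return 0;
}

/* The remove method frees the private state. Running it a second time
 * without a guard would operate on memory that is already gone. */
static void model_child_remove(struct model_child *c)
{
    free(c->state);
    c->state = NULL;
    c->remove_calls++;
}

/* Guarded teardown: the remove method runs only while still bound,
 * so a second unbind from another removal path is a no-op. */
static void model_unbind(struct model_child *c)
{
    if (!c->bound)
        return;
    c->bound = 0;
    model_child_remove(c);
}
```

With this guard, calling model_unbind() twice on the same child runs
the remove method once.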

After I get the worst of the problems handled I intend to do a
thorough review and fix everything that I can see.

Eric