Re: [linux-pm] Bug in PCI core

From: Adam Belay
Date: Fri Oct 13 2006 - 15:20:54 EST


On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote:
> On Fri, 2006-10-13 at 10:49 -0600, Matthew Wilcox wrote:
> > No it didn't. It's undefined behaviour to perform *any* PCI config
> > access to the device while it's doing a D-state transition. It may have
>
> I think you missed the earlier parts of the story - the kernel caches
> the base config register state.
>
> > happened to work with the chips you tried it with, but more likely you
> > never hit that window because X simply didn't try to do that.
>
> Which is why the kernel caches the register state. This all came up long
> ago and the solution we currently have was the one chosen after
> considerable debate and analysis about things like locking. We preserved
> the historical reliable interface going back to the early Linux PCI
> support and used by all the apps.
>
>
> There are several problems with making it return an error
>
> - What does user space do ?
>
> while(pci_...() == -EAGAIN) yield();
>
> which is useful how - there is no select operation for waiting here, and
> while it could be added it just gets uglier
>

If reads of the sysfs file blocked until the transition completed, this
could be handled quite cleanly, and the data returned would reflect the
actual PCI config state.
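
To illustrate what I mean from the userspace side, here is a minimal
sketch, assuming the kernel blocked reads of the "config" file for the
duration of a transition (it does not necessarily do so today). The
helper name read_config_dword() is made up for this example. The point
is only that the caller needs a single pread() and no retry loop:

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one 32-bit register from a PCI "config" file in sysfs.
 * Returns 0 on success or a negative errno value on failure.
 * Assumption: if the kernel blocked the read during a D-state
 * transition, pread() would simply return later.
 */
static int read_config_dword(const char *path, off_t offset, uint32_t *value)
{
	uint8_t raw[4];
	ssize_t n;
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = pread(fd, raw, sizeof(raw), offset);
	err = errno;	/* save before close() can overwrite it */
	close(fd);

	if (n < 0)
		return -err;
	if (n != (ssize_t)sizeof(raw))
		return -EIO;	/* short read: treat as an I/O failure */

	/* PCI config space is little-endian. */
	*value = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
		 ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
	return 0;
}
```

A caller would pass something like
/sys/bus/pci/devices/<address>/config and offset 0 to fetch the
vendor/device ID dword, and would see either real data or an error.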

> - Who actually wants to get an error in that specific case ?
>

Let's say the device is in D3cold (i.e. the parent bridge has been
powered down). In that case, you might want to get an error (probably
-EIO, but maybe FF...). Handing a buffered copy to a userspace driver
would be incorrect, because it would hide a legitimate failure
condition.
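
As a rough sketch of how a userspace driver could tell these outcomes
apart, assuming the interface reported either a failed read or all-ones
data for an unreachable device (the enum and function names below are
invented for illustration, not an existing API):

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

enum cfg_status {
	CFG_OK,		/* data looks like a real register value */
	CFG_ERROR,	/* the read itself failed or came up short */
	CFG_ALL_ONES,	/* every byte is 0xFF: device likely unreachable */
};

/*
 * Classify the result of reading len bytes of config space.
 * n is the return value of the read call, buf holds the data.
 */
static enum cfg_status classify_config_read(ssize_t n, const uint8_t *buf,
					    size_t len)
{
	size_t i;

	if (len == 0 || n < 0 || (size_t)n != len)
		return CFG_ERROR;
	for (i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return CFG_OK;
	return CFG_ALL_ONES;
}
```

With a cached copy, neither CFG_ERROR nor CFG_ALL_ONES could ever be
observed, which is exactly the failure being hidden.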

> If you can find someone who desperately wants an error code then code in
> O_DIRECT support to do it and preserve the existing sane API.
>
> The job of the kernel is not to expose hardware directly, it is to
> provide sane interfaces to it. We don't have separate interfaces to
> conf1, conf2, pcibios etc for good reason. Exposing everyone to ugly
> minor details of the PCI transition handling isn't progress.
>

I suppose we have very different ideas about the actual role and purpose
of this sysfs interface. As I see it, it provides direct access to
hardware for userspace device drivers (software that actually cares
about the ugly PCI details). It's much lower-level than the highly
abstracted "vendor", "device", "resourceX", etc. interfaces. As such,
it's very important that it accurately reflects what's actually going on
in hardware, even if that is potentially a greater hassle for userspace.
That's not to suggest that we shouldn't block this interface while a
power state transition is in progress. But I think it's best to expose
the hardware failure and powered-off cases as errors.

On the other hand, you seem to suggest that it is a potentially
approximate cache of the PCI config space that primarily serves to
provide PCI configuration data to userspace hardware detection
mechanisms. In that case, however, I think it may as well be marked as
deprecated, as it's clearly inferior to the higher-level sysfs
attributes ("vendor", "device", "irq", "class", etc.) with regard to
accuracy, code complexity (both for the kernel and userspace), and
ease of use. In other words, I don't see a reason any userspace app
should ever use it other than for debugging (e.g. lspci).

Adam

