Re: [PATCH] PCI: Add sysfs attribute for PCI device power state

From: Maximilian Luz
Date: Wed Nov 18 2020 - 18:29:01 EST


On 11/18/20 6:49 PM, Bjorn Helgaas wrote:
On Sun, Nov 15, 2020 at 10:19:22PM +0100, Maximilian Luz wrote:
On 11/15/20 9:27 PM, Bjorn Helgaas wrote:

[...]

I think something read from sysfs is a snapshot with no guarantee
about how long it will remain valid, so I don't see a problem with the
value being stale by the time userspace consumes it.

I agree on this, and the READ_ONCE won't protect against it. The
READ_ONCE would only protect against future changes, e.g. something like

const char *state_names[] = { ... };

// check if state is invalid
if (READ(pci_dev->current_state) >= ARRAY_SIZE(state_names))
return sprintf(..., "invalid");
else // look state up in table
return sprintf(..., state_names[READ(pci_dev->current_state)])

Note that I've explicitly marked the problematic reads here: If those
are done separately, the invalidity check may pass, but by the time the
state name is looked up, the value may have changed and may be invalid.

Note further that if we have something like

pci_power_t state = pci_dev->current_state;

the compiler is, in theory, free to replace each access to "state" with
a read to pci_dev->current_state. As far as I can tell, the whole point
of READ_ONCE is to prevent that and ensure that there is only one read.

Note also that something like this could be easily introduced by
changing the code in pci_power_name(), as that is likely inlined by the
compiler. I'm not entirely sure, but I think that the compiler is allowed
to, at least theoretically, split that into two reads here and inlining
might be done before further optimization.

On the other hand, the changes that could lead to issues above are
fairly unlikely to cause them as the compiler will _probably_ read the
value only once anyways.

Well, OK, I see your point. But I'm not convinced it's worth
cluttering the code for this. There must be dozens of similar cases,
and if we do need to worry about this, I'd like to do it
systematically for all of drivers/pci/ instead of doing it piecemeal.

Fair enough, that's a valid point.

For full formal correctness, writes to current_state should probably
have also been guarded with WRITE_ONCE to prevent the compiler from
splitting the write instruction in addition to the read with READ_ONCE
in this patch.

Again, that's mostly a point about formal correctness and issues like
that shouldn't happen in practice (or if they would, would probably have
broken other parts of the kernel already).

Looking at other sysfs functions, it seems like most of them would have
the same issues, so it makes sense to drop the READ_ONCE.

I do think it's probably worth making sure we can't set
dev->current_state to something that's invalid, and also taking a look
at the PCI core interfaces that take a pci_power_t, i.e., those in
include/linux/pci.h, to make sure they do the right thing if a driver
supplies garbage.

As far as I can tell, those checks should already be there.

Thanks,
Max