Re: [PATCH] x86: PCI config space accessor functions should notignore the segment argument

From: Ingo Molnar
Date: Thu Jul 21 2011 - 05:06:31 EST



* Jan Beulich <JBeulich@xxxxxxxxxx> wrote:

> >>> On 21.07.11 at 09:10, Ingo Molnar <mingo@xxxxxxx> wrote:
> >> The access method 1 accessor, as it can be used for extended accesses
> >> (on AMD systems) instead gets added checks for the passed in segment to
> >> be zero (returning an error just like out of range values of the other
> >> arguments would cause).
> >
> > * Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
> >
> >> Without this change, the majority of the raw PCI config space access
> >> functions silently ignore a non-zero segment argument, which is
> >> certainly wrong.
> >>
> >> Apart from pci_direct_conf1, all other non-MMCFG access methods get
> >> used only for non-extended accesses (i.e. assigned to raw_pci_ops
> >> only). Consequently, with the way raw_pci_{read,write}() work, it would
> >> be a coding error to call these functions with a non-zero segment (with
> >> the current call flow this cannot happen afaict).
> >>
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>
> >>
> >> ---
> >> arch/x86/pci/ce4100.c | 4 ++++
> >> arch/x86/pci/direct.c | 6 ++++--
> >> arch/x86/pci/numaq_32.c | 2 ++
> >> arch/x86/pci/olpc.c | 4 ++++
> >> arch/x86/pci/pcbios.c | 2 ++
> >> 5 files changed, 16 insertions(+), 2 deletions(-)
> >>
> >> --- 3.0-rc7/arch/x86/pci/ce4100.c
> >> +++ 3.0-rc7-x86-pci-access-seg/arch/x86/pci/ce4100.c
> >> @@ -257,6 +257,8 @@ static int ce4100_conf_read(unsigned int
> >> {
> >> int i;
> >>
> >> + BUG_ON(seg);
> >> +
> >> if (bus == 1) {
> >> for (i = 0; i < ARRAY_SIZE(bus1_fixups); i++) {
> >> if (bus1_fixups[i].dev_func == devfn &&
> >> @@ -282,6 +284,8 @@ static int ce4100_conf_write(unsigned in
> >> {
> >> int i;
> >>
> >> + BUG_ON(seg);
> >> +
> >> if (bus == 1) {
> >> for (i = 0; i < ARRAY_SIZE(bus1_fixups); i++) {
> >> if (bus1_fixups[i].dev_func == devfn &&
> >> --- 3.0-rc7/arch/x86/pci/direct.c
> >> +++ 3.0-rc7-x86-pci-access-seg/arch/x86/pci/direct.c
> >> @@ -22,7 +22,7 @@ static int pci_conf1_read(unsigned int s
> >> {
> >> unsigned long flags;
> >>
> >> - if ((bus > 255) || (devfn > 255) || (reg > 4095)) {
> >> + if (seg || (bus > 255) || (devfn > 255) || (reg > 4095)) {
> >> *value = -1;
> >> return -EINVAL;
> >> }
> >> @@ -53,7 +53,7 @@ static int pci_conf1_write(unsigned int
> >> {
> >> unsigned long flags;
> >>
> >> - if ((bus > 255) || (devfn > 255) || (reg > 4095))
> >> + if (seg || (bus > 255) || (devfn > 255) || (reg > 4095))
> >> return -EINVAL;
> >>
> >> raw_spin_lock_irqsave(&pci_config_lock, flags);
> >> @@ -97,6 +97,7 @@ static int pci_conf2_read(unsigned int s
> >> unsigned long flags;
> >> int dev, fn;
> >>
> >> + BUG_ON(seg);
> >> if ((bus > 255) || (devfn > 255) || (reg > 255)) {
> >> *value = -1;
> >> return -EINVAL;
> >> @@ -138,6 +139,7 @@ static int pci_conf2_write(unsigned int
> >> unsigned long flags;
> >> int dev, fn;
> >>
> >> + BUG_ON(seg);
> >> if ((bus > 255) || (devfn > 255) || (reg > 255))
> >> return -EINVAL;
> >>
> >> --- 3.0-rc7/arch/x86/pci/numaq_32.c
> >> +++ 3.0-rc7-x86-pci-access-seg/arch/x86/pci/numaq_32.c
> >> @@ -34,6 +34,7 @@ static int pci_conf1_mq_read(unsigned in
> >> unsigned long flags;
> >> void *adr __iomem = XQUAD_PORT_ADDR(0xcfc, BUS2QUAD(bus));
> >>
> >> + BUG_ON(seg);
> >> if (!value || (bus >= MAX_MP_BUSSES) || (devfn > 255) || (reg > 255))
> >> return -EINVAL;
> >>
> >> @@ -73,6 +74,7 @@ static int pci_conf1_mq_write(unsigned i
> >> unsigned long flags;
> >> void *adr __iomem = XQUAD_PORT_ADDR(0xcfc, BUS2QUAD(bus));
> >>
> >> + BUG_ON(seg);
> >> if ((bus >= MAX_MP_BUSSES) || (devfn > 255) || (reg > 255))
> >> return -EINVAL;
> >>
> >> --- 3.0-rc7/arch/x86/pci/olpc.c
> >> +++ 3.0-rc7-x86-pci-access-seg/arch/x86/pci/olpc.c
> >> @@ -206,6 +206,8 @@ static int pci_olpc_read(unsigned int se
> >> {
> >> uint32_t *addr;
> >>
> >> + BUG_ON(seg);
> >> +
> >> /* Use the hardware mechanism for non-simulated devices */
> >> if (!is_simulated(bus, devfn))
> >> return pci_direct_conf1.read(seg, bus, devfn, reg, len, value);
> >> @@ -264,6 +266,8 @@ static int pci_olpc_read(unsigned int se
> >> static int pci_olpc_write(unsigned int seg, unsigned int bus,
> >> unsigned int devfn, int reg, int len, uint32_t value)
> >> {
> >> + BUG_ON(seg);
> >> +
> >> /* Use the hardware mechanism for non-simulated devices */
> >> if (!is_simulated(bus, devfn))
> >> return pci_direct_conf1.write(seg, bus, devfn, reg, len, value);
> >> --- 3.0-rc7/arch/x86/pci/pcbios.c
> >> +++ 3.0-rc7-x86-pci-access-seg/arch/x86/pci/pcbios.c
> >> @@ -181,6 +181,7 @@ static int pci_bios_read(unsigned int se
> >> unsigned long flags;
> >> unsigned long bx = (bus << 8) | devfn;
> >>
> >> + BUG_ON(seg);
> >> if (!value || (bus > 255) || (devfn > 255) || (reg > 255))
> >> return -EINVAL;
> >>
> >> @@ -247,6 +248,7 @@ static int pci_bios_write(unsigned int s
> >> unsigned long flags;
> >> unsigned long bx = (bus << 8) | devfn;
> >>
> >> + BUG_ON(seg);
> >> if ((bus > 255) || (devfn > 255) || (reg > 255))
> >> return -EINVAL;
> >
> > Not sure we want a BUG_ON() which crashes the box - wouldn't a
> > WARN_ON() suffice?
>
> I can certainly convert it to a WARN_ON(), but it in fact *is* a bug if
> these get called with a non-zero segment (as code path analysis says
> that this is currently impossible unless the functions got called directly).
> Please let me know what you prefer.

WARN_ON()s are bugs too.

The only difference between a WARN_ON() and BUG_ON() is whether we
consider it impossible for the kernel to continue. That's clearly not
the case here so a WARN_ON() looks more justified to me.

So we reserve BUG_ON() to the rarest of occasions: when for example
it would be insecure to continue, or if the error is so severe that
there's absolutely no way for the kernel to continue.

> > also, the analysis/explanation is a bit incomplete:
> >
> >> The access method 1 accessor, as it can be used for extended
> >> accesses (on AMD systems) instead gets added checks for the
> >> passed in segment to be zero (returning an error just like out
> >> of range values of the other arguments would cause).
> >
> > Under what circumstances can this trigger in practice, with the
> > current code?
>
> I really don't know whether multi-segment PCI systems with AMD CPUs
> are existing in practice. If they do, and if MMCFG cannot be used
> there for whatever reason, accesses to segments other than zero
> would get issued to the wrong device(s). I would have thought that
> this is what the paragraph above says, but I certainly can add this
> more explicit description...

Do we have to re-discuss the upstream changelog policy every single
time again?

Yes, overly verbose, reader-friendly changelogs are preferred over
'anyone who is an expert in this field should know all the
expectations, status quo and consequences' kind of minimalistic
changelogs.

The former is consciously information-rich and robust, the latter is
information-poor and prone to be fragile, prone to be ambiguous,
making both the merge flow, any bug and commit triage and eventual
later fixes more difficult.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/