Re: arm64 syzbot instances

From: Peter Maydell
Date: Mon Mar 22 2021 - 09:53:02 EST


On Sun, 21 Mar 2021 at 19:00, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Sat, Mar 20, 2021 at 9:43 PM Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
> >
> > On Fri, 12 Mar 2021 at 09:16, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > So it's probably qemu that triggers the 'synchronous external
> > > abort' when accessing the PCI I/O space, which in turn hints
> > > towards a bug in qemu. Presumably it only returns data from
> > > I/O ports that are actually mapped to a device when real hardware
> > > is supposed to return 0xffffffff when reading from unused I/O ports.
> >
> > Do you have a reference to the bit of the PCI spec that mandates
> > this -1/discard behaviour for attempted access to places where
> > there isn't actually a PCI device mapped ? The spec is pretty
> > long and hard to read...
> >
> > (Knowing to what extent this behaviour is mandatory for all
> > PCI systems/host controllers vs just "it would be nice if the
> > gpex host controller worked this way" would help in figuring
> > out where in QEMU to change.)
>
> I spent some more time looking at both really old PCI specifications,
> and new ones.
> The old PCI specs seem to just leave this bit as out of scope because
> it does not concern transactions on the bus. The PCI host controller
> can either report a 'master abort' to the CPU, or ignore it, and each
> bridge can decide to turn master aborts on reads into all 1s.
> We do support some SoCs in Linux that trigger a CPU exception,
> but we tend to deal with those with an ugly hack that just ignores
> all exceptions from the CPU. Most host bridges fortunately behave
> like an x86 PC though, and do not trigger an exception here.

There's apparently a bit in the PCI spec that reads:
  "The host bus bridge, in PC compatible systems, must return all
  1's on a read transaction and discard data on a write transaction
  when terminated with Master-Abort."

which obviously applies only to "PC compatible systems".

> In the PCIe 4.0 specification, I found that the behavior is configurable
> at the root port, using the 'RP PIO Exception Register' at offset 0x1c
> in the DPC Extended Capability. This register defaults to '0', meaning
> that reads from an unknown port that generate an 'Unsupported Request
> Completion' get turned into all 1s. If the firmware or OS enables it,
> this can be turned into an AER log event, generate an interrupt or
> a CPU exception.
>
> Linux has a driver for DPC, which apparently configures it to
> cause an interrupt to log the event, but it does not hook up the
> CPU exception handler to this. I don't see an implementation of DPC
> in qemu, which I take as an indication that it should use the
> default behavior and cause neither an interrupt nor a CPU exception.

Hmm, maybe. We should probably also implement -1/discard just because
we're not intending to have 'surprising' behaviour.
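
To make "-1/discard" concrete, here is a minimal standalone sketch of
the semantics for an I/O window. It is only an illustration: the names
(io_region, io_window_read, io_window_write) are hypothetical and this
is not QEMU's memory API or the gpex code. It assumes accesses of 1, 2,
4 or 8 bytes and one backing register per device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one device claiming part of the I/O window. */
struct io_region {
    uint64_t base;   /* first address claimed by the device */
    uint64_t size;   /* number of bytes claimed */
    uint64_t value;  /* single backing register, for illustration only */
};

/* All-ones pattern for an access of len bytes (1, 2, 4 or 8). */
static uint64_t all_ones(unsigned len)
{
    return len >= 8 ? UINT64_MAX : (1ULL << (len * 8)) - 1;
}

/* Return the region that fully contains [addr, addr + len), or NULL. */
static struct io_region *io_find(struct io_region *r, size_t n,
                                 uint64_t addr, unsigned len)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= r[i].base && len <= r[i].size &&
            addr - r[i].base <= r[i].size - len)
            return &r[i];
    }
    return NULL;
}

/* Read: an unclaimed address returns all 1s instead of faulting. */
uint64_t io_window_read(struct io_region *r, size_t n,
                        uint64_t addr, unsigned len)
{
    struct io_region *dev = io_find(r, n, addr, len);

    if (!dev)
        return all_ones(len);
    return dev->value & all_ones(len);
}

/* Write: an unclaimed address silently discards the data. */
void io_window_write(struct io_region *r, size_t n,
                     uint64_t addr, unsigned len, uint64_t val)
{
    struct io_region *dev = io_find(r, n, addr, len);

    if (!dev)
        return;
    dev->value = val & all_ones(len);
}
```

With one device at 0x1000, a 4-byte read from 0x2000 comes back as
0xffffffff and a write there changes nothing, which is the behaviour
the guest would see instead of a synchronous external abort.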

TBH I'm having difficulty seeing why the kernel should be doing
this at all, though. The device tree tells you you have a PCI
controller; PCI supports enumeration of devices; you know exactly
where everything is mapped because the BARs tell you that.
I don't see anything that justifies the kernel in randomly
dereferencing areas of the IO or memory windows where it hasn't
mapped anything. You shouldn't be probing for legacy ISA-port
devices unless you're on a system which might actually have them
(e.g. an x86 PC).

thanks
-- PMM