Re: [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races

From: Benjamin Herrenschmidt
Date: Wed Aug 15 2018 - 19:39:56 EST

In reply to: Guenter Roeck: "Re: [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races"

On Wed, 2018-08-15 at 15:40 -0700, Guenter Roeck wrote:
> On Thu, Aug 16, 2018 at 07:50:13AM +1000, Benjamin Herrenschmidt wrote:
> > (Resent with lkml on copy)
> >
> > [Note: This isn't meant to be merged, it needs splitting at the very
> > least, see below]
> >
> > This is something I cooked up quickly today to test if that would fix
> > my problems with a large number of switches and NVMe devices on POWER.
> >
>
> Is that a problem that can be reproduced with a QEMU setup?

With difficulty... mt-tcg might help, but you need a rather large
system to reproduce it.

My repro-case is a 2-socket POWER9 system (about 40 cores off the top
of my head, so 160 threads) with 72 NVMe devices underneath a tree of
switches (I don't have the system at hand today to check how many).

I suppose it's possible to observe it on a smaller system (in theory a
single bridge with 2 devices is enough), but in practice the timing is
extremely hard to hit.

You need a combination of:

- The bridges come up disabled (which is the case when Linux does the
resource assignment, such as on POWER, but not on x86 unless it's
hotplug)

- The NVMe devices try to enable them simultaneously
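
The two conditions above amount to a check-then-act window, which can be sketched as a minimal user-space model. It assumes the window sits between bumping a bridge's enable count and actually writing its enable bits, so a second device sees the bridge as "enabled" and touches its own device through a still-disabled bridge (the UR case). The `Bridge` class, its methods and the 72-device default are hypothetical illustrations, not the kernel's `pci_enable_device()` internals, and the lock-based variant is just one way to close the window, not necessarily what the RFC patch does.

```python
import threading


class Bridge:
    """Hypothetical model of a PCIe bridge that comes up disabled.

    enable_cnt models the enable count that callers check;
    hw_enabled models the bridge's command-register enable bits.
    """

    def __init__(self):
        self.enable_cnt = 0
        self.hw_enabled = False
        self.lock = threading.Lock()

    # Racy variant, split in two steps so one bad interleaving can be
    # replayed deterministically: the count is bumped first, and the slow
    # config-space write that really enables the bridge happens later.
    def racy_enable_begin(self):
        self.enable_cnt += 1
        return self.enable_cnt == 1  # True: this caller must enable the hardware

    def racy_enable_finish(self):
        self.hw_enabled = True

    # Serialized variant: check-and-enable happens under one lock, so no
    # caller can return while the bridge is still disabled.
    def serialized_enable(self):
        with self.lock:
            self.enable_cnt += 1
            if self.enable_cnt == 1:
                self.hw_enabled = True

    def device_access_ok(self):
        # A downstream access through a disabled bridge fails (models the UR).
        return self.hw_enabled


def replay_race():
    """Replay: device A starts enabling the bridge, device B sees count > 0."""
    b = Bridge()
    a_owns = b.racy_enable_begin()  # A: count 0 -> 1, A must enable hardware
    b_owns = b.racy_enable_begin()  # B: count 1 -> 2, assumes bridge is up
    b_ok = b.device_access_ok()     # B touches its device: bridge still off
    if a_owns:
        b.racy_enable_finish()      # A finally writes the enable bits
    return a_owns, b_owns, b_ok


def probe_all_serialized(n_devices=72):
    """Enable one shared bridge from n_devices threads, then check access."""
    b = Bridge()
    results = [None] * n_devices

    def probe(i):
        b.serialized_enable()
        results[i] = b.device_access_ok()

    threads = [threading.Thread(target=probe, args=(i,)) for i in range(n_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b.enable_cnt, results.count(False)


if __name__ == "__main__":
    print(replay_race())           # (True, False, False): B hits the UR
    print(probe_all_serialized())  # (72, 0): no device fails
```

The replay fixes one interleaving on purpose; on real hardware that interleaving is rare, which fits the observation that only an occasional device fails.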

Also, the resulting error is a UR (Unsupported Request), and I don't know
how well QEMU models that.

On the above system, I usually get *one* device out of 72 failing due
to the race, and not on every boot.

However, the bug is known (see Bjorn's reply to the other thread on
linux-pci, "Re: PCIe enable device races (Was: [PATCH v3] PCI: Data
corruption happening due to race condition)"), so I'm not the only one
with a repro-case around.

Cheers,
Ben.
