Re: [PATCH v9 2/3] PCI: Add tango PCIe host bridge support

From: Ard Biesheuvel
Date: Mon Jul 03 2017 - 14:44:47 EST

On 3 July 2017 at 19:11, Russell King - ARM Linux <linux@xxxxxxxxxxxxxxx> wrote:
> On Mon, Jul 03, 2017 at 08:40:31AM -0500, Bjorn Helgaas wrote:
>> The problem is serializing vs. memory accesses, since they don't use
>> any wrappers. However, they are ioremapped(), so it's at least
>> conceivable that another solution would be to use VM to trap those
>> accesses. I'm not a VM person, so I don't know whether that's
>> feasible in Linux.
>
> Bjorn,
>
> You're forgetting that MMIO (iow, memory returned by ioremap()) must
> be accessed through the appropriate accessors, and must not be
> directly dereferenced in C. (We do have buggy drivers that do that
> but they are buggy, and in many cases are getting attention to fix
> that.)
>
> However, adding a spinlock into them is really not nice, because it
> adds extra overhead that's only necessary for rare cases like Sigma
> Designs - especially when you consider that these accessors are used
> for all MMIO accesses, not just PCI. It would effectively mean that
> we end up serialising all MMIO accesses throughout the kernel when
> Sigma Designs SoCs are enabled, destroying some of the SMP benefit.
>
> I don't think we can sanely use the MMU to trap those accesses either,
> that would mean sending IPIs to tell other CPUs to do something, and
> waiting for them to respond - which can deadlock if we're already in
> an IRQ-protected region (iirc, config accesses are made with IRQs
> off.)
>
> I don't think there's an easy solution to this problem - and I'm not
> sure that stop_machine() can be made to work in this path (which
> needs a process context). I have a suspicion that the Sigma Designs
> PCI implementation is just so insane that it's never going to work
> reliably in a multi-SoC kernel without introducing severe performance
> issues for everyone else.

Could we perhaps use per-cpu spinlocks? That would put the
complexity in the Sigma config space accessors, which would have to
take every CPU's lock before reprogramming the outbound window, and
other implementations wouldn't have to care. However, I do agree with
Russell that this complexity is hard to justify in the first place if
the only implementation that requires it is a wacky design that needs
lots of other quirks to operate somewhat sanely to begin