Re: [PATCH] arm64: PCI: Enable SMC conduit

From: Marcin Wojtas
Date: Thu Mar 25 2021 - 16:45:58 EST


Hi,


On Thu, 25 Mar 2021 at 14:19, Lorenzo Pieralisi
<lorenzo.pieralisi@xxxxxxx> wrote:
>
> On Tue, Jan 26, 2021 at 10:53:51PM +0000, Will Deacon wrote:
> > On Tue, Jan 26, 2021 at 11:08:31AM -0600, Vikram Sethi wrote:
> > > On 1/22/2021 1:48 PM, Will Deacon wrote:
> > > > On Fri, Jan 08, 2021 at 10:32:16AM +0000, Lorenzo Pieralisi wrote:
> > > >> On Thu, Jan 07, 2021 at 04:05:48PM -0500, Jon Masters wrote:
> > > >>> On 1/7/21 1:14 PM, Will Deacon wrote:
> > > >>>> On Mon, Jan 04, 2021 at 10:57:35PM -0600, Jeremy Linton wrote:
> > > >>>>> Given that most arm64 platforms' PCI implementations need quirks
> > > >>>>> to deal with problematic config accesses, this is a good place to
> > > >>>>> apply a firmware abstraction. The ARM PCI SMCCC spec details a
> > > >>>>> standard SMC conduit designed to provide a simple PCI config
> > > >>>>> accessor. This specification enhances the existing ACPI/PCI
> > > >>>>> abstraction and expects power, config, etc. functionality to be handled
> > > >>>>> by the platform. It is also very explicit that the resulting config
> > > >>>>> space registers must behave as specified by the PCI specification.
> > > >>>>>
> > > >>>>> Let's hook the normal ACPI/PCI config path, and when we detect
> > > >>>>> missing MCFG data, attempt to probe the SMC conduit. If the conduit
> > > >>>>> exists and responds for the requested segment number (provided by the
> > > >>>>> ACPI namespace), attach a custom pci_ecam_ops which redirects
> > > >>>>> all config read/write requests to the firmware.
> > > >>>>>
> > > >>>>> This patch is based on the Arm PCI Config space access document @
> > > >>>>> https://developer.arm.com/documentation/den0115/latest
> > > >>>> Why does firmware need to be involved with this at all? Can't we just
> > > >>>> quirk Linux when these broken designs show up in production? We'll need
> > > >>>> to modify Linux _anyway_ when the firmware interface isn't implemented
> > > >>>> correctly...
> > > >>> I agree with Will on this. I think we want to find a way to address some
> > > >>> of the non-compliance concerns through quirks in Linux. However...
> > > >> I understand the concern and if you are asking me if this can be fixed
> > > >> in Linux it obviously can. The point is, at what cost for SW and
> > > >> maintenance - in Linux and other OSes, I think Jeremy summed it up
> > > >> pretty well:
> > > >>
> > > >> https://lore.kernel.org/linux-pci/61558f73-9ac8-69fe-34c1-2074dec5f18a@xxxxxxx
> > > >>
> > > >> The issue here is that what we are asked to support on ARM64 ACPI is a
> > > >> moving target and the target carries PCI with it.
> > > >>
> > > >> This potentially means that all drivers in:
> > > >>
> > > >> drivers/pci/controller
> > > >>
> > > >> may require an MCFG quirk and to implement it we may have to:
> > > >>
> > > >> - Define new ACPI bindings (that may need AML and that's already a
> > > >> showstopper for some OSes)
> > > >> - Manage clocks in the kernel (see link-up checks)
> > > >> - Handle PCI config space faults in the kernel
> > > >>
> > > >> Do we really want to do that? I don't think so. Therefore we need
> > > >> to have a policy to define what constitutes a "reasonable" quirk, and
> > > >> that's not objective, I am afraid, however we slice it (there is no
> > > >> such thing as, e.g., 90% ECAM).
> > > > Without a doubt, I would much prefer to see these quirks and workarounds
> > > > in Linux than hidden behind a firmware interface. Every single time.
> > >
> > > In that case, can you please comment on/apply the Tegra194 ECAM quirk
> > > that was rejected a year ago, and was the reason we worked with
> > > Samer/ARM to define this common mechanism?
> > >
> > > https://lkml.org/lkml/2020/1/3/395
> > >
> > > The T194 ECAM is from a widely used Root Port IP from an IP vendor. That
> > > is one reason so many *existing* SoCs have ECAM quirks. ARM is only now
> > > working with the Root Port IP vendors to test ECAM, MSI, etc., but the
> > > reality is there were deficiencies in industry IP that is widely used.
> > > If this common quirk is not the way to go, then please apply the
> > > T194-specific quirk which was NAK'd a year ago, or suggest how to
> > > improve that quirk.
> > >
> > > The ECAM issue has been fixed on future Tegra chips and is validated
> > > pre-silicon with BSA tests, so it is not going to be a recurrent issue
> > > for us.
> >
> > (aside: please fix your mail client not to add all these blank lines)
> >
> > Personally, if a hundred lines of self-contained quirk code is all
> > that is needed to get your legacy IP running, then I would say we
> > should merge it. But I don't maintain the PCI subsystem, and I trust
> > Bjorn and Lorenzo's judgement as to what is the right thing to do when
> > it concerns that code. After all, they're the ones who end up having
> > to look after this stuff long after the hardware companies have
> > stopped caring.
>
> A discussion was held between me, Will Deacon, Bjorn Helgaas and Jon
> Masters to agree on a proposed solution for this matter, a summary of the
> outcome below:
>
> - The PCI SMC conduit and related specifications are seen as a firmware
> kludge for a long-standing HW compliance issue. The SMC interface does
> not encourage Arm partners to fix their IPs and its only purpose
> is papering over HW issues that should have been fixed by
> now; had the PCI SMC conduit been introduced at arm64 ACPI inception as
> part of the standardization effort, the matter would have been different,
> but introducing it now brings about more shortcomings than benefits on
> balance, especially if MCFG quirks can be controlled and monitored (and
> they will).
>
> The end-goal is that hardware must be ECAM compliant. An SMC-based
> solution runs counter to that desire by removing the incentive for ECAM
> compliance.
>
> In sum, the SMC conduit solution was deemed to be deficient in these
> respects:
>
> * Removes incentive to build hardware with compliant ECAM
> * Moves quirk code into firmware where it can't sensibly
> be maintained or updated
> * Future of the SMC conduit is unclear and has no enforceable
> phase-out plan
>
> It was decided that the PCI SMC conduit enablement patches will not be
> merged for these specific reasons.
>
> - It is not clear why ACPI enablement is requested for platforms that are
> clearly not compliant with Arm SBSA/SBBR guidelines; there is no
> interest from distros in enabling ACPI bootstrap on non-SBSA-compliant
> HW, and devicetree firmware can be used to bootstrap non-compliant platforms.
> - We agreed that Linux will rely on MCFG quirks to enable PCI on ACPI
> arm64 systems if the relevant HW is not ECAM compliant (and ACPI
> enablement is requested); non-ECAM compliance must be classified as a HW
> defect and filed in the Linux kernel as an erratum in (or equivalent
> mechanism TBD):
>
> Documentation/arm64/acpi-ecam-quirks.rst
>
> Entries will contain an expected lifetime for the SoC in question and
> a contact point. When an entry expires, a patch to remove the related
> MCFG quirk will be proposed and action taken accordingly (either the
> quirk is removed since support is no longer required or the entry is
> updated). Details behind the specific mechanism to follow on public
> mailing lists.
>
> - MCFG quirks will be reviewed by PCI maintainers and acceptance will be
> granted or refused on a case by case basis; the aim is supporting HW
> where quirks are self-contained and don't require FW or specification
> updates.
>
> To request acceptance of an MCFG quirk, a relevant errata entry
> should be filed in the related Linux kernel documentation errata file.
> This will help detect whether non-ECAM HW bugs that were granted an
> MCFG quirk are actually fixed in subsequent SoCs.
>
> - As a rule of thumb (that will be drafted in non-ECAM errata guidelines),
> to be considered for upstream, MCFG quirks must not rely on additional
> ACPI firmware bindings and AML code other than the MCFG table and the
> *existing* PNP0A08/PNP0A03 ACPI bindings.
>
> The suitability of MCFG quirks for upstream merge will be determined by
> PCI maintainers only.
>

Thank you for your efforts to keep the arm64 PCI+ACPI world clean. The
discussion, and finally the last statement under this patch, revived
some old memories and triggered some thoughts I'd like to share.

We are close to the 4th anniversary of the MCFG quirk embargo. The
quirks that were merged are mostly really nasty, but they were lucky
enough to jump on that train back in the day. The MacchiatoBin platform
(and the entire Marvell Armada 7k8k SoC family) was created before that
time, but its firmware development missed the cutoff by only a couple of
months. It has a DWC IP that needs only 3 lines of quirk code (see the
DT variant for reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pci-host-generic.c?h=v5.12-rc4#n26),
but we had to politely accept a "these are the rules, otherwise we will
never convince the vendors to properly adapt to the specs" NACK.
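For readers who haven't looked at the DWC issue, here is a minimal user-space model of the filter in the pci-host-generic.c link above. It only mirrors the check, not the kernel API: the names dw_ecam_map(), dw_ecam_valid_device() and ecam_offset() are hypothetical, and it assumes the standard ECAM layout (1 MiB of config space per bus). My paraphrase of the kernel comment: in ECAM mode the DWC core does not filter type 0 config requests to devices 1 and up, so devices appear multiple times on the root bus unless the OS drops those accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN()/PCI_SLOT() macros. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)

/*
 * Offset of a config register inside an ECAM window whose first bus is
 * root_bus: bus number in bits 27:20, devfn in bits 19:12 (device 19:15,
 * function 14:12), register offset in bits 11:0.
 */
static uint64_t ecam_offset(unsigned int bus, unsigned int root_bus,
			    unsigned int devfn, unsigned int where)
{
	return ((uint64_t)(bus - root_bus) << 20) |
	       ((uint64_t)(devfn & 0xff) << 12) |
	       (where & 0xfff);
}

/*
 * The DWC workaround: on the root bus of the window only slot 0 may be
 * accessed; accesses to slots 1..31 are rejected instead of being sent
 * to the hardware.
 */
static bool dw_ecam_valid_device(unsigned int bus, unsigned int root_bus,
				 unsigned int devfn)
{
	return !(bus == root_bus && PCI_SLOT(devfn) > 0);
}

/*
 * Returns false for filtered accesses (the kernel's map_bus hook returns
 * NULL in that case); otherwise stores the ECAM offset in *off.
 */
static bool dw_ecam_map(unsigned int bus, unsigned int root_bus,
			unsigned int devfn, unsigned int where, uint64_t *off)
{
	if (!dw_ecam_valid_device(bus, root_bus, devfn))
		return false;
	*off = ecam_offset(bus, root_bus, devfn, where);
	return true;
}
```

For example, with root bus 0, slot 1 on bus 0 is filtered out, while bus 1, device 0, function 0, register 0x10 maps to offset 0x100010.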

This badly hurt the first candidate for an arm64 PC-like platform, as it
effectively blocked GPU usage with ACPI. Same story with a real
candidate for such a device (SolidRun Honeycomb): a similar DWC
controller and the same problems. We want people to use arm64
workstations outside of the passionate-developer bubble, and we want to
standardize (great SystemReady program!), but due to arbitrary
decisions we are not pushing it forward, to say the least. Don't get me
wrong, I would love all HW to use proper IP and "just work" without
hacks, but this takes time and apparently is not that easy, so maybe an
option to mitigate the limitations in SW (to some extent, and even
temporarily) should be considered. IMO this patch was a chance for that,
without adding the burden of maintaining quirks.
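The cover letter quoted above describes the kernel side of the proposal as a custom pci_ecam_ops that redirects every config read/write to firmware. Below is a minimal user-space sketch of that idea. All names are hypothetical (struct cfg_ops, fw_cfg_read(), fw_cfg_write(), fw_conduit_ops), and the actual SMC call defined in DEN0115 is replaced by a fake in-memory config space. What it shows is that the OS side is one SoC-independent accessor table, while any per-SoC workaround would live behind the firmware call.

```c
#include <stdint.h>

/* Hypothetical return codes for this sketch. */
#define CFG_OK        0
#define CFG_NOT_FOUND (-1)
#define CFG_BAD_ARG   (-2)

/* Hypothetical accessor table: the OS only ever calls through these hooks. */
struct cfg_ops {
	int (*read)(uint16_t seg, uint8_t bus, uint8_t devfn, uint16_t where,
		    unsigned int size, uint32_t *val);
	int (*write)(uint16_t seg, uint8_t bus, uint8_t devfn, uint16_t where,
		     unsigned int size, uint32_t val);
};

/*
 * Stand-in for firmware: in the proposal this would be an SMC into
 * firmware; here it models one function (segment 0, bus 0, devfn 0)
 * with 4 KiB of little-endian config space.
 */
static uint8_t fw_cfg_space[4096];

static int fw_check(uint16_t seg, uint8_t bus, uint8_t devfn, uint16_t where,
		    unsigned int size)
{
	if (size != 1 && size != 2 && size != 4)
		return CFG_BAD_ARG;
	if (where % size != 0 || where + size > sizeof(fw_cfg_space))
		return CFG_BAD_ARG;
	if (seg != 0 || bus != 0 || devfn != 0)
		return CFG_NOT_FOUND;
	return CFG_OK;
}

static int fw_cfg_read(uint16_t seg, uint8_t bus, uint8_t devfn, uint16_t where,
		       unsigned int size, uint32_t *val)
{
	int ret = fw_check(seg, bus, devfn, where, size);
	uint32_t v = 0;

	if (ret != CFG_OK) {
		*val = 0xffffffff; /* all-ones "no device" response */
		return ret;
	}
	for (unsigned int i = 0; i < size; i++)
		v |= (uint32_t)fw_cfg_space[where + i] << (8 * i);
	*val = v;
	return CFG_OK;
}

static int fw_cfg_write(uint16_t seg, uint8_t bus, uint8_t devfn, uint16_t where,
			unsigned int size, uint32_t val)
{
	int ret = fw_check(seg, bus, devfn, where, size);

	if (ret != CFG_OK)
		return ret;
	for (unsigned int i = 0; i < size; i++)
		fw_cfg_space[where + i] = (uint8_t)(val >> (8 * i));
	return CFG_OK;
}

/* The single, SoC-independent ops table the OS would attach. */
static const struct cfg_ops fw_conduit_ops = {
	.read  = fw_cfg_read,
	.write = fw_cfg_write,
};
```

Nothing in this table is specific to DWC or any other IP, which is the maintenance argument in a nutshell. The flip side, raised in the quoted summary, is that the workaround then sits in firmware, out of the kernel's reach.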

Also, I am not in a position to reach out to vendors and convince them
of anything, but I read about this need 4 years ago and now I see that
there is a *plan* to do it. DWC is as broken as it was, with a lot of
new platforms in the tree, but it is fully functional in ECAM mode only
with DT...

But I left the best for the end: below are 2 quirks merged despite the embargo:
Ampere: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/acpi/pci_mcfg.c?h=v5.12-rc4&id=877c1a5f79c6984bbe3f2924234c08e2f4f1acd5
Amazon (Annapurna): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/acpi/pci_mcfg.c?h=v5.12-rc4&id=4166bfe53093b687a0b1b22e5d943e143b8089b2
I must admit the second one raised my blood pressure and triggered this
email: it's a DWC quirk, identical to what was NACKed for Marvell
almost 2 years earlier.
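Both linked commits add entries to the quirk table in drivers/acpi/pci_mcfg.c. Roughly, that table is matched against the MCFG header's OEM ID, OEM Table ID and OEM revision, plus the PCI segment and bus range. A match swaps in SoC-specific config accessors. Here is a simplified standalone model of that lookup. The struct layout, the ops_name string and the "EXMPL "/"EXMPLSOC" IDs are hypothetical, and the real code hands back a struct pci_ecam_ops pointer rather than a name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OEM_ID_SIZE       6 /* ACPI OEM ID field width */
#define OEM_TABLE_ID_SIZE 8 /* ACPI OEM Table ID field width */

/* Identification data reported by firmware for one MCFG allocation. */
struct mcfg_info {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
	uint32_t oem_revision;
	uint16_t segment;
	uint8_t bus_start, bus_end;
};

/* Hypothetical quirk entry; ops_name stands in for a pci_ecam_ops pointer. */
struct mcfg_quirk {
	struct mcfg_info match; /* bus_start..bus_end = range the quirk covers */
	const char *ops_name;
};

/* Hypothetical example entry, not a real platform. */
static const struct mcfg_quirk quirks[] = {
	{ { "EXMPL ", "EXMPLSOC", 1, 0, 0x00, 0x7f }, "dw_ecam_quirk_ops" },
};

/* Returns the quirk's ops name, or NULL to fall back to plain ECAM. */
static const char *mcfg_find_quirk(const struct mcfg_info *info)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct mcfg_info *m = &quirks[i].match;

		if (!memcmp(m->oem_id, info->oem_id, OEM_ID_SIZE) &&
		    !memcmp(m->oem_table_id, info->oem_table_id, OEM_TABLE_ID_SIZE) &&
		    m->oem_revision == info->oem_revision &&
		    m->segment == info->segment &&
		    m->bus_start <= info->bus_start &&
		    info->bus_end <= m->bus_end)
			return quirks[i].ops_name;
	}
	return NULL;
}
```

Since the match keys are just firmware IDs, a quirk for a given IP costs one table entry plus its accessors. That is why the same DWC workaround could be added for one vendor's IDs and not another's.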

So what we have after 4 years:
* Directly convincing IP vendors is still just a plan.
* The original approach towards MCFG quirks has been reversed.
* Double standards in action, as displayed by the 2 cases above.

I'm sorry for my bitter tone, but I think this time could and should
have been spent better. I doubt it has pushed us in any significant
way towards wide adoption of fully standard-compliant PCIe IP.

Best regards,
Marcin