Re: [PATCH 0/3] soc/tegra: Prevent the PMC driver from corrupting interrupt routing

From: Thierry Reding
Date: Mon Oct 05 2020 - 07:22:24 EST


On Mon, Oct 05, 2020 at 12:14:40PM +0100, Marc Zyngier wrote:
> Jon recently reported that one of the Tegra systems (Jetson TX2, aka
> tegra186) stopped booting with the introduction of the "IPI as IRQs"
> series. After a few weeks of head scratching and complete puzzlement,
> I obtained a board and started looking at what was happening.
>
> The interrupt hierarchy looks like this:
>
> [DEVICE] -A-> [PMC] -B-> [GIC]
>
> which seems simple enough. However, not all the devices attached to
> the PMC follow this hierarchy, and in some cases, the 'B' link isn't
> present in the HW. In other cases, neither 'A' nor 'B' are present.
> And yet the PMC driver creates such linkages using random hwirq values
> for the non-existent links, potentially overriding existing mappings
> in the process. "What could possibly go wrong?"

Yes, that would've been my fault. It seemed like the right thing to do
at the time, but the way you describe it makes it obvious that it was
not. I can't say I understand why this would've worked prior to the
rework that made this surface, though.

> It turns out that for the 'B' link, the PMC driver uses hwirq 0, which
> is SGI0 for the GIC, and used as the rescheduling IPI. Obviously, this
> doesn't go very well, nor very far, as the IPI gets routed to random
> drivers. Also, as the handling flow has been overridden, this
> interrupt never gets deactivated and can't fire anymore. Yes, this is
> bad.
>
> The 'A' link is less problematic, but the hwirq value is still out of
> the irqdomain range, and gets remapped every time a new 'A'-less
> driver comes up.
>
> Instead, let's trim the unused hierarchy levels as needed. This
> requires some checks in the upper levels of the hierarchy as we now
> have optional levels, but this looks a lot saner than what we
> currently have. With this, tegra186 is back booting on -next.
>
> I haven't tested any wake-up stuff, nor any other nvidia system (this
> is the only one I have). If people agree to these changes, I can take
> them via the irqchip tree so that they make it into the next merge
> window.

Yeah, it sounds like this needs to go in ideally before the rework that
caused this to surface in order to preserve bisectability. But if it
goes in afterwards that's probably fine as well.

Let Jon and me do a bit of testing with this to verify that the wake-up
paths are still working.

Thierry
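
For readers following the thread, the difference between the two
approaches discussed above can be sketched with a toy model. This is
not the PMC driver code and does not use the kernel's irqdomain API:
struct toy_domain, toy_map(), alloc_with_placeholder(), alloc_trimmed()
and hwirq0_owner_after() are hypothetical names made up for
illustration, and the "domain" is just an array with one owner string
per hwirq. It only shows why a made-up parent hwirq of 0 clobbers the
SGI0 mapping, and why skipping the missing level leaves it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_HWIRQS 8
#define NO_PARENT     (-1)  /* the device has no 'B' link to the GIC */

/* Toy stand-in for the GIC's irqdomain: one owner string per hwirq. */
struct toy_domain {
	const char *owner[TOY_NR_HWIRQS];
};

/* Install a mapping, silently overwriting any existing one (the hazard). */
void toy_map(struct toy_domain *d, int hwirq, const char *owner)
{
	assert(hwirq >= 0 && hwirq < TOY_NR_HWIRQS);
	d->owner[hwirq] = owner;
}

/*
 * Problematic scheme: a device without a 'B' link still gets a parent
 * mapping, using a placeholder hwirq of 0.
 */
void alloc_with_placeholder(struct toy_domain *gic, const char *dev,
			    int parent_hwirq)
{
	if (parent_hwirq == NO_PARENT)
		parent_hwirq = 0;	/* collides with SGI0 */
	toy_map(gic, parent_hwirq, dev);
}

/* Trimmed scheme: if the level does not exist in HW, do not map it. */
void alloc_trimmed(struct toy_domain *gic, const char *dev,
		   int parent_hwirq)
{
	if (parent_hwirq == NO_PARENT)
		return;			/* hierarchy ends at the PMC */
	toy_map(gic, parent_hwirq, dev);
}

/* Report who owns GIC hwirq 0 after a 'B'-less device is allocated. */
const char *hwirq0_owner_after(bool trimmed)
{
	struct toy_domain gic = { .owner = { [0] = "SGI0 (reschedule IPI)" } };

	if (trimmed)
		alloc_trimmed(&gic, "wake-only device", NO_PARENT);
	else
		alloc_with_placeholder(&gic, "wake-only device", NO_PARENT);

	return gic.owner[0];
}
```

In this model, hwirq0_owner_after(false) returns "wake-only device"
(the IPI mapping has been overwritten), while hwirq0_owner_after(true)
still returns "SGI0 (reschedule IPI)". The real fix additionally needs
the upper levels to cope with an optional parent, which this sketch
does not model.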
