Re: [PATCH v8 12/26] arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking

From: Dave Martin
Date: Tue Jan 08 2019 - 13:01:57 EST


On Tue, Jan 08, 2019 at 05:16:43PM +0000, Marc Zyngier wrote:
> On 08/01/2019 16:45, Dave Martin wrote:
> > On Tue, Jan 08, 2019 at 03:51:18PM +0000, Marc Zyngier wrote:
> >> On 08/01/2019 15:40, Dave Martin wrote:
> >>> On Tue, Jan 08, 2019 at 02:07:30PM +0000, Julien Thierry wrote:
> >>>> Instead disabling interrupts by setting the PSR.I bit, use a priority
> >>>> higher than the one used for interrupts to mask them via PMR.
> >>>>
> >>>> When using PMR to disable interrupts, the value of PMR will be used
> >>>> instead of PSR.[DAIF] for the irqflags.
> >>>>
> >>>> Signed-off-by: Julien Thierry <julien.thierry@xxxxxxx>
> >>>> Suggested-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>
> >>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> >>>> Cc: Will Deacon <will.deacon@xxxxxxx>
> >>>> Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> >>>> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> >>>> ---
> >>>> arch/arm64/include/asm/efi.h | 11 ++++
> >>>> arch/arm64/include/asm/irqflags.h | 123 +++++++++++++++++++++++++++++---------
> >>>> 2 files changed, 106 insertions(+), 28 deletions(-)
> >>>
> >>> [...]
> >>>
> >>>> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
> >>>> index 24692ed..fa3b06f 100644
> >>>> --- a/arch/arm64/include/asm/irqflags.h
> >>>> +++ b/arch/arm64/include/asm/irqflags.h
> >>>> @@ -18,7 +18,9 @@
> >>>
> >>> [...]
> >>>
> >>>> static inline void arch_local_irq_enable(void)
> >>>> {
> >>>> - asm volatile(
> >>>> - "msr daifclr, #2 // arch_local_irq_enable"
> >>>> - :
> >>>> + unsigned long unmasked = GIC_PRIO_IRQON;
> >>>> +
> >>>> + asm volatile(ALTERNATIVE(
> >>>> + "msr daifclr, #2 // arch_local_irq_enable\n"
> >>>> + "nop",
> >>>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> >>>> + "dsb sy",
> >>>
> >>> I'm still not convinced these dsbs are needed.
> >>>
> >>> Without the dsb, we are probably not guaranteed to take a pending
> >>> interrupt _immediately_ on unmasking, but I'm not sure that's a
> >>> problem.
> >>>
> >>> What goes wrong if we omit them?
> >>
> >> Then the GIC doesn't know it can now deliver interrupts of a lower
> >> priority. Only a dsb can guarantee that the GIC's view of PMR will get
> >> updated.
> >>
> >> See 9.1.6 (Observability of the effects of accesses to the GIC
> >> registers), which states:
> >>
> >> <quote>
> >> Architectural execution of a DSB instruction guarantees that
> >> â The last value written to ICC_PMR_EL1 or GICC_PMR is observed by the
> >> associated Redistributor.
> >> </quote>
> >>
> >> So yes, DSB is required.
> >
> > But it says neither what is means for the PMR write to be "observed by
> > the redistributor", nor whether the DSB is required for the
> > redistributor to observe the write at all.
>
> Well, it seems pretty clear to me that if the redistributor doesn't
> observe the PMR value, it is unlikely to change its interpretation of
> it And conversely, the redistributor is allowed to sit pretty and not
> give you any interrupt until you are actually telling it that something
> has changed.
>
> I really think that for once, the spec is pretty unambiguous about what
> is required.

I think that there is some scope for clarification, but it sounds like
that doesn't impact this series.

> > (So, is an implementation
> > allowed to cached in the CPU interface indefinitely until forcibly
> > flushed to the redistributor by a DSB, and in any case can the write's
> > reaching the distributor in finite time or not have any effect that we
> > care about in this case?).
>
> Nothing in the spec says that the system register write will magically
> trickle down to the redistributor in the absence of a DSB.
>
> > My reason for querying this is that temporary local masking of classes
> > of interrupts seems an obvious use case for the PMR, and the DSB
> > requirement flies rather in the face of this.
>
> Are you implying that the GIC architecture should have any form of
> sanity and be useful for general purpose software? Think again! ;-)
>
> The PMR behavior you are describing only works in a single direction
> (from low to high priority), because the CPU interface has to perform
> some filtering. In the opposite direction, you need the big hammer.
>
> > Have we seen hardware where interrupts may stall forever upstream of the
> > CPU interface after a PMR write, until a dsb is executed by the CPU?
>
> Yes. You even have to have a DSB right after a read of IAR to avoid
> loosing interrupts. The short story is that there is hardly any
> synchronization between redistributor and CPU interface. Implementations
> are allowed a more closely coupled design, but that's not what the
> architecture mandates.

Well I guess that pretty much wraps it up!

If implementations require it, then we obviously need to have it.

> > Also, is it ever important in Linux that a pending interrupt be taken
> > immediately upon unmasking (and how do we know that said interrupt is
> > pending)? If not, we don't care precisely when such interrupts are
> > pended to the PE, just that such an interrupt cannot be taken before
> > the PMR write that unmasks it. It would be insane for the self-
> > synchronization of PMR writes to lack this guarantee (and a DSB after
> > the PMR write would do no good anyway in that case).
>
> RT folks are usually quite picky on when they see their interrupts
> firing. I can also imagine the following scenario:
>
> set_pmr(allow all interrupts)
> WFI
>
> where things stop rather abruptly if this is the only CPU in the system.

This is a rather special case (and this case is indeed handled specially
by this series). Here we don't need to expedite interrupt delivery,
but we want to make sure that the logic that will wake us back up knows
what it's supposed to be doing before we start powering things off.

However, I can see that for RT purposes explicit synchronisation on
interrupt unmasking may provided a more bounded interrupt blackout that
simply waiting for synchronisation to happen in the background (though
that might have better throughput). Anyway, this is academic since
we have to have the synchyronisation anyway.

I will consider myself corrected and stop trolling...

Cheers
---Dave