RE: [Linuxarm] Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers

From: Finn Thain
Date: Wed Feb 10 2021 - 20:12:47 EST


On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:

> > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> >
> > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > >
> > > > > > There is no warning from m68k builds. That's because
> > > > > > arch_irqs_disabled() returns true when the IPL is non-zero.
> > > > >
> > > > > So for m68k, the case is arch_irqs_disabled() is true, but
> > > > > interrupts can still come?
> > > > >
> > > > > Then it seems it is very confusing. If prioritized interrupts
> > > > > can still come while arch_irqs_disabled() is true,
> > > >
> > > > Yes, on m68k CPUs, an IRQ having a priority level higher than the
> > > > present priority mask will get serviced.
> > > >
> > > > Non-Maskable Interrupt (NMI) is not subject to this rule and gets
> > > > serviced regardless.
> > > >
> > > > > how could spin_lock_irqsave() block the prioritized interrupts?
> > > >
> > > > It raises the the mask level to 7. Again, please see
> > > > arch/m68k/include/asm/irqflags.h
> > >
> > > Hi Finn,
> > > Thanks for your explanation again.
> > >
> > > TBH, that is why m68k is so confusing. irqs_disabled() on m68k
> > > should just reflect the status of all interrupts have been disabled
> > > except NMI.
> > >
> > > irqs_disabled() should be consistent with the calling of APIs such
> > > as local_irq_disable, local_irq_save, spin_lock_irqsave etc.
> > >
> >
> > When irqs_disabled() returns true, we cannot infer that
> > arch_local_irq_disable() was called. But I have not yet found driver
> > code or core kernel code attempting that inference.
> >
> > > >
> > > > > Isn't arch_irqs_disabled() a status reflection of irq disable
> > > > > API?
> > > > >
> > > >
> > > > Why not?
> > >
> > > If so, arch_irqs_disabled() should mean all interrupts have been
> > > masked except NMI as NMI is unmaskable.
> > >
> >
> > Can you support that claim with a reference to core kernel code or
> > documentation? (If some arch code agrees with you, that's neither here
> > nor there.)
>
> I think those links I share you have supported this. Just you don't
> believe :-)
>

Your links show that the distinction between fast and slow handlers was
removed. Your links don't support your claim that "arch_irqs_disabled()
should mean all interrupts have been masked". Where is the code that makes
that inference? Where is the documentation that supports your claim?

> >
> > > >
> > > > Are all interrupts (including NMI) masked whenever
> > > > arch_irqs_disabled() returns true on your platforms?
> > >
> > > On my platform, once irqs_disabled() is true, all interrupts are
> > > masked except NMI. NMI just ignore spin_lock_irqsave or
> > > local_irq_disable.
> > >
> > > On ARM64, we also have high-priority interrupts, but they are
> > > running as PESUDO_NMI:
> > > https://lwn.net/Articles/755906/
> > >
> >
> > A glance at the ARM GIC specification suggests that your hardware
> > works much like 68000 hardware.
> >
> > When enabled, a CPU interface takes the highest priority pending
> > interrupt for its connected processor and determines whether the
> > interrupt has sufficient priority for it to signal the interrupt
> > request to the processor. [...]
> >
> > When the processor acknowledges the interrupt at the CPU interface,
> > the Distributor changes the status of the interrupt from pending to
> > either active, or active and pending. At this point the CPU
> > interface can signal another interrupt to the processor, to preempt
> > interrupts that are active on the processor. If there is no pending
> > interrupt with sufficient priority for signaling to the processor,
> > the interface deasserts the interrupt request signal to the
> > processor.
> >
> > https://developer.arm.com/documentation/ihi0048/b/
> >
> > Have you considered that Linux/arm might benefit if it could fully
> > exploit hardware features already available, such as the interrupt
> > priority masking feature in the GIC in existing arm systems?
>
> I guess no:-) there are only two levels: IRQ and NMI. Injecting a
> high-prio IRQ level between them makes no sense.
>
> To me, arm64's design is quite clear and has no any confusion.
>

Are you saying that the ARM64 hardware design is confusing because it
implements a priority mask, and that's why you had to simplify it with a
pseudo-nmi scheme in software?

> >
> > > On m68k, it seems you mean:
> > > irq_disabled() is true, but high-priority interrupts can still come;
> > > local_irq_disable() can disable high-priority interrupts, and at that
> > > time, irq_disabled() is also true.
> > >
> > > TBH, this is wrong and confusing on m68k.
> > >
> >
> > Like you, I was surprised when I learned about it. But that doesn't mean
> > it's wrong. The fact that it works should tell you something.
> >
>
> The fact is that m68k lets arch_irq_disabled() return true to pretend
> all IRQs are disabled while high-priority IRQ is still open, thus "pass"
> all sanitizing check in genirq and kernel core.
>

The fact is that m68k has arch_irq_disabled() return false when all IRQs
are enabled. So there is no bug.

> > Things could always be made simpler. But discarding features isn't
> > necessarily an improvement.
>
> This feature could be used by calling local_irq_enable_in_hardirq() in
> those IRQ handlers who hope high-priority interrupts to preempt it for a
> while.
>

So, if one handler is sensitive to interrupt latency, all other handlers
should be modified? I don't think that's workable.

In anycase, what you're describing is a completely different nested
interrupt scheme that would defeat the priority level mechanism that the
hardware provides us with.

> It shouldn't hide somewhere and make confusion.
>

The problem is hiding so well that no-one has found it! I say it doesn't
exist.

> On the other hand, those who care about realtime should use threaded IRQ
> and let IRQ threads preempt each other.
>

Yes. And those threads also have priority levels.

> Thanks
> Barry
>
>