RE: Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers

From: Song Bao Hua (Barry Song)
Date: Thu Feb 11 2021 - 06:34:00 EST


> >
> > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> >
> > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > >
> > > > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > > > >
> > > > > > > > There is no warning from m68k builds. That's because
> > > > > > > > arch_irqs_disabled() returns true when the IPL is non-zero.
> > > > > > >
> > > > > > > So for m68k, the case is arch_irqs_disabled() is true, but
> > > > > > > interrupts can still come?
> > > > > > >
> > > > > > > Then it seems it is very confusing. If prioritized interrupts
> > > > > > > can still come while arch_irqs_disabled() is true,
> > > > > >
> > > > > > Yes, on m68k CPUs, an IRQ having a priority level higher than the
> > > > > > present priority mask will get serviced.
> > > > > >
> > > > > > Non-Maskable Interrupt (NMI) is not subject to this rule and gets
> > > > > > serviced regardless.
> > > > > >
> > > > > > > how could spin_lock_irqsave() block the prioritized interrupts?
> > > > > >
> > > > > > It raises the the mask level to 7. Again, please see
> > > > > > arch/m68k/include/asm/irqflags.h
> > > > >
> > > > > Hi Finn,
> > > > > Thanks for your explanation again.
> > > > >
> > > > > TBH, that is why m68k is so confusing. irqs_disabled() on m68k
> > > > > should just reflect the status of all interrupts have been disabled
> > > > > except NMI.
> > > > >
> > > > > irqs_disabled() should be consistent with the calling of APIs such
> > > > > as local_irq_disable, local_irq_save, spin_lock_irqsave etc.
> > > > >
> > > >
> > > > When irqs_disabled() returns true, we cannot infer that
> > > > arch_local_irq_disable() was called. But I have not yet found driver
> > > > code or core kernel code attempting that inference.
> > > >
> > > > > >
> > > > > > > Isn't arch_irqs_disabled() a status reflection of irq disable
> > > > > > > API?
> > > > > > >
> > > > > >
> > > > > > Why not?
> > > > >
> > > > > If so, arch_irqs_disabled() should mean all interrupts have been
> > > > > masked except NMI as NMI is unmaskable.
> > > > >
> > > >
> > > > Can you support that claim with a reference to core kernel code or
> > > > documentation? (If some arch code agrees with you, that's neither here
> > > > nor there.)
> > >
> > > I think those links I share you have supported this. Just you don't
> > > believe :-)
> > >
> >
> > Your links show that the distinction between fast and slow handlers was
> > removed. Your links don't support your claim that "arch_irqs_disabled()
> > should mean all interrupts have been masked". Where is the code that makes
> > that inference? Where is the documentation that supports your claim?
>
> (1)
> https://lwn.net/Articles/380931/
> Looking at all these worries, one might well wonder if a system which *disabled
> interrupts for all handlers* would function well at all. So it is interesting
> to note one thing: any system which has the lockdep locking checker enabled
> has been running all handlers that way for some years now. Many developers
> and testers run lockdep-enabled kernels, and they are available for some of
> the more adventurous distributions (Rawhide, for example) as well. So we
> have quite a bit of test coverage for this mode of operation already.
>
> (2)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=b738a50a
>
> "We run all handlers *with interrupts disabled* and expect them not to
> enable them. Warn when we catch one who does."
>
> (3)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=e58aa3d2d0cc
> genirq: Run irq handlers *with interrupts disabled*
>
> Running interrupt handlers with interrupts enabled can cause stack
> overflows. That has been observed with multiqueue NICs delivering all
> their interrupts to a single core. We might band aid that somehow by
> checking the interrupt stacks, but the real safe fix is to *run the irq
> handlers with interrupts disabled*.
>
>
> All these documents say we are running irq handler with interrupts
> disabled. but it seems you think high-prio interrupts don't belong
> to "interrupts" in those documents :-)
>
> that is why we can't get agreement. I think "interrupts" mean
> all except NMI in these documents, but you insist high-prio IRQ
> is an exception.
>
> >
> > > >
> > > > > >
> > > > > > Are all interrupts (including NMI) masked whenever
> > > > > > arch_irqs_disabled() returns true on your platforms?
> > > > >
> > > > > On my platform, once irqs_disabled() is true, all interrupts are
> > > > > masked except NMI. NMI just ignore spin_lock_irqsave or
> > > > > local_irq_disable.
> > > > >
> > > > > On ARM64, we also have high-priority interrupts, but they are
> > > > > running as PESUDO_NMI:
> > > > > https://lwn.net/Articles/755906/
> > > > >
> > > >
> > > > A glance at the ARM GIC specification suggests that your hardware
> > > > works much like 68000 hardware.
> > > >
> > > > When enabled, a CPU interface takes the highest priority pending
> > > > interrupt for its connected processor and determines whether the
> > > > interrupt has sufficient priority for it to signal the interrupt
> > > > request to the processor. [...]
> > > >
> > > > When the processor acknowledges the interrupt at the CPU interface,
> > > > the Distributor changes the status of the interrupt from pending to
> > > > either active, or active and pending. At this point the CPU
> > > > interface can signal another interrupt to the processor, to preempt
> > > > interrupts that are active on the processor. If there is no pending
> > > > interrupt with sufficient priority for signaling to the processor,
> > > > the interface deasserts the interrupt request signal to the
> > > > processor.
> > > >
> > > > https://developer.arm.com/documentation/ihi0048/b/
> > > >
> > > > Have you considered that Linux/arm might benefit if it could fully
> > > > exploit hardware features already available, such as the interrupt
> > > > priority masking feature in the GIC in existing arm systems?
> > >
> > > I guess no:-) there are only two levels: IRQ and NMI. Injecting a
> > > high-prio IRQ level between them makes no sense.
> > >
> > > To me, arm64's design is quite clear and has no any confusion.
> > >
> >
> > Are you saying that the ARM64 hardware design is confusing because it
> > implements a priority mask, and that's why you had to simplify it with a
> > pseudo-nmi scheme in software?
>
> No, I was not saying this. I think both m68k and arm64 have good hardware
> design. Just Linux's implementation is running irq-handlers with interrupts
> disabled. So ARM64's pseudo-nmi is adapted to Linux better.
>
> >
> > > >
> > > > > On m68k, it seems you mean:
> > > > > irq_disabled() is true, but high-priority interrupts can still come;
> > > > > local_irq_disable() can disable high-priority interrupts, and at that
> > > > > time, irq_disabled() is also true.
> > > > >
> > > > > TBH, this is wrong and confusing on m68k.
> > > > >
> > > >
> > > > Like you, I was surprised when I learned about it. But that doesn't mean
> > > > it's wrong. The fact that it works should tell you something.
> > > >
> > >
> > > The fact is that m68k lets arch_irq_disabled() return true to pretend
> > > all IRQs are disabled while high-priority IRQ is still open, thus "pass"
> > > all sanitizing check in genirq and kernel core.
> > >
> >
> > The fact is that m68k has arch_irq_disabled() return false when all IRQs
> > are enabled. So there is no bug.
>
> But it has arch_irq_disabled() return true while some interrupts(not NMI)
> are still open.
>
> >
> > > > Things could always be made simpler. But discarding features isn't
> > > > necessarily an improvement.
> > >
> > > This feature could be used by calling local_irq_enable_in_hardirq() in
> > > those IRQ handlers who hope high-priority interrupts to preempt it for a
> > > while.
> > >
> >
> > So, if one handler is sensitive to interrupt latency, all other handlers
> > should be modified? I don't think that's workable.
>
> I think we just enable preempt_rt or force threaded_irq, and then improve
> the priority of the irq thread who is sensitive to latency. No need to
> touch all threads.
>
> I also understand your point, we let one high-prio interrupt preempt
> low priority interrupt, then we don't need to change the whole system.
> But I think Linux prefers the method of threaded_irq or preempt_rt
> for this kind of problems.
>
> >
> > In anycase, what you're describing is a completely different nested
> > interrupt scheme that would defeat the priority level mechanism that the
> > hardware provides us with.
>
> Yes. Indeed.
>
> >
> > > It shouldn't hide somewhere and make confusion.
> > >
> >
> > The problem is hiding so well that no-one has found it! I say it doesn't
> > exist.
>
> Long long ago(before 2.6.38), we had a kernel supporting IRQF_DISABLED and
> nested interrupts were widely supported, but system also ran well in most
> cases. That means nested interrupts don't really matter in most cases.
> That is why m68k is also running well even though it is still nesting.
>
> >
> > > On the other hand, those who care about realtime should use threaded IRQ
> > > and let IRQ threads preempt each other.
> > >
> >
> > Yes. And those threads also have priority levels.
>
> Finn, I am not a m68k guy, would you help check if this could activate a
> warning on m68k. maybe we can discuss this question in genirq maillist from
> this warning if you are happy. Thanks very much.
>
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index 7c9d6a2d7e90..b8ca27555c76 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
> */
> #define __irq_enter() \
> do { \
> + WARN_ONCE(in_hardirq() && irqs_disabled(), "nested
> interrupts\n"); \
> preempt_count_add(HARDIRQ_OFFSET); \
> lockdep_hardirq_enter(); \
> account_hardirq_enter(current); \
> @@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
> */
> #define __irq_enter_raw() \
> do { \
> + WARN_ONCE(in_hardirq() && irqs_disabled(), "nested
> interrupts\n"); \
> preempt_count_add(HARDIRQ_OFFSET); \
> lockdep_hardirq_enter(); \
> } while (0)
>

I am not a m68k guy, here I can show you some code in cortex-m,
which supports full interrupt priority preemption in hardware,
but its arch code just disables it to satisfy Linux:

arch/arm/kernel/entry-header.S

.macro v7m_exception_entry
...

@ Linux expects to have irqs off. Do it here before taking stack space
cpsid i

arch/arm/kernel/entry-v7m.S

__irq_entry:
v7m_exception_entry

@
@ Invoke the IRQ handler
@
...
@ routine called with r0 = irq number, r1 = struct pt_regs *
bl nvic_handle_irq

pop {lr}


Another example in arch/arc/kernel/entry-arcv2.S:

ENTRY(handle_interrupt)

INTERRUPT_PROLOGUE

# irq control APIs local_irq_save/restore/disable/enable fiddle with
# global interrupt enable bits in STATUS32 (.IE for 1 prio, .E[] for 2 prio)
# However a taken interrupt doesn't clear these bits. Thus irqs_disabled()
# query in hard ISR path would return false (since .IE is set) which would
# trips genirq interrupt handling asserts.
#
# So do a "soft" disable of interrutps here.
#
# Note this disable is only for consistent book-keeping as further interrupts
# will be disabled anyways even w/o this. Hardware tracks active interrupts
# seperately in AUX_IRQ_ACT.active and will not take new interrupts
# unless this one returns (or higher prio becomes pending in 2-prio scheme)

IRQ_DISABLE

; icause is banked: one per priority level
; so a higher prio interrupt taken here won't clobber prev prio icause
lr r0, [ICAUSE]
mov blink, ret_from_exception

b.d arch_do_IRQ
mov r1, sp


Actually in m68k, I also saw its IRQ entry disabled interrupts by
' move #0x2700,%sr /* disable intrs */'

arch/m68k/include/asm/entry.h:

.macro SAVE_ALL_SYS
move #0x2700,%sr /* disable intrs */
btst #5,%sp@(2) /* from user? */
bnes 6f /* no, skip */
movel %sp,sw_usp /* save user sp */
...

.macro SAVE_ALL_INT
SAVE_ALL_SYS
moveq #-1,%d0 /* not system call entry */
movel %d0,%sp@(PT_OFF_ORIG_D0)
.endm

arch/m68k/kernel/entry.S:

/* This is the main interrupt handler for autovector interrupts */

ENTRY(auto_inthandler)
SAVE_ALL_INT
GET_CURRENT(%d0)
| put exception # in d0
bfextu %sp@(PT_OFF_FORMATVEC){#4,#10},%d0
subw #VEC_SPUR,%d0

movel %sp,%sp@-
movel %d0,%sp@- | put vector # on stack
auto_irqhandler_fixup = . + 2
jsr do_IRQ | process the IRQ
addql #8,%sp | pop parameters off stack
jra ret_from_exception

So my question is that " move #0x2700,%sr" is actually disabling
all interrupts? And is m68k actually running irq handlers
with interrupts disabled?

Best Regards
Barry