Re: [PATCH] m68k: coldfire: Prevent spurious interrupts when masking IMR

From: Jean-Michel Hautbois
Date: Wed Feb 05 2025 - 06:27:15 EST


Hi Geert,

On 05/02/2025 09:14, Geert Uytterhoeven wrote:
Hi Jean-Michel,

On Wed, 5 Feb 2025 at 08:07, Jean-Michel Hautbois
<jeanmichel.hautbois@xxxxxxxxxx> wrote:
On 04/02/2025 20:27, Geert Uytterhoeven wrote:
On Tue, 4 Feb 2025 at 19:38, Jean-Michel Hautbois
<jeanmichel.hautbois@xxxxxxxxxx> wrote:
The ColdFire interrupt controller can generate spurious interrupts if an
interrupt source is masked in the IMR while the CPU interrupt priority
mask (SR[I]) is set lower than the interrupt level.

The reference manual states:

To avoid this situation for interrupts sources with levels 1-6, first
write a higher level interrupt mask to the status register, before
setting the mask in the IMR or the module’s interrupt mask register.
After the mask is set, return the interrupt mask in the status register
to its previous value.

It can be tested like this:
- Prepare a iperf3 server on the coldfire target (iperf3 -s -D)
- Start a high priority cyclictest:
cyclictest --secaligned -m -p 99 -i 2500 -q
- Start iperf3 -c $COLDFIRE_IP -t 0

After a few seconds the dmesg may display:
[ 84.784301] irq 24, desc: dbc502da, depth: 1, count: 0, unhandled: 0
[ 84.784455] ->handle_irq(): 0ba0aca3, handle_bad_irq+0x0/0x1e0
[ 84.784610] ->irq_data.chip(): c6779d4f, 0x41652544
[ 84.784719] ->action(): 00000000
[ 84.784770] unexpected IRQ trap at vector 18

With this patch, I never saw it in a few hours testing.

Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@xxxxxxxxxx>

Thanks for your patch!

--- a/arch/m68k/coldfire/intc-simr.c
+++ b/arch/m68k/coldfire/intc-simr.c
@@ -58,6 +58,14 @@ static inline unsigned int irq2ebit(unsigned int irq)

#endif

+static inline void intc_irq_setlevel(unsigned long level)
+{
+ asm volatile ("move.w %0,%%sr"
+ : /* no outputs */
+ : "d" (0x2000 | ((level) << 8))
+ : "memory");
+}
+
/*
* There maybe one, two or three interrupt control units, each has 64
* interrupts. If there is no second or third unit then MCFINTC1_* or
@@ -67,13 +75,17 @@ static inline unsigned int irq2ebit(unsigned int irq)
static void intc_irq_mask(struct irq_data *d)
{
unsigned int irq = d->irq - MCFINT_VECBASE;
+ unsigned long flags = arch_local_save_flags();

+ intc_irq_setlevel(7);

Can't all of the above just be replaced by

unsigned long flags = arch_local_irq_save();

The only change is the Supervisor bit in SR which is not changed in
arch_local_irq_disable() while it is forced to 1 in my function (setting
it to 0x2700 AFAICT).

But I can confirm I couldn't see the issue with this code, while using
the existing arch_local_irq_save() it still appears (less frequently
than without it at all, but still).

Any suggestion :-) ?

There are other differences: your version clears all other bits, incl.
condition codes and master/interrupt state.

Can you save the flags above in a global, and print it in the
unexpected IRQ handler, to see which other bits are set when
it happens?

An interesting side effect is... that only saving the flags makes it *a lot* harder to reproduce -_-.
Which is coherent with a race condition though :p.

Each time I got the message, the flags saved where 0x2711.

JM

Gr{oetje,eeting}s,

Geert