Re: Serial: bug in 8250.c when handling PCI or other level triggers

From: Russell King
Date: Wed Dec 21 2005 - 17:14:24 EST


On Wed, Dec 21, 2005 at 10:33:37PM +0200, Meelis Roos wrote:
> >Hmm, possibly, but could you apply this patch and provide the resulting
> >messages please? It'll probably cause some character loss when it
> >decides to dump some debugging.
>
> Here is the full dmesg with it, from ICH2. First messages are from
> serial port initialisation and the last ones are from running minicom at
> 9600 as thje console for a cisco.

Urm, this is silly. Lets look at the first instance.

> lp0: ECP mode
> serial8250: too much work for irq4
> serial8250: port c0452c80(0)
> 0: jif=00000000 type=00 num=00 iir=00 lsr=00 => iir=00 lsr=00
> ...
> 29: jif=00000000 type=00 num=00 iir=00 lsr=00 => iir=00 lsr=00
> 30: jif=fffef262 type=00 num=00 iir=01 lsr=00 => iir=01 lsr=00

This is the first interrupt we apparantly recieved. We start with
end = NULL, l = i->head, and pass_counter starts off at zero (as
can be seen in the num line above.)

up = list_entry(l, struct uart_8250_port, list);

iir = serial_in(up, UART_IIR);

The IIR for the device reports 0x01 (IIR_NO_INT set) so no interrupt
was pending, so:

if (!(iir & UART_IIR_NO_INT)) {
} else if (end == NULL)
end = l;

we set our end marker to this port and move on to the next port.

l = l->next;

Since we only have one port open on this IRQ, l is equal to l->next
so this has no effect.

if (l == i->head && pass_counter++ > PASS_LIMIT) {

since l is still equal to i->head, we increment the pass counter.

} while (l != end);

We set end = l above. Given the comments above, that means that l
_is_ equal to end, so we should exit this loop.

> 31: jif=fffef262 type=00 num=01 iir=01 lsr=60 => iir=01 lsr=60

However, strangely we don't exit the loop, but we go around for
another go - since we can see num=1 here, which means pass_counter
is now '1'.

In addition, what's weirder is that iir is still saying NO_INT,
yet we seem to have taken a completely different path through
the code since the LSR value is now set. This is only set in the
other branch of the if statement, inside serial8250_handle_port().

> 32: jif=fffef262 type=00 num=02 iir=01 lsr=60 => iir=01 lsr=60

And we go again... and again...

> 62: jif=fffef262 type=00 num=20 iir=01 lsr=60 => iir=01 lsr=60
> 63: jif=fffef262 type=00 num=21 iir=01 lsr=60 => iir=01 lsr=60

until we hit the PASS_LIMIT and produce this dump.

The rest of the dumps show a very similar situation. To me, it
looks like your machine is _not_ doing what the C code is telling
it to do. Compiler bug maybe?

> serial8250: too much work for irq4
> serial8250: port c0452c80(0)
> 0: jif=fffef262 type=00 num=06 iir=01 lsr=60 => iir=01 lsr=60
> 1: jif=fffef262 type=00 num=07 iir=01 lsr=60 => iir=01 lsr=60
> 2: jif=fffef262 type=00 num=08 iir=01 lsr=60 => iir=01 lsr=60
> 3: jif=fffef262 type=00 num=09 iir=01 lsr=60 => iir=01 lsr=60
> 4: jif=fffef262 type=00 num=0a iir=01 lsr=60 => iir=01 lsr=60
> 5: jif=fffef262 type=00 num=0b iir=01 lsr=60 => iir=01 lsr=60
> 6: jif=fffef262 type=00 num=0c iir=01 lsr=60 => iir=01 lsr=60
> 7: jif=fffef262 type=00 num=0d iir=01 lsr=60 => iir=01 lsr=60
> 8: jif=fffef262 type=00 num=0e iir=01 lsr=60 => iir=01 lsr=60
> 9: jif=fffef262 type=00 num=0f iir=01 lsr=60 => iir=01 lsr=60
> 10: jif=fffef262 type=00 num=10 iir=01 lsr=60 => iir=01 lsr=60
> 11: jif=fffef262 type=00 num=11 iir=01 lsr=60 => iir=01 lsr=60
> 12: jif=fffef262 type=00 num=12 iir=01 lsr=60 => iir=01 lsr=60
> 13: jif=fffef262 type=00 num=13 iir=01 lsr=60 => iir=01 lsr=60
> 14: jif=fffef262 type=00 num=14 iir=01 lsr=60 => iir=01 lsr=60
> 15: jif=fffef262 type=00 num=15 iir=01 lsr=60 => iir=01 lsr=60
> 16: jif=fffef262 type=00 num=16 iir=01 lsr=60 => iir=01 lsr=60
> 17: jif=fffef262 type=00 num=17 iir=01 lsr=60 => iir=01 lsr=60
> 18: jif=fffef262 type=00 num=18 iir=01 lsr=60 => iir=01 lsr=60
> 19: jif=fffef262 type=00 num=19 iir=01 lsr=60 => iir=01 lsr=60
> 20: jif=fffef262 type=00 num=1a iir=01 lsr=60 => iir=01 lsr=60
> 21: jif=fffef262 type=00 num=1b iir=01 lsr=60 => iir=01 lsr=60
> 22: jif=fffef262 type=00 num=1c iir=01 lsr=60 => iir=01 lsr=60
> 23: jif=fffef262 type=00 num=1d iir=01 lsr=60 => iir=01 lsr=60
> 24: jif=fffef262 type=00 num=1e iir=01 lsr=60 => iir=01 lsr=60
> 25: jif=fffef262 type=00 num=1f iir=01 lsr=60 => iir=01 lsr=60
> 26: jif=fffef262 type=00 num=20 iir=01 lsr=60 => iir=01 lsr=60
> 27: jif=fffef262 type=00 num=21 iir=01 lsr=60 => iir=01 lsr=60
> 28: jif=fffef2c8 type=00 num=00 iir=c2 lsr=60 => iir=c1 lsr=60

Ah, a real interrupt for once - a transmit interrupt which we
service, and then...

> 29: jif=fffef2c8 type=00 num=01 iir=c1 lsr=00 => iir=c1 lsr=00

recheck the interrupt status and then exit as we should always
do. Why this is any different from the previous one I've no
idea... so maybe it can't be a compiler bug...

Unless the compiler is messing up the initialiser for:

struct list_head *l, *end = NULL;

Could you try placing a:

BUG_ON(end != NULL);

between:

spin_lock(&i->lock);

and:

l = i->head;

in serial8250_interrupt please?

If that doesn't trigger, then I'm all out of ideas.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/