[PATCH] IRQ handling race and spurious IIR read in serial/8250.c

From: Ian Jackson
Date: Wed Feb 11 2009 - 11:08:42 EST


Anders Kaseorg writes ("Re: Serial console hangs with Linux 2.6.20 HVM guest"):
> Yes, I took Linux v2.6.20 on amd64, ran `make defconfig`, then ran `make
> menuconfig` and turned off CONFIG_HOTPLUG_CPU (Processor type and features
> ò Support for hot-pluggable CPUs).

Thanks. I think I have tracked down the bug. In
drivers/serial/8250.c in Linux there are two bugs:
1. UART_BUG_TXEN can be spuriously set, due to an IRQ race
2. The workaround then applied by the kernel is itself buggy

Anders: can you try two tests for me ? Firstly, in
serial8250_startup, delete the section which sets UART_BUG_TXEN (see
2nd patch below); I think this will fix the symptoms for you.
Secondly, in serial8250_start_tx delete the read from the IIR and the
relevant branch of the text (see 3rd patch below); I think this will
also in itself fix your symptoms. I haven't compiled either patch (so
you may find that eg I missed deleting some variables).


The bugs in detail (this discussion applies to 2.6.20 and also to
2.6.28.4):

1. The hunk of serial8250_startup I quote below attempts to discover
whether writing the IER re-asserts the THRI (transmit ready)
interrupt. However the spinlock that it has taken out,
port->lock, is not the one that the IRQ service routine takes
before reading the IIR (i->lock). As a result, on an SMP system
the generated interrupt races with the straight-line code in
serial8250_startup.

If serial8250_startup loses the race (perhaps because the system
is a VM and its VCPU got preempted), UART_BUG_TXEN is spuriously
added to bugs. This is quite unlikely in a normal system but in
certain Xen configurations, particularly ones where there is CPU
pressure, we may lose the race every time.

It is not exactly clear to me how this ought to be resolved. One
possibility is that the UART_BUG_TXEN problem might be worked
around perfectly well by the new and very similar workaround
UART_BUG_THRE[1] in 2.6.21ish in which case it could just be
removed.

2. UART_BUG_TXEN's workaround appears to be intended to be harmless.
However what it actually does is to read the IIR, thus clearing
any actual interrupt (including incidentally non-THRI), and then
only perform the intended servicing if the interrupt was _not_
asserted. That is, it breaks on any serial port with the bug.

As far as I can see there is not much use in UART_BUG_TXEN reading
IIR at all, so a suitable change if we want to keep UART_BUG_TXEN
might be the first patch I enclose below (again, not compiled
or tested).

If UART_BUG_TXEN is retained something along these lines should be
done at the very least.

Ian.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
in which case UART


Proposed initial band-aid fix (against 2.6.28.4):


Do not read IIR in serial8250_start_tx when UART_BUG_TXEN

Reading the IIR clears some oustanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

--- ../linux-2.6.28.4/drivers/serial/8250.c~ 2009-02-06 21:47:45.000000000 +0000
+++ ../linux-2.6.28.4/drivers/serial/8250.c 2009-02-11 15:55:24.000000000 +0000
@@ -1257,14 +1257,12 @@
serial_out(up, UART_IER, up->ier);

if (up->bugs & UART_BUG_TXEN) {
- unsigned char lsr, iir;
+ unsigned char lsr;
lsr = serial_in(up, UART_LSR);
up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
- iir = serial_in(up, UART_IIR) & 0x0f;
if ((up->port.type == PORT_RM9000) ?
- (lsr & UART_LSR_THRE &&
- (iir == UART_IIR_NO_INT || iir == UART_IIR_THRI)) :
- (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT))
+ (lsr & UART_LSR_THRE) :
+ (lsr & UART_LSR_TEMT))
transmit_chars(up);
}
}


Anders - first patch to try (against 2.6.20):

--- drivers/serial/8250.c~ 2007-02-04 18:44:54.000000000 +0000
+++ drivers/serial/8250.c 2009-02-11 15:39:43.000000000 +0000
@@ -1645,25 +1645,6 @@

serial8250_set_mctrl(&up->port, up->port.mctrl);

- /*
- * Do a quick test to see if we receive an
- * interrupt when we enable the TX irq.
- */
- serial_outp(up, UART_IER, UART_IER_THRI);
- lsr = serial_in(up, UART_LSR);
- iir = serial_in(up, UART_IIR);
- serial_outp(up, UART_IER, 0);
-
- if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
- if (!(up->bugs & UART_BUG_TXEN)) {
- up->bugs |= UART_BUG_TXEN;
- pr_debug("ttyS%d - enabling bad tx status workarounds\n",
- port->line);
- }
- } else {
- up->bugs &= ~UART_BUG_TXEN;
- }
-
spin_unlock_irqrestore(&up->port.lock, flags);

/*


Anders - second patch to try (against 2.6.20):
Fix should be suitable for distribution IMO.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

--- drivers/serial/8250.c~ 2007-02-04 18:44:54.000000000 +0000
+++ drivers/serial/8250.c 2009-02-11 15:41:51.000000000 +0000
@@ -1136,10 +1136,9 @@
serial_out(up, UART_IER, up->ier);

if (up->bugs & UART_BUG_TXEN) {
- unsigned char lsr, iir;
+ unsigned char lsr;
lsr = serial_in(up, UART_LSR);
- iir = serial_in(up, UART_IIR);
- if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT)
+ if (lsr & UART_LSR_TEMT)
transmit_chars(up);
}
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/