Re: Updated Linux 2.4 Status/TODO List (from the ALS show)

From: David Woodhouse (dwmw2@infradead.org)
Date: Thu Oct 12 2000 - 17:15:10 EST


> * USB: system hang with USB audio driver {CRITICAL} (David
> Woodhouse, Randy Dunlap, Narayan Desai)

This is necessary but not sufficient:

Index: drivers/usb/audio.c
===================================================================
RCS file: /net/passion/inst/cvs/linux/drivers/usb/audio.c,v
retrieving revision 1.1.2.31
diff -u -r1.1.2.31 audio.c
--- drivers/usb/audio.c 2000/09/07 08:26:12 1.1.2.31
+++ drivers/usb/audio.c 2000/10/12 21:33:22
@@ -1007,8 +1007,10 @@
                 }
                 spin_lock_irqsave(&as->lock, flags);
         }
-        if (u->dma.count >= u->dma.dmasize && !u->dma.mapped)
+        if (u->dma.count >= u->dma.dmasize && !u->dma.mapped) {
+                spin_unlock_irqrestore(&as->lock, flags);
                 return 0;
+        }
         u->flags |= FLG_RUNNING;
         if (!(u->flags & FLG_URB0RUNNING)) {
                 urb = &u->durb[0].urb;
@@ -1368,8 +1370,10 @@
                 }
                 spin_lock_irqsave(&as->lock, flags);
         }
-        if (u->dma.count <= 0 && !u->dma.mapped)
+        if (u->dma.count <= 0 && !u->dma.mapped) {
+                spin_unlock_irqrestore(&as->lock, flags);
                 return 0;
+        }
         u->flags |= FLG_RUNNING;
         if (!(u->flags & FLG_URB0RUNNING)) {
                 urb = &u->durb[0].urb;

That fixes failure mode #1, in which the NMI watchdog gets triggered and
all subsequent attempts to open /dev/audio just block.
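
For anyone reading along, here is a rough userspace sketch of what the patch above changes. It is not the driver code: toy_lock, toy_stream, start_buggy() and start_fixed() are names made up for illustration, and the toy lock just records whether it is held instead of spinning. The point is only that the early "return 0" used to leave the lock held, so the next taker never gets it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock: only records whether it is held. */
struct toy_lock {
        bool held;
};

/* Returns false where a real spinlock would spin forever. */
static bool toy_trylock(struct toy_lock *l)
{
        if (l->held)
                return false;
        l->held = true;
        return true;
}

static void toy_unlock(struct toy_lock *l)
{
        l->held = false;
}

/* Hypothetical cut-down stream state, loosely modelled on the fields in the diff. */
struct toy_stream {
        struct toy_lock lock;
        int count;
        int dmasize;
        bool mapped;
        bool running;
};

/* Pre-patch shape: the early return skips the unlock. */
static int start_buggy(struct toy_stream *s)
{
        if (!toy_trylock(&s->lock))
                return -1;      /* a real caller would hang here */
        if (s->count >= s->dmasize && !s->mapped)
                return 0;       /* BUG: lock is still held */
        s->running = true;
        toy_unlock(&s->lock);
        return 0;
}

/* Post-patch shape: every return path drops the lock first. */
static int start_fixed(struct toy_stream *s)
{
        if (!toy_trylock(&s->lock))
                return -1;
        if (s->count >= s->dmasize && !s->mapped) {
                toy_unlock(&s->lock);
                return 0;
        }
        s->running = true;
        toy_unlock(&s->lock);
        return 0;
}
```

With count == dmasize and mapped == false, one call to start_buggy() leaves lock.held set and a second call returns -1, which is the toy equivalent of the blocked opens described above. start_fixed() can be called repeatedly under the same conditions.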

Unfortunately, it doesn't affect failure mode #2, in which the machine
just dies completely.

as->lock isn't locked when this happens - the last time
it was altered was at the end of usbout_start (line 1435).

wait_on_irq, CPU 0:
irq: 1 [ 0 1 ]
bh: 0 [ 1 0 ]
       850f00e4 000000a0 0fe0458b b70f00b7 0cec83c0 5c4ee850 c483fffd 85896610
       ffffe7a2 a2858b66 0fffffe7 6850c0b7 080e83c2 006a016a 00d373e8 10c48300
Call Trace: [<ec83e045>] [<e0458b0c>] [<e850c0b7>] [<fffef850>] [<e8080e45>] [<eb10c483>] [<f6892aeb>]
       [<e7cc8589>] [<ffe7cc85>] [<e8006a02>] [<ffe79c85>] [<fd5b8fe8>] [<e79a858b>] [<e798858b>] [<fb808ae4>]

                < ... LOTS of similarly unbelievable addresses ... >

       [<e0858910>] [<fffbe0bd>] [<ff0cec83>] [<e9000000>] [<ff016a08>] [<fbe8858d>] [<e850ffff>] [<fffbe485>]
       c62e5000 00000001 c62e5764 c016a6a2 00000000 00000000 00000000 c03087a0
       c62e5364 c0170b72 c62e5000 c029f00c c011d597 00000000 00000001 c02f559
Call Trace: [<c0247f53>] [<c010be0d>] [<c0247f68>] [<c016a6a2>] [<c0170b72>] [<c011d597>] [<c011d4ba>]
       [<c010c1ea>] [<c010a8b4>] [<c0241150>] [<c013614d>] [<c0108d53>]
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; ec83e045 <END_OF_CODE+21f5c2da/???
Trace; e0458b0c <END_OF_CODE+15b76da1/???
Trace; e850c0b7 <END_OF_CODE+1dc2a34c/???

   < ... blah blah blah ... >

Trace; e9000000 <END_OF_CODE+1e71e295/???
Trace; ff016a08 <END_OF_CODE+34734c9d/???
Trace; fbe8858d <END_OF_CODE+315a6822/???
Trace; e850ffff <END_OF_CODE+1dc2e294/???
Trace; fffbe485 <END_OF_CODE+356dc71a/???
Trace; c0247f53 <stext_lock+89ff/9110>
Trace; c010be0d <__global_cli+8d/12c>
Trace; c0247f68 <stext_lock+8a14/9110>
Trace; c016a6a2 <flush_to_ldisc+8e/110>
Trace; c0170b72 <console_softint+56/11c>
Trace; c011d597 <tasklet_action+4f/7c>
Trace; c011d4ba <do_softirq+5a/88>
Trace; c010c1ea <do_IRQ+da/ec>
Trace; c010a8b4 <ret_from_intr+0/20>
Trace; c0241150 <stext_lock+1bfc/9110>
Trace; c013614d <kupdate+10d/110>
Trace; c0108d53 <kernel_thread+23/30>

The trace always includes __global_cli(), obviously -
wait_on_irq() can't get called from anywhere else, AIUI. The rest of
the trace (except for the lines immediately above and below __global_cli
-- c0247f53 and c0247f68) is different each time.

The stack of the 'other' CPU each time is crap. Whether it's _really_ crap
or whether the code in show() is just doing something wrong I'm not sure.

--
dwmw2
