Re: [BUG] kernel 2.6.15.4: soft lockup detected on CPU#0!

From: Bartlomiej Zolnierkiewicz
Date: Fri Feb 17 2006 - 06:44:56 EST


On 2/16/06, Andrew Morton <akpm@xxxxxxxx> wrote:
> Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx> wrote:
> >
> > On 2/16/06, Andrew Morton <akpm@xxxxxxxx> wrote:
> > > Charles-Edouard Ruault <ce@xxxxxxxxxx> wrote:
> > > >
> > > > i was trying to rip a CD when the whole machine started to freeze
> > > > periodicaly, i then looked at the logs and found the following :
> > > >
> > > > Feb 12 19:23:39 ruault kernel: hdc: irq timeout: status=0x80 { Busy }
> > > > Feb 12 19:23:39 ruault kernel: ide: failed opcode was: unknown
> > > > Feb 12 19:23:39 ruault kernel: hdd: status timeout: status=0x80 { Busy }
> > > > Feb 12 19:23:39 ruault kernel: ide: failed opcode was: unknown
> > > > Feb 12 19:23:39 ruault kernel: hdd: drive not ready for command
> > >
> > > No idea what caused that.
> > >
> > > > Feb 12 19:23:39 ruault kernel: BUG: soft lockup detected on CPU#0!
> > >
> > > The following was merged today. Hopefully it suppresses this false
> > > positive.
> >
> > Unfortunately it won't. Charles' problem is different (and the BUG
> > output is different!) - soft lockup got triggered for PIO handling in
> > ide-cd driver. This patch fixes the problem only for generic ATA PIO
> > routines (disks and [P]IDENTIFY), ATAPI PIO still needs fixing
> > (ide-cd/floppy/tape/scsi drivers).
>
> argh. We do need to bop that warning on the head - it's consuming people's
> time and causing unneeded concern.

Yes, it is definitely consuming time...
this is 3-rd time we hit false-positive in IDE subsystem.

"soft lockup during probe" problem was fixed by touching
watchdog. I believe proper fix would require using msleep() instead
of mdelay() and we knew this before soft lockup issue.

So from my IDE work POV it was just wasted time...

"ATA PIO" problem was handled in the same way.
I'm still not 100% sure if it was false positive - it looked like
from the trace but I find it hard to believe that users wouldn't
complain about 10sec stalls [ Soft lockup detector claims to
trigger if after 10sec it hasn't been touched - is it really working
as advertised? How can we verify this? ].

Again it only wasted time from IDE POV.

And now ATAPI PIO issues requires me to audit all ATAPI
device drivers to quiet soft lockup detector...

> > Andrew, there is no "high level" function for ATAPI PIO as
> > ide_pio_datablock() for ATA PIO so fixing soft lockup false positives
> > would require going through all drivers and adding bunch of
> > touch_softlockup_watchdog() or using sledge-hammer approach
> > and touching watchdog in ide-iops.c:atapi_[input,output]_bytes().
>
> Send fixup patch, please?

Other things have much bigger priority than quieting soft lockup
detector because somebody decided upon driver maintainers that
soft lockup during hardware access is a _major_ problem (-> system
stays locked) and needs to be fixed ASAP (-> it is enabled by default).

This feature could be useful but why driver maintainers are _forced_ to
pay maintenance cost for it?

I can send you another sledge-hammer fix which is quite disgusting as
it touches watchdog on every ATAPI related PIO access (i.e. many times
in ATAPI PIO IRQ handlers, during packet command transfer).

Alternatively I can send you patch making soft lockup detector depend on
"CONFIG_IDE=n".

Bartlomiej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/