Re: [OOPS] 2.6.11 - NMI lockup with CFQ scheduler

From: Jens Axboe
Date: Tue Mar 29 2005 - 07:12:17 EST


On Tue, Mar 29 2005, Chris Rankin wrote:

(please don't top post)

> --- Jens Axboe <axboe@xxxxxxx> wrote:
> > On Sun, Mar 27 2005, Chris Rankin wrote:
> > > [gcc-3.4.3, Linux-2.6.11-SMP, Dual P4 Xeon with HT enabled]
> > >
> > > Hi,
> > >
> > > My Linux 2.6.11 box oopsed when I tried to logout. I have switched to using the anticipatory
> > > scheduler instead.
> > >
> > > Cheers,
> > > Chris
> > >
> > > NMI Watchdog detected LOCKUP on CPU1, eip c0275cc7, registers:
> > > Modules linked in: snd_pcm_oss snd_mixer_oss snd_usb_audio snd_usb_lib snd_intel8x0
> > snd_seq_oss
> > > snd_seq_midi snd_emu10k1_synth snd_emu10k1 snd_ac97_codec snd_pcm snd_page_alloc
> > snd_emux_synth
> > > snd_seq_virmidi snd_rawmidi snd_seq_midi_event snd_seq_midi_emul snd_hwdep snd_util_mem
> > snd_seq
> > > snd_seq_device snd_rtctimer snd_timer snd nls_iso8859_1 nls_cp437 vfat fat usb_storage radeon
> > drm
> > > i2c_algo_bit emu10k1_gp gameport deflate zlib_deflate zlib_inflate twofish serpent aes_i586
> > > blowfish des sha256 crypto_null af_key binfmt_misc eeprom i2c_sensor button processor psmouse
> > > pcspkr p4_clockmod speedstep_lib usbserial lp nfsd exportfs md5 ipv6 sd_mod scsi_mod autofs
> > nfs
> > > lockd sunrpc af_packet ohci_hcd parport_pc parport e1000 video1394 raw1394 i2c_i801 i2c_core
> > > ohci1394 ieee1394 ehci_hcd soundcore pwc videodev uhci_hcd usbcore intel_agp agpgart ide_cd
> > cdrom
> > > ext3 jbd
> > > CPU: 1
> > > EIP: 0060:[<c0275cc7>] Not tainted VLI
> > > EFLAGS: 00200086 (2.6.11)
> > > EIP is at _spin_lock+0x7/0xf
> > > eax: f7b8b01c ebx: f7c82b88 ecx: f7c82b94 edx: f6c33714
> > > esi: eb68ad88 edi: f6c33708 ebp: f6c33714 esp: f5b32f70
> > > ds: 007b es: 007b ss: 0068
> > > Process nautilus (pid: 5757, threadinfo=f5b32000 task=f7518020)
> > > Stack: c01f7f79 00200282 f76bda24 f6c323e4 f7518020 00000000 00000000 c01f1d0c
> > > f5b32000 c011d7b3 00000001 00000000 b65ffa40 00000000 f5b32fac 00000000
> > > 00000000 00000000 f5b32000 c011d8d6 c0102e7f 00000000 b65ffbf0 b6640bf0
> > > Call Trace:
> > > [<c01f7f79>] cfq_exit_io_context+0x54/0xb3
> > > [<c01f1d0c>] exit_io_context+0x45/0x51
> > > [<c011d7b3>] do_exit+0x205/0x308
> > > [<c011d8d6>] next_thread+0x0/0xc
> > > [<c0102e7f>] syscall_call+0x7/0xb
> > > Code: 05 e8 3a e6 ff ff c3 ba 00 f0 ff ff 21 e2 81 42 14 00 01 00 00 f0 81 28 00 00 00 01 74
> > 05 e8
> > > 1d e6 ff ff c3 f0 fe 08 79 09 f3 90 <80> 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00 01 74 05 e8 ff
> > e5
> > > ff
> > > console shuts up ...
> >
> > The queue was gone by the time the process exited. What type of storage
> > do you have attached to the box? At least with SCSI, it has some
> > problems in this area - it will glady free the scsi device structure
> > (where the queue lock is located) while the queue reference count still
> > hasn't dropped to zero.
>
> I have one IDE hard disc, but I was using a USB memory stick at one
> point. (Notice the usb-storage and vfat modules in my list.) Could
> that be the troublesome SCSI device?

Yes, it probably is. What happens is that you insert the stick and do io
against it, which sets up a process io context for that device. That
context persists until the process exits (or later, if someone still
holds a reference to it), but the queue_lock will be dead when you yank
the usb device.

It is quite a serious problem, not just for CFQ. SCSI referencing is
badly broken there.

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/