Re: 2.6.18-rc3-mm2 - ext3 locking issue?

From: Valdis . Kletnieks
Date: Wed Aug 09 2006 - 16:41:21 EST


On Wed, 09 Aug 2006 13:01:51 PDT, Andrew Morton said:

> > Yum managed to get wedged: 'echo t > /proc/sysrq-trigger' says:
> >
> > [ 4514.840000] yum D D5C32AA0 0 4747 4430
(NOTLB)
> > [ 4514.840000] d5c3dda4 d5c3dd78 00000007 d5c32aa0 bd3ddd00 00000338
00000000 d5c32bc0
> > [ 4514.840000] c1601628 d5c3dd9c 64600300 0000001f d5c3ddd8 d5c3ddd8
c1601628 d5c3ddac
> > [ 4514.840000] c034fef8 d5c3ddb4 c0136e8e d5c3ddcc c0350026 c0136e58
d5c3ddd8 00000000
> > [ 4514.840000] Call Trace:
> > [ 4514.840000] [<c034fef8>] io_schedule+0x25/0x44
> > [ 4514.840000] [<c0136e8e>] sync_page+0x36/0x3a
> > [ 4514.840000] [<c0350026>] __wait_on_bit_lock+0x30/0x58
> > [ 4514.840000] [<c0136e44>] __lock_page+0x51/0x59
> > [ 4514.840000] [<c013f099>] truncate_inode_pages_range+0x1de/0x230
> > [ 4514.840000] [<c013f0f7>] truncate_inode_pages+0xc/0x11
> > [ 4514.840000] [<c018ea12>] ext3_delete_inode+0x16/0xbd
> > [ 4514.840000] [<c016798f>] generic_delete_inode+0xb6/0x130
> > [ 4514.840000] [<c0167a1b>] generic_drop_inode+0x12/0x166
> > [ 4514.840000] [<c01673f1>] iput+0x67/0x6a
> > [ 4514.840000] [<c0165662>] dentry_iput+0x97/0xcc
> > [ 4514.840000] [<c016613d>] dput+0x183/0x19c
> > [ 4514.840000] [<c015f64f>] sys_renameat+0x17a/0x1d3
> > [ 4514.840000] [<c015f6ba>] sys_rename+0x12/0x14
> > [ 4514.840000] [<c0102849>] sysenter_past_esp+0x56/0x79
> >
> > A careful check of the dmesg doesn't reveal anything particularly helpful,
> > like an oops or other relevant kernel message.
>
> Usually this means that there's an IO request in flight and it got lost
> somewhere. Device driver bug, IO scheduler bug, etc. Conceivably a
> lost interrupt (hardware bug, PCI setup bug, etc).
>
> Which device driver and which IO sched are you using?

Aug 9 13:33:13 turing-police kernel: [ 11.297507] libata version 2.00 loaded.
Aug 9 13:33:14 turing-police kernel: [ 11.297763] ata_piix 0000:00:1f.1: version 2.00ac6
Aug 9 13:33:14 turing-police kernel: [ 11.297780] PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
Aug 9 13:33:14 turing-police kernel: [ 11.299245] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
Aug 9 13:33:14 turing-police kernel: [ 11.299786] ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Aug 9 13:33:14 turing-police kernel: [ 11.300638] PCI: Setting latency timer of device 0000:00:1f.1 to 64
Aug 9 13:33:15 turing-police kernel: [ 11.300720] ata1: PATA max UDMA/100 cmd 0x1F0 ctl 0x3F6 bmdma 0xBFA0 irq 14
Aug 9 13:33:15 turing-police kernel: [ 11.301381] scsi0 : ata_piix
...

Disk was running with 'cfq' scheduler. I checked the dmesg, and only odd thing
was this:

Aug 9 14:30:24 turing-police kernel: [ 3535.720000] end_request: I/O error, dev fd0, sector 0

Wierd though - floppy and ATA are on different IRQs according to /proc/interrupts:

CPU0
0: 11122651 XT-PIC-level timer
1: 12532 XT-PIC-level i8042
2: 0 XT-PIC-level cascade
5: 190651 XT-PIC-level Intel 82801CA-ICH3
6: 5 XT-PIC-level floppy
8: 1 XT-PIC-level rtc
9: 1 XT-PIC-level acpi
11: 238728 XT-PIC-level uhci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, yenta, yenta, yenta, ohci1394, pcmcia2.0, eth3
12: 114 XT-PIC-level i8042
14: 172656 XT-PIC-level libata
15: 0 XT-PIC-level libata
NMI: 0
LOC: 0
ERR: 1
MIS: 0

(For the record, the laptop doesn't even *have* a floppy drive installed at the
moment)

Attachment: pgp00000.pgp
Description: PGP signature