broken dpt_i2o (was: ext2_check_page: bad entry in directory)
From: Anders Henke
Date: Wed Nov 28 2007 - 12:42:19 EST
Hi,
I've been bitten by the problem noted in the lkml message of roughly the same
subject, dated Oct 24, 2007.
My boxes were running 2.6.19 and have been upgraded to 2.6.23.1, but their
bootup failed when trying to mount the root (ext2) filesystem:
---cut
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
ACPI: PCI Interrupt 0000:04:08.0[A] -> GSI 48 (level, low) -> IRQ 16
Adaptec I2O RAID controller 0 irq=16
BAR0 f8880000 - size= 100000
BAR1 f8a00000 - size= 1000000
dpti: If you have a lot of devices this could take a few minutes.
dpti0: Reading the hardware resource table.
TID 008 Vendor: ADAPTEC Device: AIC-7902 Rev: 00000001
TID 009 Vendor: ADAPTEC Device: AIC-7902 Rev: 00000001
TID 515 Vendor: ESG-SHV S Device: SCA HSBP M21 Rev: 0.080
TID 518 Vendor: ADAPTEC R Device: RAID-1 Rev: 3B0AD
scsi0 : Vendor: Adaptec Model: 2010S FW:3B0A
scsi 0:1:0:0: Direct-Access ADAPTEC RAID-1 3B0A PQ: 0
ANSI: 2
scsi 0:1:6:0: Processor ESG-SHV SCA HSBP M21 0.08 PQ: 0
ANSI: 2
Adaptec aacraid driver 1.1-5[2449]-ms
GDT-HA: Storage RAID Controller Driver. Version: 3.05
GDT-HA: Found 0 PCI Storage RAID Controllers
3ware Storage Controller device driver for Linux v1.26.02.002.
3ware 9000 Storage Controller device driver for Linux v2.26.02.010.
sd 0:1:0:0: [sda] 143374336 512-byte hardware sectors (73408 MB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Write cache: enabled, read cache: enabled, supports
DPO and FUA
sd 0:1:0:0: [sda] 143374336 512-byte hardware sectors (73408 MB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Write cache: enabled, read cache: enabled, supports
DPO and FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
sd 0:1:0:0: [sda] Attached SCSI disk
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please
boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
md: raid1 personality registered for level 1
EDAC MC: Ver: 2.1.0 Oct 23 2007
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Starting balanced_irq
Using IPI Shortcut mode
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 264k freed
EXT2-fs error (device sda1): ext2_check_page: bad entry in directory #2:
rec_len is smaller than minimal - offset=0, inode=0, rec_len=0,
name_len=0
Warning: unable to open an initial console.
Kernel panic - not syncing: No init found. Try passing init= option to
kernel.
Rebooting in 30 seconds..
---cut
Rebooting the box into 2.6.19 works without any problems.
I've checked the changelogs for 2.6.24-rc*, but haven't come across a
solution for this issue; maybe I've overlooked it, though.
This bug has been reported earlier in http://lkml.org/lkml/2007/10/24/224.
I've contacted Jan Kara off-list; as booting into 2.6.19 works and e2fsck
on an e2image file doesn't show any errors, we assumed that the ext2
filesystem itself is fine.
As "everything is reported as being zero" is quite odd and Jan guessed
that it might be block-layer or driver-related, I assumed that the
driver is responsible. Just out of curiosity, I replaced the dpt_i2o
driver with the 2.6.19 one by copying drivers/scsi/dpt_i2o.c,
drivers/scsi/dpti.h and drivers/scsi/dpt/ into a vanilla 2.6.23.1
kernel; using this kernel fixed the issue for me.
I haven't yet narrowed down from which kernel release on the dpt_i2o
driver behaves like this and returns zeroed blocks when trying to mount
the rootfs. Maybe this is just some timing issue.
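To illustrate why a block of zeros produces exactly the message seen in the
log, here is a minimal Python sketch of the kind of sanity check ext2 applies
to directory entries. It is a simplified userspace model, not the kernel
code: the function names are made up for illustration, only the first error
string is taken from the log above, and the remaining strings are
paraphrased.

```python
import struct

# ext2 on-disk directory entry header, little-endian:
# inode (u32), rec_len (u16), name_len (u8), file_type (u8).
DIRENT_HEADER = struct.Struct("<IHBB")


def dir_rec_len(name_len):
    """Smallest legal rec_len for a name of this length
    (8-byte header plus name, rounded up to a multiple of 4)."""
    return (name_len + 8 + 3) & ~3


def check_dir_block(block):
    """Return a description of the first bad entry in a directory
    block, or None if every entry passes the checks."""
    offset = 0
    while offset <= len(block) - dir_rec_len(1):
        inode, rec_len, name_len, _ftype = DIRENT_HEADER.unpack_from(block, offset)
        if rec_len < dir_rec_len(1):
            reason = "rec_len is smaller than minimal"
        elif rec_len % 4:
            reason = "unaligned directory entry"
        elif rec_len < dir_rec_len(name_len):
            reason = "rec_len is too small for name_len"
        elif offset + rec_len > len(block):
            reason = "directory entry across blocks"
        else:
            offset += rec_len  # entry looks sane, move on to the next one
            continue
        return "%s - offset=%d, inode=%d, rec_len=%d, name_len=%d" % (
            reason, offset, inode, rec_len, name_len)
    return None


if __name__ == "__main__":
    # A block the driver returned as all zeros fails on the very first entry.
    print(check_dir_block(bytes(4096)))
```

With an all-zero block the very first entry has rec_len=0, which is below
the 12-byte minimum, so the check fails at offset=0 with inode=0 and
name_len=0, matching the boot log. This is consistent with the driver
handing back zeroed data rather than with on-disk corruption, though it
does not prove it.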
For some strange reason, this doesn't affect all boxes running the
dpt_i2o driver.
Affected (verified on 6 out of 6 tested boxes so far):
Intel SE7501WV2S using an Adaptec 2010S with the following "lspci -vn"-section:
0000:04:08.0 0104: 1044:a511 (rev 01)
Subsystem: 1044:c035
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
BIST result: 00
Memory at fe900000 (32-bit, non-prefetchable) [size=1M]
Memory at fb000000 (32-bit, prefetchable) [size=16M]
Memory at f8000000 (32-bit, prefetchable) [size=32M]
Expansion ROM at f6200000 [disabled] [size=32K]
Capabilities: [44] Power Management version 2
Not affected is, for example, a Supermicro X5DPR box using an Adaptec 2015S
with the following "lspci -vn"-section:
0000:03:03.0 0104: 1044:a511 (rev 01)
Subsystem: 1044:c034
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
BIST result: 00
Memory at f8300000 (32-bit, non-prefetchable) [size=1M]
Memory at fb000000 (32-bit, prefetchable) [size=16M]
Memory at fc000000 (32-bit, prefetchable) [size=32M]
Capabilities: [44] Power Management version 2
... and of course boxes not using a dpt_i2o-driven controller.
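For comparing the two "lspci -vn" sections mechanically, a small Python
sketch along these lines could be used. The function names and the choice
of fields are assumptions made for illustration; the parser only looks at
the header and Subsystem lines in the format shown above.

```python
import re

# First line of an "lspci -vn" section, e.g.
# "0000:04:08.0 0104: 1044:a511 (rev 01)"
HEADER_RE = re.compile(
    r"^(?P<slot>[0-9a-f:.]+)\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?")
SUBSYS_RE = re.compile(r"^Subsystem:\s+([0-9a-f]{4}):([0-9a-f]{4})")


def parse_lspci_section(text):
    """Extract the PCI IDs from one 'lspci -vn' section into a dict."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header and "slot" not in info:
            info.update(header.groupdict())
            continue
        subsys = SUBSYS_RE.match(line)
        if subsys:
            info["subsys_vendor"], info["subsys_device"] = subsys.groups()
    return info


def differing_fields(a, b):
    """Return the sorted names of fields whose values differ."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


if __name__ == "__main__":
    affected = parse_lspci_section(
        "0000:04:08.0 0104: 1044:a511 (rev 01)\nSubsystem: 1044:c035\n")
    unaffected = parse_lspci_section(
        "0000:03:03.0 0104: 1044:a511 (rev 01)\nSubsystem: 1044:c034\n")
    print(differing_fields(affected, unaffected))
```

Apart from the slot, the only ID that differs between the two listings is
the subsystem device (c035 on the 2010S, c034 on the 2015S). Whether that
difference has anything to do with the bug is unknown.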
The Adaptec 2010S boxes are currently running the Adaptec firmware 3B05,
while the Adaptec 2015S box is running firmware 3B0A. As those
controllers are capable of running the same firmware image, I considered
that a firmware update might resolve this issue as well (unlikely,
according to the changelog). However, the bootup log above is from a box
already updated to 3B0A, so the firmware update didn't help. What really
helps is the older driver.
Anders
--
1&1 Internet AG System Design
Brauerstrasse 48 v://49.721.91374.50
D-76135 Karlsruhe f://49.721.91374.225
Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger,
Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren