Re: Corrupted filesystem with new Firewire stack

From: Stefan Richter
Date: Thu Aug 16 2007 - 02:27:07 EST


(full quote for linux1394-devel)

Gregor Jasny wrote at lkml:
> today I got the "status write for unknown orb" during early boot
> sequence. This corrupted somehow my root filesystem which is
> completely located at the external disk.
>
> Aug 15 23:06:02 Mini kernel: firewire_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
> Aug 15 23:06:02 Mini kernel: firewire_sbp2: - management_agent_address: 0xfffff0030000
> Aug 15 23:06:02 Mini kernel: firewire_sbp2: - command_block_agent_address: 0xfffff0100000
> Aug 15 23:06:02 Mini kernel: firewire_sbp2: - status write address: 0x000100000000
> ...
> Aug 15 23:06:02 Mini kernel: Freeing unused kernel memory: 252k freed
> Aug 15 23:06:02 Mini kernel: firewire_sbp2: status write for unknown orb
> Aug 15 23:06:02 Mini kernel: firewire_sbp2: sbp2_scsi_abort
>
> After this error with timeout the kernel complained about missing
> symbols in the intel agp module and udev behave strange.

There were some similar reports involving that "status write for unknown
orb". I haven't found a way to reproduce it; I noticed it only once in
the logs here so far.

> After the next reboot I found the following lines in the kernel logfile:
>
> Aug 15 23:09:52 Mini kernel: firewire_core: created new fw device fw1 (0 config rom retries)
> Aug 15 23:09:52 Mini kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> Aug 15 23:09:52 Mini kernel: firewire_ohci: context_stop: still active (0x00000411)
> Aug 15 23:09:52 Mini kernel: firewire_sbp2: management write failed, rcode 0x11
>
> Sometimes rcode is 0x12:
> Aug 13 22:24:13 Mini kernel: firewire_core: phy config: card 0, new root=ffc0, gap_count=5
> Aug 13 22:24:13 Mini kernel: firewire_sbp2: management write failed, rcode 0x12
> Aug 13 22:24:13 Mini kernel: firewire_sbp2: reconnected to unit fw1.0 (1 retries)

As long as it ends in "reconnected to...", failure messages like this
can typically be ignored. It cannot be predicted which failures are
transient and which aren't, hence all are logged.

> The system is an early Mac Mini with C2D CPU. The external disk is a
> Formax Oxygen 250 with Oxford OXFW_911 chipset. The Kernel was a
> vanilla 2.6.22.2 with the mactel patches applied.

Among else I too have an Intel Mac mini (running x86-64 Linux though)
and a OXFW911 enclosure with a NTFS formatted disk in it; I'll swap the
disk and try stress tests with a native filesystem.

> Is there anything I can do to help debugging this problem?

Alas this kind of bug is harder to debug remotely, and the root FS isn't
exactly ideal for respective tests... I'll try to remember to Cc you on
potentially related patches though.
--
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/