XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

From: Shlomi Fish
Date: Wed Dec 07 2005 - 07:03:06 EST


Hi all!

(Please CC me on replies)

I encountered a problem with the Linux kernel handling of XFS, in which
attempting to mount a certain XFS partition (but not a different one on the
same hard-disk) caused the mount process to hang, and all other XFS-aware
apps (like "xfs_check" or "xfs_repair") to hang too. However, running
xfs_check
or xfs_repair before the first mount (after a reboot) worked, and eventually
resolved this problem.

I blogged about it (relatively incoherently) here:

http://www.livejournal.com/~shlomif/7182.html?mode=reply
http://www.livejournal.com/~shlomif/7547.html?mode=reply

It all happened after I detected some problems on my Mandriva 2006 system
(that was using kernel 2.4.15-rc2 from Linus), and then rebooted twice,
thinking something went wrong. Then a loadlin-booted kernel was unable to
load the kernel.

Knoppix ran fine, but it also hang up on attempting to mount the XFS
partition. It used a much older kernel. I then tried to boot Kubuntu (which
was on another XFS partition on the same disk) and it booted fine. Still, it
was unable to mount the partition. (It too had an older kernel).

After I compiled a 2.6.14.3 kernel, and booted Kubuntu with it, it again
could not mount the XFS partition, and after doing that xfs_check and
xfs_repair both hanged up as well. After a reboot, I tried running xfs_check
right away on that partition and it worked. So I ran xfs_repair, and after it
finished, tried to mount the partition it worked. Then Mandriva booted fine.

I did not had any problems since then (I have an uptime of 11 days now using
kernel 2.6.14.3), and so it doesn't seem like a hard disk problem. Something
using kernel 2.6.15-rc2 caused the XFS partition to become defected, and
worse - something in all the kernels starting from that of the first official
Kubuntu release, (or the Knoppix I had), caused an attempt to mount the
Mandriva partition to hang the process, and subsequent accesses to the
partition by xfs_check and xfs_repair to fail as well.

I can no longer reproduce the problem, but it might be worth going over the
code. If it helps I can privately send a dump of the first 131,072,000 bytes
of the XFS partition to someone trustworthy.

Regards,

Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish shlomif@xxxxxxxxxxx
Homepage: http://www.shlomifish.org/

95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/