Re: kernel BUG at mm/filemap.c:332!

From: Nathan Scott
Date: Thu Dec 04 2003 - 16:18:29 EST


On Thu, Dec 04, 2003 at 07:26:38PM +0200, Mihai RUSU wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Linus
>
> First of all thanks for the answer!
>
> On Thu, 4 Dec 2003, Linus Torvalds wrote:
>
> >
> > Nathan,
> > you're not off the hook yet. This is a smoking gun on XFS, and this time
> > with a big clue: large directories, and a low-memory situation.
>
> Sorry to have misguided you guys in the first post. After rebooting the
> machine I have some more information, the actual directory size its about
> some hundred entries (~400) and not thousands as I previously speculated

Still, its a clue - all metadata I/O in XFS goes though the
pagebuf code, so we're looking in the right place.

> However I have some more usefull (I hope) information about the subject.
> Before rebooting I wanted to first install a do_brk() patched 2.4.21-xfs
> kernel with lilo. Unfortunetly lilo stuck in a fsync() call after writing
> to screen that it did added all kernel images to MBR as configured in
> lilo.conf. When I booted I had no problem to boot from the new do_brk()
> fixed kernel so lilo seems it did the job, I dont know why it stuck
> in fsync().

Was your filesystem near full? There was a 2.4 deadlock fixed
recently which could be what you hit there.

> After power on, one coleague complained that a file on which he worked a
> couple of minutes before I took the machine down had NULL bytes instead of
> actual content. I know that "dirty" data gets flushed to disk every 30
> seconds so this seems a little bit strange (in general I know that XFS
> leaves NULL bytes in files modified just before a unclean reboot but this
> file was modified some 5 minutes before the "hard" reboot).

You'll want a more recent 2.4 XFS kernel I suspect - Steve made
several improvements in this area awhile back.

> > Also, this time the config file doesn't have any MD/RAID support according
> > to the attachment:
> >
> > # Multi-device support (RAID and LVM)
> > #
> > # CONFIG_MD is not set
> >
> > so it looks like the XFS and MD issues really are totally unrelated.

Sure does.

> > Mihai: the oops itself is in this case not very telling, since it's just a
> > result of corruption of some fundamental data structures (probably
> > somebody using a page cache page after having free'd it - and it probably
> > only shows up when memory gets low and pages have to be cleaned). Can you
> > tell Nathan more about the filesystem setup (block size, as much as
> > possible about the affected directory, etc).
>
> Ok.
>
> $ xfs_info /var
> meta-data=/var isize=256 agcount=18, agsize=262144 blks
> data = bsize=4096 blocks=4482127, imaxpct=25
> = sunit=0 swidth=0 blks, unwritten=0
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=1200
> realtime =none extsz=65536 blocks=0, rtextents=0

OK, looks like a default mkfs then (with an old-ish mkfs binary)?
Newer mkfs' will give you a better AG layout and unwritten extents
would be turned on - not relevent to this problem at all though.

An "ls -ld" and "xfs_bmap -v" on the directory would also provide
me a bit more info to work with -- thanks!

I have a few ideas about what this might be, let me stew on those
for a bit and try a few things.

cheers.

--
Nathan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/