Please help me understand ->writepage. Was Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

From: Neil Brown
Date: Mon Nov 21 2005 - 18:07:22 EST


On Thursday November 17, sander@xxxxxxxxxxx wrote:
> Sander wrote (ao):
> # Sander wrote (ao):
> # # Neil Brown wrote (ao):
> # # > On Wednesday November 16, akpm@xxxxxxxx wrote:
> # # > > Sander <sander@xxxxxxxxxxx> wrote:
> # # > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> # # > > > try this:
> # # > >
> # # > > It oopsed in reiser4. reiserfs-dev added to Cc...
> # # > >
> # # >
> # # > Hmm... It appears that md/bitmap is calling prepare_write and
> # # > commit_write with 'file' as NULL - this works for some filesystems,
> # # > but not for reiser4.
> # # >
> # # > Does this patch help.
> # #
> # # Something changed, but it didn't fix it it seems:
> # #
> # # # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
> # # mdadm: RUN_ARRAY failed: No such file or directory
> #
> # FWIW, the following happens when I point --bitmap to /tmp/raid1.bitmap
> # which is tmpfs, and also happens when I attach both loop0 and loop1 to
> # files on tmpfs.
> #
> # This would suggest that reiser4 is not solely at fault?
> #

No, there is something very wrong in md/bitmap.c's handling of writing
to a file. It was developed for, and tested on, ext3 and doesn't seem
to work anywhere else.... and I don't understand enough to fix it.

Help ???

What md/bitmap wants to do is effectively memory map the file, make
updates to pages occasionally, flush those pages out to storage, and
wait for the flush to complete. It doesn't exactly memory map. It
just reads all the pages and keeps them in an array (holding a
reference to each).
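
As a userspace analogy only (md/bitmap runs inside the kernel and does
not use these calls), the "map, update, flush, wait" pattern can be
sketched with mmap() and msync(MS_SYNC). The helper name
update_and_flush and its parameters are made up for illustration and
do not come from md/bitmap.c.

```c
/* Userspace analogy only: map a file, update one byte, flush, and wait. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the byte at `offset` in the file at `path`
 * and flush it to storage. Returns 0 on success, -1 on failure.
 */
int update_and_flush(const char *path, size_t len, size_t offset,
                     unsigned char value)
{
    assert(offset < len);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Set the file size to `len` so the whole mapping is backed. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Update the in-memory copy of the page. */
    map[offset] = value;

    /* MS_SYNC requests write-back and blocks until it has completed. */
    int rc = msync(map, len, MS_SYNC);

    munmap(map, len);
    close(fd);
    return rc;
}
```

A call such as update_and_flush("/tmp/demo.bitmap", 4096, 10, 0xAB)
would set byte 10 of a one-page file; the path here is only an
example.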

To write the pages out it effectively does ->prepare_write,
->commit_write, and then ->writepage.
I'm not sure that prepare/commit are needed, but they don't seem to be
the problem.  ->writepage is.

For tmpfs at least, writepage disconnects the page from the pagecache
(via move_to_swap_cache), so the page that we are holding is no longer
part of the file and, significantly, page->mapping becomes NULL.
This suggests that the ->writepage usage is broken.
However I tried to see what 'msync' does for real memory mapped files,
and it eventually calls ->writepage too. So how does that work??
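
The hazard described above can be shown with a toy userspace model.
Nothing here is kernel code: struct toy_page, struct toy_mapping and
the toy_* functions are invented for illustration, and the "detach"
behaviour only mimics what is described for tmpfs, under the
assumption that the caller still expects page->mapping to be valid
after the write.

```c
/* Toy model; these structs are NOT the kernel's definitions. */
#include <assert.h>
#include <stddef.h>

struct toy_mapping {
    const char *name;             /* stands in for the owning file */
};

struct toy_page {
    struct toy_mapping *mapping;  /* NULL once detached from the file */
    int refcount;                 /* reference held by the caller */
};

/* Behaviour the caller was written for: the page stays attached. */
int toy_writepage_keep(struct toy_page *page)
{
    (void)page;
    return 0;
}

/* tmpfs-like behaviour as described: the page leaves the file's cache. */
int toy_writepage_detach(struct toy_page *page)
{
    page->mapping = NULL;
    return 0;
}

/*
 * Caller that holds a reference, writes the page, and then still
 * relies on page->mapping. Returns 0 if the page is still attached
 * afterwards, -1 otherwise.
 */
int toy_flush(struct toy_page *page, int (*writepage)(struct toy_page *))
{
    assert(page->refcount > 0);
    if (writepage(page) != 0)
        return -1;
    return page->mapping != NULL ? 0 : -1;
}
```

In this model the held reference keeps the page itself alive, but it
does not keep the page attached to the file, which is the situation
the paragraph above describes.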

Any advice would be most welcome!

Thanks,
NeilBrown
