Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling touse bmap to file blocks.
From: Neil Brown
Date: Sat May 13 2006 - 00:40:13 EST
On Friday May 12, akpm@xxxxxxxx wrote:
> NeilBrown <neilb@xxxxxxx> wrote:
> >
> > If md is asked to store a bitmap in a file, it tries to hold onto the
> > page cache pages for that file, manipulate them directly, and call a
> > cocktail of operations to write the file out. I don't believe this is
> > a supportable approach.
>
> erk. I think it's better than...
>
> > This patch changes the approach to use the same approach as swap files.
> > i.e. bmap is used to enumerate all the block address of parts of the file
> > and we write directly to those blocks of the device.
>
> That's going in at a much lower level. Even swapfiles don't assume
> buffer_heads.
I'm not "assuming" buffer_heads. I'm creating buffer heads and using
them for my own purposes. These are my pages and my buffer heads.
None of them belong to the filesystem.
The buffer_heads are simply a convenient data-structure to record the
several block addresses for each page. I could have equally created
an array storing all the addresses, and built the required bios by
hand at write time. But buffer_heads did most of the work for me, so
I used them.
Yes, it is a lower level, but
1/ I am certain that there will be no kmalloc problems and
2/ Because it is exactly the level used by swapfile, I know that it
is sufficiently 'well defined' that no-one is going to break it.
>
> When playing with bmap() one needs to be careful that nobody has truncated
> the file on you, else you end up writing to disk blocks which aren't part
> of the file any more.
Well we currently play games with i_write_count to ensure that no-one
else has the file open for write. And if no-one else can get write
access, then it cannot be truncated.
I did think about adding the S_SWAPFILE flag, but decided to leave
that for a separate patch and review different approaches to
preventing write access first (e.g. can I use a lease?).
>
> All this (and a set_fs(KERNEL_DS), ug) looks like a step backwards to me.
> Operating at the pagecache a_ops level looked better, and more
> filesystem-independent.
If you really want filesystem independence, you need to use vfs_read
and vfs_write to read/write the file. I have a patch which did that,
but decided that the possibility of kmalloc failure at awkward times
would make that not suitable.
So I now use vfs_read to read in the file (just like knfsd) and
bmap/submit_bh to write out the file (just like swapfile).
I don't think a_ops really provides an interface that I can use, partly
because, as I said in a previous email, it isn't really a public
interface to a filesystem.
>
> I haven't looked at this patch at all closely yet. Do I really need to?
I assume you are asking that because you hope I will retract the
patch. While I'm always open to being educated, I am not yet
convinced that there is any better way, or even any other usable way,
to do what needs to be done, so I am not inclined to retract the
patch.
I'd like to say that you don't need to read it because it is perfect,
but unfortunately history suggests that is unlikely to be true.
Whether you look more closely is of course up to you, but I'm convinced
that patch is in the right direction, and your review and comments are
always very valuable.
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/