Re: [patch 1/3] add the fsblock layer

From: Chris Mason
Date: Tue Jun 26 2007 - 08:30:21 EST


On Tue, Jun 26, 2007 at 01:07:43PM +1000, Nick Piggin wrote:
> Neil Brown wrote:
> >On Tuesday June 26, nickpiggin@xxxxxxxxxxxx wrote:
> >
> >>Chris Mason wrote:
> >>
> >>>The block device pagecache isn't special, and certainly isn't that much
> >>>code. I would suggest keeping it buffer head specific and making a
> >>>second variant that does only fsblocks. This is mostly to keep the
> >>>semantics of PagePrivate sane, lets not fuzz the line.
> >>
> >>That would require a new inode and address_space for the fsblock
> >>type blockdev pagecache, wouldn't it? I just can't think of a
> >>better non-intrusive way of allowing a buffer_head filesystem and
> >>an fsblock filesystem to live on the same blkdev together.
> >
> >
> >I don't think they would ever try to. Both filesystems would bd_claim
> >the blkdev, and only one would win.
>
> Hmm OK, I might have confused myself thinking about partitions...
>
> >The issue is more of a filesystem sharing a blockdev with the
> >block-special device (i.e. open("/dev/sda1"), read) isn't it?
> >
> >If a filesystem wants to attach information to the blockdev pagecache
> >that is different to what blockdev want to attach, then I think "Yes"
> >- a new inode and address space is what it needs to create.
> >
> >Then you get into consistency issues between the metadata and direct
> >blockdevice access. Do we care about those?
>
> Yeah that issue is definitely a real one. The problem is not just
> consistency, but "how do the block device aops even know that the
> PG_private page they have has buffer heads or fsblocks", so it is
> an oopsable condition rather than just a plain consistency issue
> (consistency is already not guaranteed).

Since we're testing new code, I would just leave the blkdev address
space alone. If a filesystem wants to use fsblocks, they allocate a new
inode during mount, stuff it into their private super block (or in the
generic super), and use that for everything. Basically ignoring the
block device address space completely.

It means there will be some inconsistency between what you get when
reading the block device file and the filesystem metadata, but we've got
that already (ext2 dir in page cache).

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/