Re: [RFC][PATCH] block: Isolate the buffer cache in it's own mappings.
From: Eric W. Biederman
Date: Sun Oct 21 2007 - 00:54:37 EST
Nick Piggin <nickpiggin@xxxxxxxxxxxx> writes:
> On Saturday 20 October 2007 07:27, Eric W. Biederman wrote:
>> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:
>> > I don't think we little angels want to tread here. There are so many
>> > weirdo things out there which will break if we bust the coherence between
>> > the fs and /dev/hda1.
>>
>> We broke coherence between the fs and /dev/hda1 when we introduced
>> the page cache years ago,
>
> Not for metadata. And I wouldn't expect many filesystem analysis
> tools to care about data.
Well, tools like dump certainly weren't happy when we made the change.
>> and weird hacky cases like
>> unmap_underlying_metadata don't change that.
>
> unmap_underlying_metadata isn't about raw block device access at
> all, though (if you write to the filesystem via the blockdevice
> when it isn't expecting it, it's going to blow up regardless).
Well, my goal in separating things is to decouple two pieces of
code that have different usage scenarios, where supporting both
scenarios simultaneously appears to me to complicate the code
needlessly.
On top of that, we could then tune the two pieces of code for their
different users.
>> Currently only
>> metadata is more or less in sync with the contents of /dev/hda1.
>
> It either is or it isn't, right? And it is, isn't it? (at least
> for the common filesystems).
ext2 doesn't store directories in the buffer cache.
Journaling filesystems and filesystems that do ordered writes
game the buffer cache, putting in data that should not yet
be written to disk. That gaming is where reiserfs goes BUG
and where JBD moves the dirty bit to a different dirty bit.
So as far as I can tell, what is in the buffer cache is not really
in sync with what should be on disk at any given moment except
when everything is clean.
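A toy model may make the point above concrete. The sketch below is not
kernel code and does not use any real kernel interface: ToyBlockDev, its
"held" set and every method name are hypothetical, chosen only for
illustration. It assumes a write-back cache in which a journal can hold a
buffer back from the flusher until commit, loosely mirroring the
"different dirty bit" idea, and shows that the cached view and the
on-disk view only agree once everything is clean.

```python
class ToyBlockDev:
    """Hypothetical model: a 'disk' with a write-back buffer cache in front."""

    def __init__(self, nblocks):
        self.disk = [b""] * nblocks  # what is actually on stable storage
        self.cache = {}              # block number -> cached contents
        self.dirty = set()           # buffers the flusher is allowed to write
        self.held = set()            # buffers held back until journal commit

    def write(self, blocknr, data, hold=False):
        """Write into the cache; hold=True keeps it away from the flusher."""
        self.cache[blocknr] = data
        if hold:
            self.held.add(blocknr)
        else:
            self.dirty.add(blocknr)

    def commit(self):
        """Journal commit: held buffers become ordinary dirty buffers."""
        self.dirty |= self.held
        self.held.clear()

    def flush(self):
        """Write back only the buffers marked dirty, then mark them clean."""
        for blocknr in sorted(self.dirty):
            self.disk[blocknr] = self.cache[blocknr]
        self.dirty.clear()

    def read_cached(self, blocknr):
        """The view through the buffer cache."""
        return self.cache.get(blocknr, self.disk[blocknr])

    def read_direct(self, blocknr):
        """The view obtained by actually reading the disk."""
        return self.disk[blocknr]

    def coherent(self):
        """True only when cache and disk agree on every block."""
        return all(self.read_cached(n) == self.disk[n]
                   for n in range(len(self.disk)))


if __name__ == "__main__":
    dev = ToyBlockDev(4)
    dev.write(0, b"new metadata", hold=True)  # journaled, not yet committed
    dev.flush()                               # flusher runs, skips block 0
    print(dev.coherent())                     # cache and disk disagree here
    dev.commit()
    dev.flush()
    print(dev.coherent())                     # everything clean again
```

In this model a reader going through the cache sees the uncommitted
block while a reader of the disk still sees the old contents, so the two
views differ until commit and writeback have both happened.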
My suspicion is that actually reading from disk is likely to
give a more coherent view of things, because there at least
we have the writes as fsck expects to see them in order
to recover the data, and a snapshot there should at least
be recoverable. A snapshot of the buffer cache provides no such
guarantees.
Eric