Re: [PATCH] SMP race in ext2 - metadata corruption.

From: Andrea Arcangeli (andrea@suse.de)
Date: Sat May 05 2001 - 22:51:17 EST


On Sun, May 06, 2001 at 03:00:58PM +1200, Chris Wedgwood wrote:
> On Sun, May 06, 2001 at 04:50:01AM +0200, Andrea Arcangeli wrote:
>
> Moving e2fsck into the kernel is a completly different matter
> than caching the blockdevice accesses with pagecache instead of
> buffercache.
>
> No, I was takling about user space fsck using character devices.

I misread your previous email sorry, I think you meant to fsck using
rawio (not to move fsck into the kernel). You can do that just now but
to get decent performance then fsck should do self caching, changing
fsck to do self caching doesn't sound worthwhile either. Note also that
rawio has nothing to do with the pagecache. Infact both rawio and
O_DIRECT bypasses all the pagecache and its smp locks for example.

> I'm not claiming it is... what I'm asking is _why_ do we need block
> devices once 'everything' lives in the page cache?

Where the cache of the blockdevice lives is a completly orthogonal
problem with "why cached blockdevices are useful" which I addressed in
the previous email.

> It's just that by doing it in pagecache you can mmap it as well
> and it will provide overall better performance and it's probably
> cleaner design. The only visible change is that you will be able
> to mmap a blockdevice as well.
>
> Why? What needs to mmap a block device? Since these are typically
> larger than that you can mmap into a 32-bit address space (yes, I'm
> ignoring the 5% or so of cases where this isn't true) I'm not aware
> on many applications that do it.

Last time I talked with the parted maintainer he was asking for that
feature so that parted won't need to do its own anti-oom management in
userspace, so he can simple mmap(MAP_SHARED) a quite large region of
metadata of the blockdevice, read/write to the mmaped region and the
kernel will take care of doing paging when it runs low on memory. right
now it allocates the metadata in anonymous memory and loads it via
read(). This memory will need to be swapped out if the working set
doesn't fit in ram (and swap may not be available ;).

> As I said, I'm not takling about kernel based fsck, although for
> _VERY_ large filesystems even with journalling I suspect it will be
> required one day (so it can run in the background and do consistency
> checking when the machine is idle).

Being able to fsck a live filesystem is yet another exotic feature and
yes for that you would certainly need some additional kernel support.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon May 07 2001 - 21:00:23 EST