- No deadlocks (hopefully). The buffer layer is technically deadlocky by
design, because it can require memory allocations at page writeout time.
It also has one path that cannot tolerate memory allocation failures.
fsblock has no such problems, because it keeps its metadata around for as
long as a page is dirty. (This still has problems vs get_user_pages, but
fixing that is going to require an audit of all get_user_pages sites. Phew.)
- In line with the above item, filesystem block allocation is performed
before a page is dirtied. In the buffer layer, mmap writes can dirty a
page with no backing blocks, which is a problem if the filesystem then
hits ENOSPC (patches exist for buffer.c for this).
- An inode's metadata must be tracked per-inode in order for fsync to
work correctly. The buffer layer contains helpers to do this for basic
filesystems, but any block can be the metadata of only a single inode.
This is not really correct for things like inode descriptor blocks.
fsblock can track multiple inodes per block. (This is nontrivial,
and it may be overkill, so it could be reverted to a simpler scheme
like the buffer layer's.)
- Large block support. I can mount and run an 8K block size minix3 fs on
my 4K page system, and it doesn't require anything special in the fs. We
can go up to about 32MB blocks now, and gigabyte+ blocks would only
require one more bit in the fsblock flags. Blocks are classed by size
relative to the page: fsblock_superpage blocks are > PAGE_CACHE_SIZE,
midpage blocks are == PAGE_CACHE_SIZE, and subpage blocks are < it.
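
To make the three size classes concrete, here is a minimal sketch in plain
userspace C. The enum and function names are made up for illustration and
are not the identifiers used in fsblock.c, and it assumes block and page
sizes are both powers of two, given in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the identifiers in fsblock.c. */
enum fsb_size_class {
	FSB_SUBPAGE,	/* block size <  page size: several blocks per page */
	FSB_MIDPAGE,	/* block size == page size: one block per page */
	FSB_SUPERPAGE	/* block size >  page size: one block spans pages */
};

/* Classify a block size against the page cache page size (in bytes). */
static enum fsb_size_class fsb_classify(size_t block_size, size_t page_size)
{
	assert(block_size > 0 && page_size > 0);
	if (block_size < page_size)
		return FSB_SUBPAGE;
	if (block_size == page_size)
		return FSB_MIDPAGE;
	return FSB_SUPERPAGE;
}

/*
 * Number of pages a single block covers. Subpage and midpage blocks
 * live within one page; superpage blocks cover block_size / page_size
 * pages (exact under the power-of-two assumption).
 */
static size_t fsb_pages_per_block(size_t block_size, size_t page_size)
{
	assert(block_size > 0 && page_size > 0);
	return block_size > page_size ? block_size / page_size : 1;
}
```

For the 8K minix3 case on a 4K page system, fsb_classify(8192, 4096)
gives FSB_SUPERPAGE, and each block covers two pages.
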
So. Comments? Is this something we want? If yes, then how would we
transition from buffer.c to fsblock.c?