Re: [PATCH 1/5]: ufs: missed brelse and wrong baseblk

From: Al Viro
Date: Sun Jun 18 2006 - 13:49:29 EST


On Sun, Jun 18, 2006 at 05:20:54PM +0100, Al Viro wrote:
> Comment more on the entire series than on this patch: scenario that causes
> trouble
> * foo is a sparse file on ufs with 8Kb block/1Kb fragment
> * process opens it writable and mmaps it shared
> * it proceeds to dirty the pages
> * fork
> * parent and child do msync() on pages next to each other
>
> I.e. we try to write adjacent pages that sit in the same block. At the
> same time. Each will trigger an allocation and we'd better be very
> careful, or we'll end up allocating the same block twice.
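
For anyone who wants to poke at it, here is a rough userspace sketch of that scenario. The helper name run_scenario() and the choice of two adjacent pages at offset 0 are mine; whether both pages really land in one block depends on the page size (with 4Kb pages and 8Kb blocks they do). It only drives the syscalls; it does not check the on-disk result.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: sparse file, shared writable mapping,
 * fork, then parent and child msync() adjacent pages.
 * Returns 0 if every step (including both msync() calls) succeeded.
 */
int run_scenario(const char *path)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len = 2 * (size_t)psz;
	int status, rc;
	pid_t pid;
	char *p;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Growing an empty file with ftruncate() leaves a hole: it is sparse. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Dirty both pages before forking. */
	p[0] = 'a';
	p[psz] = 'b';

	pid = fork();
	if (pid < 0) {
		munmap(p, len);
		close(fd);
		return -1;
	}
	if (pid == 0)	/* child: write back the second page */
		_exit(msync(p + psz, (size_t)psz, MS_SYNC) ? 1 : 0);

	/* parent: write back the first page, possibly at the same time */
	rc = msync(p, (size_t)psz, MS_SYNC);

	if (waitpid(pid, &status, 0) < 0)
		rc = -1;
	else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		rc = -1;

	munmap(p, len);
	close(fd);
	return rc;
}
```

Run it against a file on the filesystem under test and fsck afterwards; a clean return says nothing about whether a block got allocated twice.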

FWIW, for folks who are not familiar with UFS: the important differences
between it and ext2 are
* directory chunk size is fixed to 512, instead of being fs parameter
as in ext2
* names in directory entries are NUL-terminated
* there are two allocation units: fragment and block. Each block
consists of 2^{parameter} fragments. ext2 is what you get when parameter
is 0 (block == fragment). UFS tends to use block:fragment == 8, but
1, 2 and 4 are also allowed.
* equivalent of ext2 free block bitmap has one bit per fragment.
* "block" in "direct block", "indirect block", etc. is actually
a group of fragments. The number of first fragment in group is stored
where ext2 would store the block number.
* if there are indirect blocks, all those groups are simply full
blocks; they are aligned to block boundary and consist of block:fragment
ratio fragments.
* if file is shorter than 12 * block size, we have no indirects and
all but the last direct one are full blocks. I.e. the numbers we have
there are multiples of block:fragment ratio and a full block is allocated
for each. The last one consists of just enough fragments to reach the
end of file and may not be aligned.

IOW, it's _almost_ as if we had ext2 with all block numbers (in inode and
in indirect blocks) multiplied by block:fragment ratio. The only exception
is for the last direct block of small files - these span fewer fragments
and may be unaligned.
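
To make the arithmetic concrete, a minimal sketch of the layout rules above. struct geom and the function names are made up for illustration and are not what the kernel uses; direct_slot_frags() only makes sense for small files, i.e. ones with no indirect blocks.

```c
#include <stdint.h>

/* Hypothetical geometry description; names are illustrative only. */
struct geom {
	unsigned frag_size;	/* bytes per fragment, e.g. 1024 */
	unsigned fpb_shift;	/* log2(fragments per block): 0..3 */
};

/* Fragments per block: 2^parameter. */
unsigned frags_per_block(const struct geom *g)
{
	return 1u << g->fpb_shift;
}

/* Does this fragment number sit on a block boundary? */
int block_aligned(const struct geom *g, uint64_t frag)
{
	return (frag & (frags_per_block(g) - 1)) == 0;
}

/*
 * Number of fragments covered by direct slot 'slot' of a small file
 * of 'size' bytes. Full blocks everywhere except the last slot, which
 * gets just enough fragments to reach EOF. Slots past EOF get 0.
 */
unsigned direct_slot_frags(const struct geom *g, uint64_t size, unsigned slot)
{
	uint64_t bsize = (uint64_t)g->frag_size << g->fpb_shift;
	uint64_t start = (uint64_t)slot * bsize;
	uint64_t left;

	if (size <= start)
		return 0;
	left = size - start;
	if (left >= bsize)
		return frags_per_block(g);
	/* round the tail up to whole fragments */
	return (unsigned)((left + g->frag_size - 1) / g->frag_size);
}
```

With 8Kb blocks and 1Kb fragments, a 20000-byte file gets two full blocks and a 4-fragment tail in the third direct slot.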

The only subtle part is when we extend a small file; the last direct block
needs to be expanded and that may require relocation.
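
A toy version of that decision, to show why relocation can be needed. This is not the kernel's allocator. It assumes (as in FFS) that a partial group of fragments has to stay within one block, and it uses a made-up in-memory bitmap where a set bit means "fragment in use"; all names here are hypothetical.

```c
#include <stdint.h>

enum tail_action { TAIL_IN_PLACE, TAIL_RELOCATE };

/* Toy bitmap lookup: bit set == fragment in use (illustrative convention). */
int frag_in_use(const uint8_t *map, uint64_t frag)
{
	return (map[frag >> 3] >> (frag & 7)) & 1;
}

/*
 * The tail currently occupies fragments [first, first + oldcount).
 * Can it grow to newcount fragments where it is, or does it have to
 * be moved to a new location?
 */
enum tail_action plan_tail_growth(const uint8_t *map, unsigned fpb,
				  uint64_t first, unsigned oldcount,
				  unsigned newcount)
{
	uint64_t f;

	/* Assumption: the group may not spill over a block boundary. */
	if ((first % fpb) + newcount > fpb)
		return TAIL_RELOCATE;

	/* Every fragment we want to grow into has to be free. */
	for (f = first + oldcount; f < first + newcount; f++)
		if (frag_in_use(map, f))
			return TAIL_RELOCATE;

	return TAIL_IN_PLACE;
}
```

Whatever does this for real has to copy the old fragments when it relocates and has to be serialized against anybody else allocating in the same block.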

* block may be bigger than page. That can cause all sorts of fun
problems in interaction with our VM, since allocation can affect more than
one page and that has to be taken into account.
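
A small sketch of the page/block arithmetic involved; the function names are mine and the shifts are plain log2 of the sizes in bytes. It shows which pages end up sharing one block and therefore one allocation.

```c
#include <stdint.h>

/* Index of the block that holds the start of the given page. */
uint64_t page_to_block(uint64_t page_index, unsigned page_shift,
		       unsigned block_shift)
{
	return (page_index << page_shift) >> block_shift;
}

/*
 * Pages touched by one block: first page index and number of pages.
 * When the block is no bigger than a page it lies within a single page.
 */
void block_page_span(uint64_t block, unsigned page_shift,
		     unsigned block_shift,
		     uint64_t *first, uint64_t *count)
{
	if (block_shift <= page_shift) {
		*first = (block << block_shift) >> page_shift;
		*count = 1;
		return;
	}
	*first = block << (block_shift - page_shift);
	*count = (uint64_t)1 << (block_shift - page_shift);
}
```

With 4Kb pages (shift 12) and 8Kb blocks (shift 13), pages 0 and 1 both map to block 0, which is exactly the "adjacent pages in the same block" case from the scenario quoted above.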

* UFS2 supports extended attributes; it has two fragment numbers in
inode; they refer to up to 2 blocks worth of data. As with the data
of small files, the partial block doesn't have to be aligned.