Re: Question regarding concurrent accesses through block device and fs

From: Nick Piggin
Date: Thu Feb 19 2009 - 08:45:30 EST


On Thursday 19 February 2009 22:07:42 Francis Moreau wrote:
> [ Resend to LKML, hoping to get a wider audience ;) and to Andrew Morton,
> since he wrote that part of the code, I think ]
>
> Hello,
>
> I have a question regarding the page cache/buffer heads behaviour when
> some blocks are accessed through a regular file and through the block
> dev hosting this file.
>
> First it looks like when accessing some blocks through a block device,
> the same mechanisms are used as when reading a file through a file
> system: the page cache is used.

Yes. The page cache of the block device is also sometimes called the buffer
cache, for historical reasons.


> That means that a block could be mapped by several buffers at the same
> time.
>
> I don't see any issue with this (if we agree that the behaviour is undefined
> in that case), but looking at __block_prepare_write(), it seems that we
> don't want this to happen, since it does:
>
> [...]
> 	if (buffer_new(bh)) {
> 		unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
> 		[...]
> 	}
>
> where unmap_underlying_metadata() unmaps the blockdev buffer
> which maps b_blocknr block.
>
> This code seems to catch only the case where the buffer is new (I don't
> see why only this case is treated).
>
> Also this call seems unneeded if __block_prepare_write() is called
> when writing through the block dev since we already know that the buffer
> doesn't exist (we are here to create it).
>
> I already read the comment of the function unmap_underlying_metadata()
> but I failed to understand it...
>
> Could anybody tell me what the actual policy is?

This is done only for newly allocated on-disk blocks (which is what
buffer_new means: a newly allocated on-disk block, not a new in-memory
buffer). And it is only there to synchronize buffercache access by the
filesystem for its metadata, rather than to make /dev/bdev access coherent
with file access.

Basically, what can happen is that a filesystem has allocated a block for,
say, an array of indirect pointers. The filesystem manages this block via
the buffercache and writes a few pointers into it. Then suppose the file is
truncated and that block becomes unused, so it can be freed by the
filesystem's block allocator. The filesystem may also call bforget to
prevent the now-useless buffer from being written out in the future.

Now suppose a new block is required for *file* data, and the filesystem
happens to reallocate that same block. We may still have the old buffercache
entry and buffer head around, but we also have new pagecache and a buffer
head for the file that point to the same block (buffer_new will be set on
this new buffer head, btw, to reflect that it is a newly allocated block).

All fine so far.

Now there is a potential problem, because the old buffer can *still be under
writeback*, dating back to when it was still good metadata and before
bforget was called. That's a problem because the new buffer expects to be
the sole owner and master of the block and its data.

That is what the second paragraph in the comment refers to. I don't actually
know what the problem described in the first paragraph is.
Andrew, do you know?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/