Re: Cyrus mmap vs lseek/write usage - (WAS: BUG: mmapfile/writevspurious zero bytes (x86_64/not i386, bisected, reproducable))

From: Linus Torvalds
Date: Wed Jun 18 2008 - 20:21:20 EST




On Thu, 19 Jun 2008, Robert Mueller wrote:
>
> As noted above, one thing cyrus does which does seem to be plain "wrong"
> is that it mmaps a region greater the file size (rounds to an 8k
> boundary, but 8k-16k past the current end of the file) and then assumes
> that when it writes to the end of the file (but less than the end of the
> mmap region) that there's no need to remmap and that data is immediately
> available within the previous mmaped region.

Pretty much any OS that tries to be make mmap() coherent with regular
read/write accesses will automatically also have to be coherent wrt file
size updates.

IOW, I don't think that cyrus is real any more "wrong" in this than in
assuming that you can mix read/write and mmap() accesses. In fact, I
suspect that Cyrus is probably _more_ conservative than most, in that it
would not be totally unheard of to just do a much bigger mmap(), and not
even bother to re-do it until the file grows past that size (ie no 8k/16k
granularity, but make it arbitrarily non-granular).

> Apparently that works on most OS's (but is what this bug actually
> exposed), but according to the mmap docs:
>
> ---
> If the size of the mapped file changes after the call to mmap() as a
> result of some other operation on the mapped file, the effect of
> references to portions of the mapped region that correspond to added or
> removed portions of the file is unspecified.

Note that if you really want to be portable, you simply must not mix
mmap() with *any* other operations without sprinking in a healthy amount
of "msync()" or unmapping/remapping entirely.

So _in_practice_ - because everybody tries to do a good job - you can
actually expect to have mmap() be coherent, even though there are no real
guarantees.

> Amazingly (apart from HP/UX) no OS actually seems to have a problem with
> this since there would be massive cyrus bug reports otherwise.

Yeah. Over the years, the pain from having a non-coherent mmap() generally
has pushed everybody into just making mmap() easy to use. Which means that
mixing things generally works fine, even if it is not at all _guaranteed_.

So I'd expect mmap+write to work and be coherent almost always. But it's
still a fairly unusual combination, and I would personally think that
using MAP_SHARED and writing through the mmap() would be the less
surprising model.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/