Re: BUG: mmapfile/writev spurious zero bytes (x86_64/not i386, bisected, reproducible)

From: Linus Torvalds
Date: Wed Jun 18 2008 - 00:04:34 EST

On Wed, 18 Jun 2008, Bron Gondwana wrote:
>
> For my sins, I appear to be becoming the world expert on
> that particular file.

Heh. Congrats ;)

> I've debugged skiplist bugs many times over, and completely rewritten
> the locking code. It really does some pretty evil things - the memory
> accesses look something like this:
>
> [file...................]
> [mmap^....^.^........^^..................................]
> [file...................++++++++++++]
> [mmap^....^.^........^^.^^ ^ ^^.....................]
>
> Where (^) marks the bits that get accessed. All reads are via
> the mmap, all writes are done with retry_write or
> retry_writev (Cyrus library functions that keep hammering
> until all the bytes are written)
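
The "keep hammering until all the bytes are written" behaviour is the usual loop around write(), which may return a short count or fail with EINTR. Cyrus's actual retry_write() is not shown in this thread, so the following is only a hypothetical C sketch of that idea; the name write_all() and its return convention are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (NOT Cyrus's actual retry_write): keep calling
 * write() until every byte is written or a real error occurs.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* genuine error: give up */
        }
        done += (size_t)n;     /* short write: loop for the remainder */
    }
    return (ssize_t)done;
}
```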

Is there any reason it doesn't use mmap(MAP_SHARED) and make the
modifications that way too?

Because quite frankly, the mixture of doing mmap() and write() system
calls is quite fragile - and I'm not saying that just because of this
particular bug, but because there are all kinds of nasty cache aliasing
issues with virtually indexed caches etc. that just fundamentally mean that
it's often a mistake to mix mmap with read/write at the same time.

(For the same reason it's not a good idea to mix writing through an mmap()
and then using read() to read it - again, you can have some nasty aliasing
going on there).
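
A minimal C sketch of making the modifications through the mapping instead of with write(), assuming a file descriptor opened O_RDWR and a hypothetical helper named mmap_update() (not part of Cyrus or any library). The file is grown with ftruncate() when needed, and the new bytes are stored through a MAP_SHARED mapping.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite len bytes at offset off, growing the
 * file first if needed, and store them through a MAP_SHARED mapping
 * rather than write(). fd must be open O_RDWR. Returns 0 or -1. */
int mmap_update(int fd, off_t off, const void *data, size_t len)
{
    struct stat st;

    if (len == 0)
        return 0;              /* nothing to do; also avoids mmap(len = 0) */
    if (fstat(fd, &st) < 0)
        return -1;

    off_t need = off + (off_t)len;
    if (need > st.st_size && ftruncate(fd, need) < 0)
        return -1;             /* extend the file so the mapping covers it */
    size_t maplen = (size_t)(need > st.st_size ? need : st.st_size);

    char *base = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    memcpy(base + off, data, len);          /* the store goes into the shared page */
    int rc = msync(base, maplen, MS_SYNC);  /* forces it to storage; on Linux other
                                               mappers see the change without this */
    munmap(base, maplen);
    return rc;
}
```

Remapping on every update keeps the sketch short; a skiplist would more plausibly keep one mapping and remap only when the file grows. Readers should then also go through the mapping rather than read(), for the aliasing reasons Linus gives.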

So this particular issue was definitely a kernel bug (and big thanks for
making such a good test-case), but in general, it does sound like Cyrus is
actively trying to dig itself into a nasty hole there.

Linus