Re: msync() behaviour broken for MS_ASYNC, revert patch?

From: Nick Piggin
Date: Fri Feb 10 2006 - 11:59:35 EST

Linus Torvalds wrote:

> On Fri, 10 Feb 2006, Nick Piggin wrote:
>
>> It may be a very useful operation in kernel, but I think userspace either
>> wants to definitely know the data is on disk (WRITE_SYNC), or give a hint
>> to start writing (WRITE_ASYNC).
>
> Only from a _stupid_ user standpoint.
>
> The fact is, "start writing and wait for the result" is fundamentally a
> totally broken operation.

No. Userspace sees an (almost) transparent pagecache in front of backing
store. The only times it cares about the pagecache are data integrity
points, where it wants to know that the data has been flushed, and
performance hints, which might tell the kernel to write pages sooner or
later (or other hints).

"Wait until writeout has finished" is an implementation detail; I can't
see how it would ever be useful on its own.


> Because a smart user actually would want to do
>
> - start writing this
> - start writing that
> - start writing that-other-thing
> - wait for them all.

No, you are thinking about what the kernel does. Subtle difference. A
smart user wants to:

- start writing this
- start writing that
- start writing that-other-thing
- make sure this, that and the other have reached backing store

OK, so in effect it is the same thing, but it is better to export the
interface that reflects how the user interacts with pagecache.

WRITE_SYNC obviously does the "wait for them all" (aka ensure they
hit backing store) thing too, right? It performs exactly the same
role that WRITE_WAIT would do in the above example.
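
As a rough userspace illustration of the pattern above, here is a minimal
sketch assuming POSIX msync() on shared file mappings. The first pass uses
MS_ASYNC purely as a "start writing" hint, and the second pass uses MS_SYNC
as the data integrity point that also does the waiting. The helper name
flush_mappings and its parameters are hypothetical and not part of any
proposal in this thread, and whether MS_ASYNC actually starts I/O depends
on the kernel.

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the thread): flush n shared file mappings
 * in two passes. Returns 0 on success, -1 if any msync() call failed.
 */
int flush_mappings(void *const maps[], const size_t lens[], size_t n)
{
    size_t i;
    int rc = 0;

    /* Pass 1: performance hint, "start writing this, that, the other".
     * Whether this starts I/O immediately is kernel-dependent. */
    for (i = 0; i < n; i++)
        if (msync(maps[i], lens[i], MS_ASYNC) != 0)
            rc = -1;

    /* Pass 2: integrity point, "make sure they reached backing store".
     * MS_SYNC returns only after the pages have been written back. */
    for (i = 0; i < n; i++)
        if (msync(maps[i], lens[i], MS_SYNC) != 0)
            rc = -1;

    return rc;
}
```

From the caller's side there is no separate "wait" step here: the
synchronous pass is what provides the guarantee, and the asynchronous pass
only gives the kernel a chance to overlap the writes.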

> The reason synchronous write performance is absolutely disgusting is
> exactly that people think "start writing" should be paired up with
> "wait for it".
>
> So the kernel internally separates "start writing" and "wait for it" for
> very good reasons. Reasons that in no way go away just because you go to
> user space.

They don't go away, but they take different forms. "Start writing" is
a performance hint. "Wait for it" is only ever part of a "send to
backing store" operation.

My proposal isn't really different to Andrew's in terms of functionality
(unless I've missed something), but it is more consistent because it
does not introduce this completely new concept to our userspace API but
rather uses the SYNC/ASYNC distinction like everything else.

SUSE Labs, Novell Inc.