Re: [PATCH 2/2] msync: start async writeout when MS_ASYNC

From: Andrew Morton
Date: Wed Jun 13 2012 - 17:29:44 EST


On Thu, 31 May 2012 22:43:55 +0200
Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:

> msync.c says that applications had better use fsync() or fadvise(FADV_DONTNEED)
> instead of MS_ASYNC. Both pieces of advice are really bad:
>
> * fsync() can be a replacement for MS_SYNC, not for MS_ASYNC;
>
> * fadvise(FADV_DONTNEED) invalidates the pages completely, which will make
> later accesses expensive.
>
> Being able to schedule writeback immediately is an advantage for
> applications: they get what fadvise does, but without the invalidation
> part. The implementation is also similar to fadvise, but with
> tag-and-write enabled.
>
> One example is implementing a persistent dirty bitmap.
> Whenever you set bits to 1 you need to synchronize it with MS_SYNC, so
> that dirtiness is reported properly after a host crash. If you have set
> any bits to 0, getting them to disk is not needed for correctness, but
> it is still desirable to save some work after a host crash. You could
> simply use MS_SYNC in a separate thread, but MS_ASYNC provides exactly
> the desired semantics and is easily done in the kernel.
>
> If the application does not want to start I/O, it can simply call msync
> with flags equal to MS_INVALIDATE. That call remains a no-op, as it should
> be in a reasonable implementation.
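The persistent dirty bitmap described in the quoted patch description can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not code from the patch: `struct dirty_bitmap`, `bitmap_open`, `bitmap_set`, `bitmap_clear`, `bitmap_test`, `bitmap_close` and `sync_page_of` are hypothetical names, the bitmap is assumed to live in a `MAP_SHARED` file mapping, and each update flushes only the one page holding the changed bit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical persistent dirty bitmap kept in a MAP_SHARED file mapping. */
struct dirty_bitmap {
    uint8_t *map;   /* page-aligned start of the mapping, as returned by mmap() */
    size_t nbits;   /* number of bits the bitmap holds */
};

/* Create or open the backing file at `path`, size it, and map it shared. */
int bitmap_open(struct dirty_bitmap *bm, const char *path, size_t nbits)
{
    size_t len = (nbits + 7) / 8;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return -1;
    bm->map = p;
    bm->nbits = nbits;
    return 0;
}

void bitmap_close(struct dirty_bitmap *bm)
{
    munmap(bm->map, (bm->nbits + 7) / 8);
}

/* msync() the one page holding byte `byte`; msync() needs a page-aligned address. */
static int sync_page_of(struct dirty_bitmap *bm, size_t byte, int flags)
{
    uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t addr = (uintptr_t)(bm->map + byte) & ~(pagesz - 1);
    return msync((void *)addr, (size_t)pagesz, flags);
}

/* A set bit must be on disk before the guarded write proceeds: MS_SYNC. */
int bitmap_set(struct dirty_bitmap *bm, size_t bit)
{
    assert(bit < bm->nbits);
    bm->map[bit / 8] |= (uint8_t)(1u << (bit % 8));
    return sync_page_of(bm, bit / 8, MS_SYNC);
}

/* A cleared bit only saves work after a crash, so MS_ASYNC suffices.
 * Mainline msync.c documents MS_ASYNC as not starting I/O; the patch
 * under discussion would make it schedule writeback. */
int bitmap_clear(struct dirty_bitmap *bm, size_t bit)
{
    assert(bit < bm->nbits);
    bm->map[bit / 8] &= (uint8_t)~(1u << (bit % 8));
    return sync_page_of(bm, bit / 8, MS_ASYNC);
}

int bitmap_test(const struct dirty_bitmap *bm, size_t bit)
{
    assert(bit < bm->nbits);
    return (bm->map[bit / 8] >> (bit % 8)) & 1;
}
```

Only `bitmap_set()` is on the correctness path. `bitmap_clear()` could skip msync() entirely and still be correct; on kernels where MS_ASYNC does not start I/O, that is effectively what happens, and the cleared bit reaches disk whenever normal writeback gets to it.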

With this patch, people will find that their msync(MS_ASYNC) calls
suddenly start I/O. That may well be undesirable for some.

Also, it hardwires into the kernel a behaviour which userspace could
already initiate itself with sync_file_range(), i.e. it reduces flexibility.

Perhaps we can update the msync.c code comments to direct people to
sync_file_range()?
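The sync_file_range() route can be sketched as follows: userspace dirties a shared mapping and then starts writeback itself with SYNC_FILE_RANGE_WRITE, which queues I/O for the range without waiting for it. This is a minimal sketch under assumptions: `start_writeback`, `wait_writeback` and `dirty_and_kick` are hypothetical helper names, the code is Linux-specific (it needs `_GNU_SOURCE`), and sync_file_range() flushes neither file metadata nor the disk's write cache, so it is not a durability guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* open(), sync_file_range(), SYNC_FILE_RANGE_* */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue writeback of dirty page-cache pages in [offset, offset + len)
 * and return without waiting. No metadata or disk-cache flush. */
int start_writeback(int fd, off_t offset, off_t len)
{
    return sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE);
}

/* Later: wait for in-flight writeback of the range, write whatever is
 * still dirty, and wait for that too. Still no metadata or cache flush;
 * use fdatasync() or msync(MS_SYNC) when durability matters. */
int wait_writeback(int fd, off_t offset, off_t len)
{
    return sync_file_range(fd, offset, len,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Hypothetical demo: dirty a shared file mapping, then kick off its I/O. */
int dirty_and_kick(const char *path, size_t len)
{
    assert(len > 0); /* mmap() rejects a zero length */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(p, 0xab, len);  /* pages become dirty through the mapping */
    /* Start I/O explicitly; in mainline, msync(MS_ASYNC) would not. */
    int rc = start_writeback(fd, 0, (off_t)len);
    if (rc == 0)
        rc = wait_writeback(fd, 0, (off_t)len);
    munmap(p, len);
    close(fd);
    return rc;
}
```

Keeping MS_ASYNC a no-op and leaving the kick to sync_file_range() lets each application decide when, and over which range, I/O starts.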


One wonders how msync() works with nonlinear mappings. I guess
"badly". I think this was all discussed when we merged
remap_file_pages() (what a mistake that was) and we decided "too hard".

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/