Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words
From: NeilBrown
Date: Tue Oct 02 2012 - 03:39:41 EST
On Fri, 28 Sep 2012 23:16:30 -0400 John Stultz <john.stultz@xxxxxxxxxx> wrote:
>
> After Kernel Summit and Plumbers, I wanted to consider all the various
> side-discussions and try to summarize my current thoughts here along
> with sending out my current implementation for review.
>
> Also: I'm going on four weeks of paternity leave in the very near
> (but non-deterministic) future. So while I hope I still have time
> for some discussion, I may have to deal with fussier complaints
> then yours. :) In any case, you'll have more time to chew on
> the idea and come up with amazing suggestions. :)
Hi John,
I wonder if you are trying to please everyone and risking pleasing no-one?
Well, maybe not quite that extreme, but you can't please all the people all
the time.
For example, allowing sub-page volatile region seems to be above and beyond
the call of duty. You cannot mmap sub-pages, so why should they be volatile?
Similarly the suggestion of using madvise - while tempting - is probably a
minority interest and can probably be managed with library code. I'm glad
you haven't pursued it.
I think discarding whole ranges at a time is very sensible, and so merging
adjacent ranges is best avoided. If you require page-aligned ranges this
becomes trivial - is that right?
I wonder if the oldest page/oldest range issue can be defined way by
requiring apps the touch the first page in a range when they touch the range.
Then the age of a range is the age of the first page. Non-initial pages
could even be kept off the free list .... though that might confuse NUMA
page reclaim if a range had pages from different nodes.
Application to non-tmpfs files seems very unclear and so probably best
avoided.
If I understand you correctly, then you have suggested both that a volatile
range would be a "lazy hole punch" and a "don't let this get written to disk
yet" flag. It cannot really be both. The former sounds like fallocate,
the latter like fadvise.
I think the later sounds more like the general purpose of volatile ranges,
but I also suspect that some journalling filesystems might be uncomfortable
providing a guarantee like that. So I would suggest firmly stating that it
is a tmpfs-only feature. If someone wants something vaguely similar for
other filesystems, let them implement it separately.
The SIGBUS interface could have some merit if it really reduces overhead. I
worry about app bugs that could result from the non-deterministic
behaviour. A range could get unmapped while it is in use and testing for
the case of "get a SIGBUS half way though accessing something" would not
be straight forward (SIGBUS on first step of access should be easy).
I guess that is up to the app writer, but I have never liked anything about
the signal interface and encouraging further use doesn't feel wise.
That's my 2c worth for now. Keep up the good work,
NeilBrown
Attachment:
signature.asc
Description: PGP signature