On Mon, Apr 08, 2013 at 05:36:42PM -0700, John Stultz wrote:
> On 04/07/2013 05:46 PM, Minchan Kim wrote:
> > Hello John,
> >
> > As you know, userland people wanted to handle vrange with an mmaped
> > pointer rather than fd-based and see the SIGBUS, so I thought more
> > about the semantics of vrange and want to make them very clear and easy.
> > So I suggest the semantics below (of course, they're not rock solid).
> >
> > mvrange(start_addr, length, mode, behavior)
> >
> > It's the same as what I suggested lately but with a different name, just
> > adding the prefix "m". It's a per-process model (i.e., mm_struct vrange),
> > so if the process exits, the "volatility" isn't valid any more.
> > That isn't a problem for anonymous pages but could be for file-vrange,
> > so let's introduce fvrange to cover that problem.
> >
> > fvrange(int fd, start_offset, length, mode, behavior)
> >
> > First of all, let's look at mvrange from the anonymous and file page POV.
> >
> > 1) anon-mvrange
> >
> > A page in a volatile range will be purged only if all of the processes
> > marked the range as volatile.
> >
> > If a process calls mvrange and then forks, the vrange could be copied
> > from parent to child, so not-yet-COWed pages could be purged
> > unless either one of the two processes marks NO_VOLATILE explicitly.
> >
> > Of course, a COWed page could be purged easily because there is no link
> > any more.
>
> Ack. This seems reasonable.
>
> > 2) file-mvrange
> >
> > A page in a volatile range will be purged only if all of the processes
> > mapping the page marked it as volatile AND no process maps the page
> > as "private". IOW, all of the processes mapping the page should map it
> > as "shared" for it to be purged.
> >
> > So, all of the processes should mark each address range in their own
> > process context if they want to collaborate on a shared mapped file, and
> > guarantee that no process maps the range as "private".
> >
> > Of course, the volatility state will be terminated when the process is gone.
>
> This case doesn't seem ideal to me, but is sort of how the current
> code works to avoid the complexity of dealing with memory volatile
> ranges that cross page types (file/anonymous). Although the current
> code just doesn't purge file pages marked with mvrange().

Personally, I don't think it's to avoid the complexity of implementation.
I thought explicitly declaring volatility on a range before using it would
be clearer for the userspace programmer.
Otherwise, he can encounter SIGBUS and get confused easily.

Frankly speaking, I don't like volatility remaining permanently even after
the relevant processes go away; it could make processes using the file
much more error-prone and harder to debug.

Anyway, do you agree with my suggestion that "we should not purge any page
if a process is using it now as non-shared (i.e., private)"?
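
For reference, the two purge rules quoted above (anon-mvrange and file-mvrange) can be written down as a small userspace model. This is only a sketch, not kernel code: struct mapper and both function names are made up for illustration, and treating a page with no mappers as not purgeable is an assumption based on the "volatility ends when the process is gone" rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one entry per process that currently maps the page. */
struct mapper {
    bool marked_volatile;  /* this process marked the range volatile via mvrange */
    bool private_mapping;  /* this process maps the page as "private" */
};

/*
 * anon-mvrange rule: the page may be purged only if every process
 * sharing it has marked the range volatile.
 * Assumption: with no mappers left there is no volatility state at all.
 */
static bool anon_page_purgeable(const struct mapper *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!m[i].marked_volatile)
            return false;
    return n > 0;
}

/*
 * file-mvrange rule: same as above, AND no process may map the page
 * as private. A single private mapper blocks purging for everyone.
 */
static bool file_page_purgeable(const struct mapper *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!m[i].marked_volatile || m[i].private_mapping)
            return false;
    return n > 0;
}
```

Note that under these rules the kernel is still free to purge nothing; the predicates only say when purging would be allowed.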

> I'd much prefer file-mvrange calls to behave identically to fvrange calls.

Right.

> The important point here is that the kernel doesn't *have* to purge
> anything ever. It's the kernel's discretion as to which volatile
> pages to purge when. So it's easier for now to simply not purge file
> pages marked volatile via mvolatile.

NP, but we should write down at least a vague description. Otherwise a user
tries it on file-backed pages, gets disappointed, and then is reluctant to
use it any more. :)

I'm not saying we should write down an implementation-specific description,
but I want us to say at least whether the new system call can affect
anonymous pages, file pages, or both, right from the beginning. Just a hope.

> There however is the inconsistency that file pages marked volatile
> via fvrange, then are marked non-volatile via mvrange() might still
> be purged. That is broken in my mind, and still needs to be
> addressed. The easiest out is probably just to return an error if
> any of the mvrange calls cover file pages. But I'd really like a
> better fix.

Returning an error there needs vma enumeration and the mmap_sem read-lock.
It could hurt anon-vrange performance severely.

Another idea is that we can move the per-mm vrange element to the
address_space when the process goes away, if the element covers a
file-backed vma.
But I'm still not at all sure whether we should keep it persistent.
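
To illustrate why the "return an error if mvrange covers file pages" option implies a vma walk on every call, here is a minimal userspace model of that check. It is a sketch under stated assumptions: struct vma_model and mvrange_check_anon_only() are hypothetical names, the choice of -EINVAL is an assumption, and in the kernel the walk would have to run under the mmap_sem read lock, which is the cost mentioned above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vma: [start, end) plus a flag. */
struct vma_model {
    unsigned long start;
    unsigned long end;
    bool file_backed;
};

/*
 * Return -EINVAL (an assumed error code) if [start, start + len) overlaps
 * any file-backed vma, else 0. Every mvrange call would pay for this
 * enumeration, even when the range is purely anonymous.
 * Overflow of start + len is ignored in this sketch.
 */
static int mvrange_check_anon_only(const struct vma_model *vmas, size_t n,
                                   unsigned long start, unsigned long len)
{
    unsigned long end = start + len;

    for (size_t i = 0; i < n; i++) {
        bool overlaps = vmas[i].start < end && start < vmas[i].end;

        if (overlaps && vmas[i].file_backed)
            return -EINVAL;
    }
    return 0;
}
```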