Re: [RFC v7 00/11] Support vrange for anonymous page

From: John Stultz
Date: Mon Apr 15 2013 - 23:33:34 EST


On 04/14/2013 12:42 AM, Minchan Kim wrote:
Hi KOSAKI,

On Thu, Apr 11, 2013 at 11:01:11AM -0400, KOSAKI Motohiro wrote:
and adding a new syscall invocation is unwelcome.
Sure. But one more system call could be cheaper than a page-granularity
operation on the purged range.
I don't think the cost of vrange(VOLATILE) is relevant to this discussion.
Whether we send SIGBUS or just nuke the pte, the purge should be done in
vmscan, not in the vrange() syscall.
Again, please look at MADV_FREE. http://lwn.net/Articles/230799/
It changes the pte and page flags on every page of the range through
zap_pte_range, so it would make vrange(VOLATILE) expensive, and the
bigger the range, the bigger the cost.
This hadn't crossed my mind. Currently try_to_discard_one() inserts a vrange
in order to raise SIGBUS; we could insert pte_none() at the same cost, too.
Am I missing something?
For your requirement, we need some tracking model to detect whether a page
is currently in use by the process before the VM discards it, *if* we don't
provide the vrange(NOVOLATILE) pair system call (see below). So the tracking
model would have to be set up in the vrange(VOLATILE) system call context.

To further clarify Minchan's note here: the reason it's important for the application to use vrange(NOVOLATILE) is really to help define _when the range stops being volatile_.

In your libc hack to use vrange(), you see the benefit of not immediately purging the memory as you do with MADV_DONTNEED. However, if the heap grows again and those addresses are reused, nothing has stopped those pages from continuing to be volatile. Thus the kernel could decide to purge those pages after they start being used again, and you'd lose data. I suspect that's not what you want. :)
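A toy userspace model may make the hazard concrete. It is only a sketch under stated assumptions: toy_mark_volatile(), toy_write() and toy_purge() are hypothetical stand-ins, not the vrange() interface from this patch set, and the "purge" just zeroes every page still marked volatile, roughly as the VM might under memory pressure. As with vrange(VOLATILE), a write does not clear the mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4
#define PAGE_BYTES 16

/* Hypothetical toy page: contents plus a "volatile" mark. */
struct toy_page {
	char data[PAGE_BYTES];
	bool is_volatile;
};

static struct toy_page heap[NPAGES];

/* Stand-in for vrange(VOLATILE): mark pages [start, end) purgeable. */
static void toy_mark_volatile(int start, int end)
{
	assert(0 <= start && start <= end && end <= NPAGES);
	for (int i = start; i < end; i++)
		heap[i].is_volatile = true;
}

/* Writing does NOT clear the volatile mark in this model. */
static void toy_write(int page, const char *s)
{
	assert(0 <= page && page < NPAGES);
	strncpy(heap[page].data, s, PAGE_BYTES - 1);
	heap[page].data[PAGE_BYTES - 1] = '\0';
}

/* Stand-in for the VM discarding volatile pages under memory pressure. */
static void toy_purge(void)
{
	for (int i = 0; i < NPAGES; i++)
		if (heap[i].is_volatile)
			memset(heap[i].data, 0, PAGE_BYTES);
}
```

The libc scenario then reads: the heap shrinks (toy_mark_volatile(2, 4)), later grows and reuses page 2 (toy_write(2, "important")) without any NOVOLATILE step, and a subsequent toy_purge() silently wipes the freshly written data.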

Rik's MADV_FREE implementation is very similar to vrange(VOLATILE), but has an implicit vrange(NOVOLATILE) on any page write. So dirtying a page stops the kernel from later purging it.
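For contrast, here is the same kind of toy model with the MADV_FREE-style rule described above: a write clears the mark, acting as an implicit NOVOLATILE. Again a sketch only; model_madv_free(), model_write() and model_reclaim() are hypothetical simulation helpers, not madvise(2) itself, and "reclaim" simply zeroes pages that are still marked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4
#define PAGE_BYTES 16

struct free_page {
	char data[PAGE_BYTES];
	bool lazily_freed; /* set by the MADV_FREE stand-in, cleared on write */
};

static struct free_page pages[NPAGES];

/* Stand-in for an MADV_FREE-style hint on pages [start, end). */
static void model_madv_free(int start, int end)
{
	assert(0 <= start && start <= end && end <= NPAGES);
	for (int i = start; i < end; i++)
		pages[i].lazily_freed = true;
}

/* Dirtying a page acts as an implicit vrange(NOVOLATILE). */
static void model_write(int page, const char *s)
{
	assert(0 <= page && page < NPAGES);
	strncpy(pages[page].data, s, PAGE_BYTES - 1);
	pages[page].data[PAGE_BYTES - 1] = '\0';
	pages[page].lazily_freed = false;
}

/* Reclaim discards only pages still marked; they read back as zeros. */
static void model_reclaim(void)
{
	for (int i = 0; i < NPAGES; i++)
		if (pages[i].lazily_freed)
			memset(pages[i].data, 0, PAGE_BYTES);
}
```

Here reusing page 1 after the hint protects it, while the untouched page 0 is discarded and reads back as zero fill, which is exactly what malloc/free wants.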

This MADV_FREE semantic works very well if you always want zero fill (as in the case of malloc/free). But for other data, it's important to know that something was lost (since a zero page could be valid data), and that's why we provide the SIGBUS, as well as the purged notification on vrange(NOVOLATILE).

In other words, as long as you do a vrange(NOVOLATILE) when you grow the heap again (before it's used), it should behave very much like MADV_FREE, but it is more flexible for other use cases.
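The recommended pattern, together with the SIGBUS and purged-notification behavior, can be sketched in the same toy style. Everything here is a hypothetical userspace simulation, not the patch set's code: v_access_ok() returns false where the described design would raise SIGBUS (touching a purged page that is still volatile), and v_novolatile() models vrange(NOVOLATILE) by clearing the mark and reporting whether anything in the range was purged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4
#define PAGE_BYTES 16

struct vpage {
	char data[PAGE_BYTES];
	bool is_volatile;
	bool purged;
};

static struct vpage vheap[NPAGES];

/* Stand-in for vrange(VOLATILE) on pages [start, end). */
static void v_volatile(int start, int end)
{
	assert(0 <= start && start <= end && end <= NPAGES);
	for (int i = start; i < end; i++)
		vheap[i].is_volatile = true;
}

/* Stand-in for vrange(NOVOLATILE): clear the mark and report a purge. */
static bool v_novolatile(int start, int end)
{
	bool any_purged = false;

	assert(0 <= start && start <= end && end <= NPAGES);
	for (int i = start; i < end; i++) {
		if (vheap[i].purged)
			any_purged = true;
		vheap[i].is_volatile = false;
		vheap[i].purged = false;
	}
	return any_purged;
}

/* Stand-in for the VM purging volatile pages under memory pressure. */
static void v_purge(void)
{
	for (int i = 0; i < NPAGES; i++) {
		if (vheap[i].is_volatile && !vheap[i].purged) {
			memset(vheap[i].data, 0, PAGE_BYTES);
			vheap[i].purged = true;
		}
	}
}

/* False where the described design would deliver SIGBUS instead. */
static bool v_access_ok(int page)
{
	assert(0 <= page && page < NPAGES);
	return !(vheap[page].is_volatile && vheap[page].purged);
}

static void v_write(int page, const char *s)
{
	assert(0 <= page && page < NPAGES);
	strncpy(vheap[page].data, s, PAGE_BYTES - 1);
	vheap[page].data[PAGE_BYTES - 1] = '\0';
}
```

A heap-grow path would call v_novolatile() on the range before handing it out again; a malloc-style user can ignore a true result (zero fill is fine), while a cache would treat it as "contents lost, rebuild". Once the range is non-volatile, later purges leave it alone.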

thanks
-john