Re: [PATCH 0/3] Volatile Ranges (v11)
From: Minchan Kim
Date: Thu Mar 20 2014 - 03:46:08 EST
On Wed, Mar 19, 2014 at 05:38:10PM -0700, Dave Hansen wrote:
> On 03/18/2014 05:24 AM, Michal Hocko wrote:
> > On Fri 14-03-14 11:33:30, John Stultz wrote:
> > [...]
> >> Volatile ranges provides a method for userland to inform the kernel that
> >> a range of memory is safe to discard (ie: can be regenerated) but
> >> userspace may want to try access it in the future. It can be thought of
> >> as similar to MADV_DONTNEED, but that the actual freeing of the memory
> >> is delayed and only done under memory pressure, and the user can try to
> >> cancel the action and be able to quickly access any unpurged pages. The
> >> idea originated from Android's ashmem, but I've since learned that other
> >> OSes provide similar functionality.
> > Maybe I have missed something (I've only glanced through the patches)
> > but it seems that marking a range volatile alters neither
> > reference bits nor position in the LRU. I thought that a volatile page
> > would be moved to the end of inactive LRU with the reference bit
> > dropped. Or is this expectation wrong and volatility is not supposed to
> > touch page aging?
> I'm not really convinced it should alter the aging. Things could
> potentially go in and out of volatile state frequently, and requiring
> aging means we've got to go after them page-by-page or pte-by-pte at
> best. That doesn't seem like something we want to do in a path we want
> to be fast.
Since the vrange syscall design was changed from range-based to pte-based,
it can no longer be expected to be fast. Sure, vrange(VOLATILE) could be
fast if it just sets VMA_VOLATILE in vma->vm_flags, but vrange(NOVOLATILE)
has to look at every page in the range, so it could be slow.
Even though the vrange(VOLATILE) call is fast now, I want to account
volatile pages and expose the count to the user via vmstat so that the
user can see the current state of system memory, which makes userspace
happier and more predictable. If we add such a stat, vrange(VOLATILE)
has to look at every page in the range, so it could be slow, too.
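
To make the cost argument concrete, here is a rough userspace sketch, not
kernel code. Every name in it (toy_vma, TOY_VM_VOLATILE, the toy_* helpers)
is hypothetical and only models the idea: a flag-only mark is constant time,
while accounting or unmarking has to visit each page in the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8          /* hypothetical range size, in pages */
#define TOY_VM_VOLATILE 0x1u  /* hypothetical stand-in for a vma flag */

/* Toy stand-in for a VMA; this is NOT the kernel's struct vm_area_struct. */
struct toy_vma {
    unsigned int flags;
    bool purged[TOY_NPAGES];  /* would be set by (simulated) reclaim */
    size_t nr_volatile;       /* simulated vmstat-style counter */
};

/* Flag-only marking: constant time, touches no per-page state. */
static void toy_mark_volatile_fast(struct toy_vma *vma)
{
    vma->flags |= TOY_VM_VOLATILE;
}

/* Marking with accounting: visits every page. Returns pages visited. */
static size_t toy_mark_volatile_accounted(struct toy_vma *vma)
{
    size_t visited = 0;

    vma->flags |= TOY_VM_VOLATILE;
    vma->nr_volatile = 0;
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        if (!vma->purged[i])
            vma->nr_volatile++;  /* count pages that are still present */
        visited++;
    }
    return visited;
}

/* Unmarking: walks every page to report whether any of them was purged. */
static bool toy_mark_nonvolatile(struct toy_vma *vma, size_t *visited)
{
    bool any_purged = false;

    assert(visited != NULL);
    *visited = 0;
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        any_purged |= vma->purged[i];
        (*visited)++;
    }
    vma->flags &= ~TOY_VM_VOLATILE;
    vma->nr_volatile = 0;
    return any_purged;
}
```

In this model only toy_mark_volatile_fast() avoids the per-page loop, which
is the trade-off against having a volatile-page counter at all.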
> Why not just let normal page aging deal with them? It seems to me like
> trying to infer intended lru position from volatility is the wrong
> thing. It's quite possible we'd have two pages in the same range that
> we want in completely different parts of the LRU. Maybe the structure
> has a hot page and a cold one, and we would ideally want the cold one
> swapped out and not the hot one.
Yes, it is really arguable and it depends on the user's use case.
That's why I'd like to add VRANGE_NORMAL_AGING, which simply doesn't move
the page from its current position in the LRU. It would be useful when
used with VRANGE_SIGBUS because such users can handle partial purging.
Otherwise, I'd like to move those pages to the tail of the inactive list
so that it prevents reclaiming of hot pages.
If there is no memory pressure, we could get a chance to reuse volatile
pages so it could rotate back to the head of LRU when VM reclaim logic is
I agree with John's opinion that we should just make the approach as
simple as possible and extend it later, so we should leave room in the
syscall semantics and agree on what the default should be for now.
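
For the two aging policies discussed above, a toy userspace model may help
when comparing defaults. It is only a sketch: the array-based list, the
flag value and the toy_* helpers are all hypothetical, and the real
inactive LRU is a linked list of pages, not an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LRU_LEN 5
#define TOY_VRANGE_NORMAL_AGING 0x1u  /* hypothetical flag value */

/* Toy inactive list: index 0 is the head, the last index is the tail,
 * and the tail is what reclaim would look at first. */
struct toy_lru {
    int page[TOY_LRU_LEN];
};

/* Move the page at idx to the tail, shifting later pages toward the head. */
static void toy_move_to_tail(struct toy_lru *lru, size_t idx)
{
    int victim;

    assert(idx < TOY_LRU_LEN);
    victim = lru->page[idx];
    memmove(&lru->page[idx], &lru->page[idx + 1],
            (TOY_LRU_LEN - idx - 1) * sizeof(lru->page[0]));
    lru->page[TOY_LRU_LEN - 1] = victim;
}

/* Mark one page volatile under either policy. */
static void toy_mark_page_volatile(struct toy_lru *lru, size_t idx,
                                   unsigned int flags)
{
    if (flags & TOY_VRANGE_NORMAL_AGING)
        return;                  /* leave the page where normal aging put it */
    toy_move_to_tail(lru, idx);  /* otherwise make it the first reclaim candidate */
}
```

With pages {10, 11, 12, 13, 14}, marking index 1 with the flag set leaves
the order alone; marking it without the flag gives {10, 12, 13, 14, 11}, so
the volatile page is reclaimed before the others.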