Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

From: Shaohua Li
Date: Wed Nov 04 2015 - 12:53:59 EST


On Tue, Nov 03, 2015 at 09:52:23AM +0900, Minchan Kim wrote:
> On Fri, Oct 30, 2015 at 10:22:12AM -0700, Shaohua Li wrote:
> > On Fri, Oct 30, 2015 at 04:01:41PM +0900, Minchan Kim wrote:
> > > MADV_FREE is a hint that it's okay to discard pages if there is memory
> > > pressure, and we use reclaimers (i.e., kswapd and direct reclaim) to free
> > > them, so there is no value in keeping them on the active anonymous LRU.
> > > This patch moves them to the head of the inactive LRU list.
> > >
> > > This means that MADV_FREE-ed pages which were living on the inactive list
> > > are reclaimed first because they are more likely to be cold rather than
> > > recently active pages.
> > >
> > > An arguable issue for the approach would be whether we should put the page
> > > at the head or tail of the inactive list. I chose head because the kernel
> > > cannot be sure whether it's really cold or warm for every MADV_FREE use
> > > case, but at least we know it's not *hot*, so landing at the inactive head
> > > would be a compromise for various use cases.
> > >
> > > This fixes suboptimal behavior of MADV_FREE when pages living on the
> > > active list will sit there for a long time even under memory pressure
> > > while the inactive list is reclaimed heavily. This basically breaks the
> > > whole purpose of using MADV_FREE to help the system free memory which
> > > might not be used.
> >
> > My main concern is the policy for how we should treat the FREE pages. Moving
> > them to the inactive LRU is definitely a good start, but I'm wondering if
> > it's enough. MADV_FREE increases memory pressure and causes unnecessary
> > reclaim because of the lazy memory free. While MADV_FREE is intended to be a
> > better replacement for MADV_DONTNEED, MADV_DONTNEED doesn't have the memory
> > pressure issue as it frees memory immediately. So I hope MADV_FREE doesn't
> > have an impact on memory pressure either. I'm thinking of adding an extra LRU
> > list and watermark for this to make sure FREE pages can be freed before
> > system-wide page reclaim. As you said, this is arguable, but I hope we can
> > discuss this issue more.
>
> Yes, it's arguable. ;-)
>
> It seems the divergence comes from the view that MADV_FREE is a *replacement*
> for MADV_DONTNEED. But I don't think so. If we could discard a MADV_FREEed page
> *anytime*, I would agree, but that's not true because the page could be in a
> dirty state when the VM wants to reclaim it.

There are certainly other use cases, but even your patch log mainly describes
the jemalloc use case, which uses MADV_DONTNEED.

> I'm also against your suggestion to discard FREEed pages before
> system-wide page reclaim, because the system could have lots of clean cold page
> cache or anonymous pages. In such a case, reclaiming those would be better.
> Yes, it's really workload-dependent, so we might need some heuristic, which is
> normally what we want to avoid.
>
> Having said that, I agree with you that we could do better than the
> deactivation and, frankly speaking, I'm thinking of another LRU list (e.g.,
> tentatively named "ezreclaim LRU list"). What I have in mind is to age
> (anon|file|ez) fairly. IOW, I want to percolate ez-LRU list reclaiming into
> get_scan_count. When MADV_FREE is called, we could move hinted pages from the
> anon-LRU to the ez-LRU, and then if the VM finds it cannot discard a page in
> the ez-LRU, it could promote it to the active-anon-LRU, which would be a very
> natural aging concept because it means someone touched the page recently.
>
> With that, I don't want to bias one side and don't want to add some knob for
> tuning the heuristic, but rather rely on the common fair aging scheme of the VM.
>
> Another bonus with a new LRU list is that we could support MADV_FREE on
> swapless systems.
>
> >
> > Or do you want to push this first and address the policy issue later?
>
> I believe adding a new LRU list would be controversial (i.e., not trivial)
> from the maintainers' POV even though the code wouldn't be complicated.
> So, I want to see problems in *real practice*, not in some theoretical
> test program, before diving into that.
> To hear such requests, we should release the syscall.
> So, I want to push this first.

The memory pressure issue isn't limited to artificial tests. In jemalloc, there
is a knob (lg_dirty_mult) that controls the rate at which memory is purged
(using MADV_DONTNEED). We have already had several reports from our production
environment that changing the knob can cause extra memory usage (and swap and
so on). If jemalloc uses MADV_FREE, jemalloc will not purge any memory, which is
equivalent to disabling the current MADV_DONTNEED purging (e.g., lg_dirty_mult
= -1). I'm sure this will cause similar issues (e.g., extra memory usage,
swap). That said, I don't object to pushing this first, but the memory pressure
issue can happen in real production, and I hope it's not ignored.
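
To make the difference concrete, here is a minimal userspace sketch (not taken
from the patch set or from jemalloc; the function names are made up for
illustration). It assumes a Linux system with private anonymous mappings, and
the MADV_FREE half only does anything where the headers and the running kernel
provide it. MADV_DONTNEED drops the page immediately, so the next read sees
zeroes; with MADV_FREE the page stays around until reclaim gets to it, and a
write before then cancels the lazy free.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page size of the running system, as a size_t. */
static size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);

    assert(ps > 0);
    return (size_t)ps;
}

/* Map len bytes of private anonymous memory and dirty every byte.
 * Returns NULL on failure. */
static unsigned char *map_dirty(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xaa, len);
    return p;
}

/* Hypothetical helper: returns 1 if a dirtied page reads back as zero
 * right after MADV_DONTNEED, 0 if it does not, -1 on error. */
int dontneed_zeroes_page(void)
{
    size_t len = page_size();
    unsigned char *p = map_dirty(len);
    int ret;

    if (p == NULL)
        return -1;
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }
    /* The old contents are gone; the next touch faults in a zero page. */
    ret = (p[0] == 0 && p[len - 1] == 0);
    munmap(p, len);
    return ret;
}

/* Hypothetical helper: returns 1 if data written after MADV_FREE is
 * kept, 0 if not, -1 if MADV_FREE is unavailable or a call fails. */
int free_then_rewrite_keeps_data(void)
{
#ifdef MADV_FREE
    size_t len = page_size();
    unsigned char *p = map_dirty(len);
    int ret;

    if (p == NULL)
        return -1;
    if (madvise(p, len, MADV_FREE) != 0) {
        /* E.g. a kernel without MADV_FREE support. */
        munmap(p, len);
        return -1;
    }
    /* Writing re-dirties the page, so it must not be discarded. */
    p[0] = 0x55;
    ret = (p[0] == 0x55);
    munmap(p, len);
    return ret;
#else
    return -1;
#endif
}
```

The sketch says nothing about when a MADV_FREE page actually goes away, which
is exactly the policy question here: until reclaim runs, those pages still
count against memory, whereas the MADV_DONTNEED pages are already gone.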

Thanks,
Shaohua