Re: [PATCH 1/2 v3] mm: vmscan: do not pass reclaimed slab to vmpressure

From: Michal Hocko
Date: Thu Feb 02 2017 - 11:01:54 EST


On Thu 02-02-17 21:00:10, vinayak menon wrote:
> On Thu, Feb 2, 2017 at 5:22 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Thu 02-02-17 16:55:49, vinayak menon wrote:
> >> On Thu, Feb 2, 2017 at 4:18 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >> > On Thu 02-02-17 11:44:22, Michal Hocko wrote:
> >> >> On Tue 31-01-17 14:32:08, Vinayak Menon wrote:
> >> >> > During global reclaim, the nr_reclaimed passed to vmpressure
> >> >> > includes the pages reclaimed from slab, but the corresponding
> >> >> > scanned slab pages are not passed. This can cause total reclaimed
> >> >> > pages to be greater than scanned, causing an unsigned underflow
> >> >> > in vmpressure and resulting in a critical event being sent to the
> >> >> > root cgroup. So do not consider reclaimed slab pages for the
> >> >> > vmpressure calculation. The reclaimed pages from slab can be
> >> >> > excluded because the freeing of a page by slab shrinking depends
> >> >> > on each slab's object population, making the cost model (i.e.
> >> >> > scan:free) different from that of LRU.
> >> >>
> >> >> This might be true, but what happens if the slab reclaim contributes
> >> >> significantly to the overall reclaim? This would be quite rare but not
> >> >> impossible.
> >> >>
> >> >> I am wondering why we cannot simply cap nr_reclaimed to nr_scanned
> >> >> and be done with all this? Sure, it will be imprecise, but the same
> >> >> will be true with this approach.
> >>
> >> Think of a case where 100 LRU pages were scanned and only 10 were
> >> reclaimed. Now, say slab reclaimed 100 pages and we have no idea
> >> how many were scanned. The actual vmpressure of 90 will now be 0
> >> because of the addition of 100 slab pages. So underflow was not the
> >> only issue; the vmpressure value itself becomes incorrect.
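The example can be checked with a short worked equation. Let $s$ be the pages scanned and $r$ the pages reclaimed. Assuming the integer formula sketched for the patch description, in exact arithmetic the pressure reduces to a simple ratio. Rounding is ignored here.

```latex
P = \frac{(s + r) - r(s + r)/s}{s + r} \cdot 100 = 100\left(1 - \frac{r}{s}\right)

% LRU only: s = 100, r = 10
P = 100\left(1 - \tfrac{10}{100}\right) = 90

% LRU + slab: r = 10 + 100 = 110 > s, so 1 - r/s = -0.1 < 0
%   (negative in exact arithmetic, i.e. wrap-around in unsigned arithmetic)

% Capped: r = \min(110, 100) = 100
P = 100\left(1 - \tfrac{100}{100}\right) = 0

% Slab excluded: r = 10
P = 90
```

Both readings in the thread follow from this: capping turns 90 into 0, while excluding slab keeps 90.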
> >
> > Is this actually a problem? The end result, enough pages being
> > reclaimed, is what should matter, no?
> >
>
> But vmpressure is incorrect now, no?

What does "incorrect" mean here? vmpressure is just an approximation that
tells us how much we struggle to reclaim memory. If we are making
progress, then we shouldn't reach higher levels.

> Because the scanned slab pages
> are not included in nr_scanned (the cost). The 100 scanned and 10
> reclaimed from LRU were a reasonable estimate, as you said, and to that
> we are adding only a reclaimed value without the corresponding scanned
> value, thus making it incorrect? The cost of slab reclaim is not
> accounted for.

There are other costs which are not included either, e.g. stalling
because of dirty pages.

> But I agree that the vmpressure value would have been more correct if
> it could include both scanned and reclaimed from slab, and maybe more
> correct still if we could include the scanned and reclaimed counts from
> all shrinkers, which I think is not the case right now (lowmemorykiller,
> zsmalloc, etc.). But as Minchan was pointing out, since the cost model
> for slab is different, would it be fine to just add the reclaimed count
> from slab to vmpressure?

Go back to your example. Do you really prefer seeing an alarm just
because we had a hard time reclaiming LRU pages, which might be pinned
by reclaimable slab pages (e.g. fs metadata), when the slab reclaim can
free enough of them?

vmpressure never had well-defined semantics; it is just an approximation
that happened to work for the workloads it was proposed for.

[...]
> >> Our
> >> internal tests on Android actually show the problem. When vmpressure
> >> with slab reclaimed added is used to kill tasks, it does not kick in
> >> at the right time.
> >
> > With the skewed reclaimed count? How does that happen? Could you elaborate?
>
> Yes, because of the skewed reclaim. The observation is that the vmpressure
> critical events are received late. Because slab reclaimed pages are added
> without the corresponding scanned count, the vmpressure values are diluted,
> resulting in fewer critical events at the beginning, so tasks are not
> chosen to be killed.

Why would you want to choose and kill a task when the slab reclaim can
still make sufficient progress? Are you sure that the slab contribution
to the stats causes all of the above?

> This increases the memory pressure and
> finally results in late critical events, but by that time task
> launch latencies are already impacted.

I have seen vmpressure hitting critical events really quickly, but that
is mostly because vmpressure uses only a very simplistic
approximation. Usually reclaim goes well until you hit dirty
or pinned pages. Then it can get really bad, and effectiveness can drop
from high to 0 pretty quickly.
--
Michal Hocko
SUSE Labs