Re: [PATCH] mm: vmscan: do not pass reclaimed slab to vmpressure

From: vinayak menon
Date: Thu Jan 26 2017 - 00:24:06 EST

Hi Minchan

On Thu, Jan 26, 2017 at 4:57 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> Hello Vinayak,
> On Wed, Jan 25, 2017 at 05:08:38PM +0530, Vinayak Menon wrote:
>> It is noticed that during a global reclaim the memory
>> reclaimed via shrinking the slabs can sometimes result
>> in reclaimed pages being greater than the scanned pages
>> in shrink_node. When this is passed to vmpressure, the
> I don't know whether you mean zsmalloc. Anyway, it's one of those which
> free more pages than requested. I should fix that, but the fix has not
> been sent yet, unfortunately.

As I understand it, the problem is not related to a particular shrinker.
In shrink_node, when the subtree's reclaim efficiency is passed to vmpressure,
the 4th parameter (sc->nr_scanned - nr_scanned) includes only the pages
scanned on the LRUs, but the 5th parameter (sc->nr_reclaimed - nr_reclaimed)
also includes the reclaimed slab pages, since "reclaimed_slab" is added to it
in the previous step. That is, the slab objects scanned are not included in
the scanned value passed to vmpressure. As a result, reclaimed can exceed
scanned in vmpressure, which produces false events.
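
To make the mismatch concrete, here is a rough userspace sketch of the
accounting described above. It is not the kernel code: struct scan_counts,
calc_pressure() and pass_pressure() are hypothetical names, and the formula
is only a simplified model of the unsigned calculation done in
mm/vmpressure.c. It assumes scanned is nonzero.

```c
/* Hypothetical stand-in for the two scan_control counters discussed above. */
struct scan_counts {
	unsigned long nr_scanned;   /* pages scanned on the LRUs */
	unsigned long nr_reclaimed; /* LRU pages reclaimed, plus slab once added */
};

/*
 * Simplified model of the pressure calculation (percent, 0..100 when
 * reclaimed <= scanned). The caller must pass scanned > 0.
 */
static unsigned long calc_pressure(unsigned long scanned,
				   unsigned long reclaimed)
{
	unsigned long scale = scanned + reclaimed;
	unsigned long pressure;

	/* Wraps around when reclaimed > scanned, since all values are unsigned. */
	pressure = scale - (reclaimed * scale / scanned);
	return pressure * 100 / scale;
}

/* Pressure reported for one reclaim pass, with or without the slab count. */
static unsigned long pass_pressure(unsigned long lru_scanned,
				   unsigned long lru_reclaimed,
				   unsigned long reclaimed_slab,
				   int include_slab)
{
	struct scan_counts before = { 0, 0 };
	struct scan_counts sc = before;

	sc.nr_scanned += lru_scanned;
	sc.nr_reclaimed += lru_reclaimed;
	if (include_slab)
		sc.nr_reclaimed += reclaimed_slab; /* no matching nr_scanned bump */

	/* These two deltas play the role of the 4th and 5th parameters. */
	return calc_pressure(sc.nr_scanned - before.nr_scanned,
			     sc.nr_reclaimed - before.nr_reclaimed);
}
```

With 10 pages scanned and 5 reclaimed on the LRUs plus 15 slab pages,
pass_pressure(10, 5, 15, 1) sees reclaimed (20) above scanned (10) and the
subtraction wraps to a huge value, while pass_pressure(10, 5, 15, 0) stays
within 0..100.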

>> unsigned arithmetic results in the pressure value to be
>> huge, thus resulting in a critical event being sent to
>> root cgroup. Fix this by not passing the reclaimed slab
>> count to vmpressure, with the assumption that vmpressure
>> should show the actual pressure on LRU which is now
>> diluted by adding reclaimed slab without a corresponding
>> scanned value.
> I can't guess the justification for your assumption from the description.
> Why do we consider only LRU pages for vmpressure? Could you elaborate
> a bit?

When we first encountered the false events from vmpressure, we thought the
problem could be that slab scanned is not included in sc->nr_scanned, the way
slab reclaimed is included in sc->nr_reclaimed. But later we concluded that
vmpressure is meant to work only on pages scanned and reclaimed from the LRUs.
I can explain what I understand, let me know if this is wrong.

vmpressure is an index of the pressure on the LRUs, and thus an indicator of
thrashing. In shrink_node, when we come out of the inner do-while loop after
shrinking the lruvec, scanned and reclaimed correspond to the pressure felt
on the LRUs, which in turn indicates the pressure on the VM. The moment we
add the slab reclaimed pages to reclaimed, we dilute the actual pressure felt
on the LRUs. When slab scanned/reclaimed is not included in vmpressure, the
values indicate the actual pressure, and if a lot of slab pages were
reclaimed, that results in less pressure on the LRUs in the next run, which
vmpressure will again reflect. That is, the pressure on the LRUs indicates
the actual pressure on the VM even if slab reclaimed is not included.

Moreover, from what I understand of the code, reclaimed_slab includes only
the inodesteals and the pages freed by the slab allocator, and does not
include pages reclaimed by other shrinkers like lowmemorykiller, zsmalloc,
etc. That means even now we are passing only a subset of the reclaimed pages
to vmpressure.

Also, consider the case of a userspace lowmemorykiller that acts on
vmpressure events for the root cgroup: if slab reclaimed is included in
vmpressure, the lowmemorykiller will wait until most of the slab is shrunk
before kicking in to kill a task. No?
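
The dilution can be illustrated with a small userspace sketch. It is only a
model: pressure_level() and the enum are hypothetical names, the formula is a
simplified version of the one in mm/vmpressure.c, and the thresholds (60 for
medium, 95 for critical) are what I believe the defaults to be, so treat them
as assumptions. It expects scanned > 0 and reclaimed <= scanned.

```c
enum level { LEVEL_LOW, LEVEL_MEDIUM, LEVEL_CRITICAL };

/* Assumed default thresholds, in percent. */
#define LEVEL_MED_THRESHOLD      60
#define LEVEL_CRITICAL_THRESHOLD 95

/* Map one (scanned, reclaimed) sample to a pressure level. */
static enum level pressure_level(unsigned long scanned,
				 unsigned long reclaimed)
{
	unsigned long scale = scanned + reclaimed;
	unsigned long pressure;

	/* Share of scanned pages that could not be reclaimed, as a percent. */
	pressure = scale - (reclaimed * scale / scanned);
	pressure = pressure * 100 / scale;

	if (pressure >= LEVEL_CRITICAL_THRESHOLD)
		return LEVEL_CRITICAL;
	if (pressure >= LEVEL_MED_THRESHOLD)
		return LEVEL_MEDIUM;
	return LEVEL_LOW;
}
```

For example, 100 pages scanned and 10 reclaimed on the LRUs gives a pressure
of 90 (medium). If 60 reclaimed slab pages are added to reclaimed, the same
pass is reported as pressure_level(100, 70), which is 30 (low), so a listener
on the root cgroup would see no event for that pass.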