Re: [PATCH 1/5] vmscan: remove all_unreclaimable check from direct reclaim path completely

From: Minchan Kim
Date: Wed Mar 23 2011 - 02:59:26 EST

On Wed, Mar 23, 2011 at 2:21 PM, KOSAKI Motohiro
<kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> Hi Minchan,
>> > zone->all_unreclaimable and zone->pages_scanned are neither atomic
>> > variables nor protected by a lock. Therefore a zone can end up in a state
>> > of zone->pages_scanned=0 and zone->all_unreclaimable=1. In this case,
>> Possible although it's very rare.
> Can you test andrey's case yourself on an x86 box? It seems
> reproducible.
>> > the current all_unreclaimable() returns false even though
>> > zone->all_unreclaimable=1.
>> The case is very rare since we reset zone->all_unreclaimable to zero
>> right before resetting zone->pages_scanned to zero.
>> But I admit it's possible.
> Please apply this patch and run the oom-killer. You may see the
> pages_scanned:0 and all_unreclaimable:yes combination, like below.
> (but you may need >30min)
>     Node 0 DMA free:4024kB min:40kB low:48kB high:60kB active_anon:11804kB
>     inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB
>     isolated(anon):0kB isolated(file):0kB present:15676kB mlocked:0kB
>     dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
>     slab_unreclaimable:0kB kernel_stack:0kB pagetables:68kB unstable:0kB
>     bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>>         CPU 0                                   CPU 1
>> free_pcppages_bulk                              balance_pgdat
>>         zone->all_unreclaimable = 0
>>                                                 zone->all_unreclaimable = 1
>>         zone->pages_scanned = 0
>> >
>> > Is this an ignorable minor issue? No. Unfortunately, x86 has a very
>> > small dma zone and it becomes zone->all_unreclaimable=1 easily. and
>> > if it becase all_unreclaimable, it never return all_unreclaimable=0
>>         ^^^^^ it's very important verb.   ^^^^^ return? reset?
>>         I can't understand your point due to the typo. Please correct the typo.
>> > because it typically doesn't have reclaimable pages.
>> If the DMA zone has very few or zero reclaimable pages,
>> zone_reclaimable() can easily return false, so all_unreclaimable() could return
>> true. Eventually the oom-killer might work.
> The point is, vmscan has the following all_unreclaimable check in several places.
>                 if (zone->all_unreclaimable && priority != DEF_PRIORITY)
>                         continue;
> But if the zone has only a few lru pages, get_scan_count(DEF_PRIORITY) returns
> a {0, 0, 0, 0} array. It means the zone will never scan lru pages anymore. Therefore
> the false-negative (too small) pages_scanned can't be corrected.
> Then the false-negative all_unreclaimable() also can't be corrected.
> btw, why does get_scan_count() return 0 instead of 1? Why don't we round up?
> Git log says it is intentional.
>     commit e0f79b8f1f3394bb344b7b83d6f121ac2af327de
>     Author: Johannes Weiner <hannes@xxxxxxxxxxxx>
>     Date:   Sat Oct 18 20:26:55 2008 -0700
>         vmscan: don't accumulate scan pressure on unrelated lists
>> In my test, I saw the livelock too, so apparently we have a problem.
>> I couldn't dig into it recently because of other urgent work.
>> I think you know the root cause, but the description in this patch isn't enough
>> to persuade me.
>> Could you explain the root cause in detail?
> If you have another idea for a fix, please let me know. :)

Okay. I got it.

The problem is as follows.
Because of the race between free_pcppages_bulk and balance_pgdat, it is
possible to end up with zone->all_unreclaimable = 1 and
zone->pages_scanned = 0.
The DMA zone has few LRU pages, and with no swap and heavy memory
pressure there could be just one page in the inactive file list, like in
your example. (Anon LRU pages aren't important on a no-swap system.)
In that case, shrink_zones doesn't scan the page at all until priority
becomes 0, as get_scan_count does scan >>= priority (the result is mostly
zero). And even when priority becomes 0, nr_scan_try_batch returns zero
until the saved pages reach 32. So to scan the page we need at least 32
iterations of priority 12..0. If the system has a fork-bomb, it is
almost a livelock.
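
To make the arithmetic above concrete, here is a rough userspace sketch of it. It is only a simplified model, not the kernel code: batch_scan() and sweeps_until_scan() are hypothetical names, the anon/file fraction scaling is ignored, and the constants 12 and 32 are taken from the numbers in this mail.

```c
#define DEF_PRIORITY     12  /* starting reclaim priority (12..0, as above) */
#define SWAP_CLUSTER_MAX 32  /* batch threshold (the "32" above) */

/*
 * Simplified model of the batching: accumulate the requested scan count
 * and hand out nothing until the saved total reaches the threshold.
 */
static unsigned long batch_scan(unsigned long nr_to_scan, unsigned long *saved)
{
	unsigned long nr;

	*saved += nr_to_scan;
	nr = *saved;
	if (nr >= SWAP_CLUSTER_MAX)
		*saved = 0;
	else
		nr = 0;
	return nr;
}

/*
 * Count how many full priority sweeps (DEF_PRIORITY down to 0) pass
 * before the model returns a non-zero scan count for a list holding
 * lru_pages pages. Returns -1 for an empty list, which never scans.
 */
static int sweeps_until_scan(unsigned long lru_pages)
{
	unsigned long saved = 0;
	int sweeps = 0;
	int priority;

	if (lru_pages == 0)
		return -1;

	for (;;) {
		sweeps++;
		for (priority = DEF_PRIORITY; priority >= 0; priority--) {
			/* scan >>= priority: zero for small lists until priority is low */
			unsigned long scan = lru_pages >> priority;

			if (batch_scan(scan, &saved) > 0)
				return sweeps;
		}
	}
}
```

In this model a list with one page only contributes at priority 0, one page per sweep, so sweeps_until_scan(1) is 32, while a list of 4096 pages already gets scanned in the first sweep.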

If this is right, how about this?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 148c6e6..34983e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1973,6 +1973,9 @@ static void shrink_zones(int priority, struct zonelist *zonelist,

 static bool zone_reclaimable(struct zone *zone)
 {
+	if (zone->all_unreclaimable)
+		return false;
 	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
 }
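
As a sanity check of the idea, the following userspace sketch replays the interleaving from the CPU 0 / CPU 1 diagram and compares the check before and after the change. It is a model under assumptions: struct zone_model, replay_race() and the _old/_new function names are made up for illustration, and reclaimable_pages stands in for what zone_reclaimable_pages() would return.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields of struct zone used here. */
struct zone_model {
	unsigned long pages_scanned;
	unsigned long reclaimable_pages; /* stands in for zone_reclaimable_pages() */
	int all_unreclaimable;
};

/* The check as it is before the proposed change. */
static bool zone_reclaimable_old(const struct zone_model *z)
{
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/* The check with the proposed early return. */
static bool zone_reclaimable_new(const struct zone_model *z)
{
	if (z->all_unreclaimable)
		return false;
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/*
 * Replay the racy ordering sequentially: CPU 0 clears the flag, CPU 1
 * sets it again, then CPU 0 zeroes the counter. The starting values
 * are arbitrary.
 */
static struct zone_model replay_race(unsigned long reclaimable_pages)
{
	struct zone_model z = {
		.pages_scanned = 100,
		.reclaimable_pages = reclaimable_pages,
		.all_unreclaimable = 1,
	};

	z.all_unreclaimable = 0; /* CPU 0: free_pcppages_bulk */
	z.all_unreclaimable = 1; /* CPU 1: balance_pgdat */
	z.pages_scanned = 0;     /* CPU 0: free_pcppages_bulk */
	return z;
}
```

With one reclaimable page, the replayed zone ends with all_unreclaimable = 1 and pages_scanned = 0. The old check still says the zone is reclaimable (0 < 6), while the new one returns false.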

Kind regards,
Minchan Kim
