Re: Found the commit that causes the OOMs
From: Minchan Kim
Date: Sun Jun 28 2009 - 12:47:43 EST
On Sun, Jun 28, 2009 at 11:49 PM, KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
>>> In David's OOM case, there are two symptoms:
>>> 1) 70000 unaccounted/leaked pages as found by Andrew
>>>    (plus a rather large number of PG_buddy and pagetable pages)
>>> 2) almost zero active_file/inactive_file; small inactive_anon;
>>>    many slab and active_anon pages.
>>>
>>> In situation (2), the slab cache is _under_-scanned. So David
>>> got OOM when vmscan should have squeezed some free pages from the slab
>>> cache. Could that be one important side effect of MinChan's patch?
>>
>> My patch's side effect is (2).
>>
>> My guess is as follows.
>>
>> 1. The scanned count that shrink_slab uses is increased in shrink_page_list,
>> and it is doubled for mapped or swapcache pages.
>> 2. shrink_page_list is called by shrink_inactive_list
>> 3. shrink_inactive_list is called by shrink_list
>>
>> Look at shrink_list.
>> If the inactive LRU list is low, it always calls shrink_active_list, not
>> shrink_inactive_list, in the case of anon.
>> That means sc->nr_scanned is not increased,
>> so shrink_slab can't shrink enough slab pages.
>> That is why David's OOM shows a lot of slab pages and active anon pages.
>>
>> Does it make sense?
>> If it does, we have to change shrink_slab's pressure method.
>> What do you think?
>
> I'm confused.
>
> If the system has no swap, get_scan_ratio() always returns anon=0%.
> Then the number of inactive_anon pages does not affect sc.nr_scanned.
>
That isn't the concern with my patch, since the number of pages on the anon
LRU lists (active + inactive) is always the same. I mean shrink_slab's
lru_pages is the same whether or not my patch is applied. Whether we OOM or
pass depends on sc->nr_scanned, I think.
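To make the dependence on sc->nr_scanned concrete, here is a rough model of the per-shrinker arithmetic, written in Python only for illustration. It assumes the 2.6.30-era shrink_slab() formula as I recall it (delta = 4 * scanned / seeks * max_pass / (lru_pages + 1), with a zero scan count bumped to SWAP_CLUSTER_MAX). The function name and the lru_pages and max_pass values are hypothetical, not taken from David's report.

```python
SWAP_CLUSTER_MAX = 32  # kernel constant: pages per reclaim batch
DEFAULT_SEEKS = 2      # kernel default shrinker->seeks


def slab_scan_delta(scanned: int, lru_pages: int, max_pass: int,
                    seeks: int = DEFAULT_SEEKS) -> int:
    """Slab objects one shrinker is asked to scan for a reclaim pass.

    Assumed model of the 2.6.30-era shrink_slab() arithmetic:
    delta = (4 * scanned // seeks) * max_pass // (lru_pages + 1)
    """
    if scanned == 0:
        scanned = SWAP_CLUSTER_MAX  # assumed floor for an empty scan count
    delta = (4 * scanned) // seeks
    delta *= max_pass
    return delta // (lru_pages + 1)


# Hypothetical numbers: lru_pages is the same with or without the patch,
# so only 'scanned' (sc->nr_scanned) changes the slab pressure.
LRU_PAGES = 200_000
MAX_PASS = 100_000

for nr_scanned in (0, 32, 4096):
    print(nr_scanned, slab_scan_delta(nr_scanned, LRU_PAGES, MAX_PASS))
```

With lru_pages held constant, a pass that scanned a few thousand LRU pages asks each shrinker for over a hundred times more work than a pass whose nr_scanned stayed near zero.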
Here is why I think it is a side effect of my patch.
Compared to the old behavior, my patch can change the balancing of the anon
LRU lists when the swap file is full, as Hannes already pointed out to me.
That can affect which anon pages are reclaimable while David is running the
swap test in LTP.
When the swap file test ends, the pages in the swap file are inserted into
the anon LRU list again.
So my patch can change the physical location of anon pages in RAM compared
to the old code.
From that point on, we have no swap file, so we can reclaim only file pages.
But we have missed one thing: lumpy reclaim! (In fact, we should not try to
reclaim anon pages when there is no swap space. A few days ago, I sent a
patch for this problem: http://patchwork.kernel.org/patch/32651/)
Lumpy reclaim can try to reclaim anon pages even though we have no swap file.
In the end, shrink_page_list can't reclaim those anon pages, but it still
increases sc->nr_scanned.
So I think whether shrink_slab can reclaim enough depends on
sc->nr_scanned.
David's problem is very subtle.
1. If lumpy reclaim picks up the anon pages, LTP can pass since
sc->nr_scanned is increased.
2. If lumpy reclaim doesn't pick up the anon pages, we can hit OOM since
sc->nr_scanned is almost zero or very small.
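The two cases can be illustrated with a toy model of how shrink_page_list() charges sc->nr_scanned, again in Python only for illustration. It assumes the 2.6.30-era accounting (one count per page, a second count for mapped or swapcache pages) and treats every anon page not already in swap cache as unfreeable because there is no swap. The Page class, the function name and the page mixes are hypothetical, and every other page is simply assumed to be freeable.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page isolated from the LRU."""
    anon: bool
    mapped: bool
    swapcache: bool = False


def shrink_page_list_model(pages, have_swap):
    """Return (nr_scanned, nr_reclaimed) for one isolated batch.

    Simplified model: every page bumps nr_scanned once, mapped or
    swapcache pages bump it again, and anon pages that are not in swap
    cache cannot be freed when there is no swap.
    """
    nr_scanned = 0
    nr_reclaimed = 0
    for page in pages:
        nr_scanned += 1
        if page.mapped or page.swapcache:
            nr_scanned += 1  # doubled slab pressure for mapped/swapcache pages
        if page.anon and not page.swapcache and not have_swap:
            continue  # no swap slot available: the page is kept, not freed
        nr_reclaimed += 1  # simplification: all other pages are freeable
    return nr_scanned, nr_reclaimed


# Case 1: lumpy reclaim drags in a block of mapped anon pages.
lumpy_anon = [Page(anon=True, mapped=True) for _ in range(32)]
# Case 2: lumpy reclaim finds only a couple of clean, unmapped file pages.
few_file = [Page(anon=False, mapped=False) for _ in range(2)]

print(shrink_page_list_model(lumpy_anon, have_swap=False))
print(shrink_page_list_model(few_file, have_swap=False))
```

Neither batch frees much memory, but case 1 reports 64 scanned pages to shrink_slab() while case 2 reports only 2, which in this theory is the difference between the LTP run passing and hitting OOM.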
Unfortunately, my patch seems to change the physical location of pages in
RAM compared to the old code, so case 2 is what happens.
This is only my speculation, though.
Okay. I believe Wu's patch will solve David's problem.
David, could you test with Wu's patch?
--
Kind regards,
Minchan Kim