Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
From: Michal Hocko
Date: Fri Feb 03 2017 - 09:41:22 EST

On Fri 03-02-17 19:57:39, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 30-01-17 09:55:46, Michal Hocko wrote:
> > > On Sun 29-01-17 00:27:27, Tetsuo Handa wrote:
> > [...]
> > > > Regarding [1], it helped avoid the too_many_isolated() issue. I can't
> > > > tell whether it has any negative effect, but on the first trial I found
> > > > all allocating threads blocked on wait_for_completion() from flush_work()
> > > > in drain_all_pages() introduced by "mm, page_alloc: drain per-cpu pages from
> > > > workqueue context". There was no warn_alloc() stall warning message afterwards.
> > >
> > > That patch is buggy and there is a follow up [1] which is not sitting in the
> > > mmotm (and thus linux-next) yet. I didn't get to review it properly and
> > > I cannot say I would be too happy about using WQ from the page
> > > allocator. I believe even the follow up needs to have WQ_RECLAIM WQ.
> > >
> > > [1] http://lkml.kernel.org/r/20170125083038.rzb5f43nptmk7aed@xxxxxxxxxxxxxxxxxxx
> >
> > Did you get a chance to test with this follow up patch? It would be
> > interesting to see whether an OOM situation can still starve the waiter.
> > The current linux-next should contain this patch.
>
> So far I can't reproduce problems except two listed below (cond_resched() trap
> in printk() and IDLE priority trap are excluded from the list). But I agree that
> the follow up patch needs to use a WQ_RECLAIM WQ. It is theoretically possible
> that an allocation request which can trigger the OOM killer waits for the
> system_wq while a work item already in system_wq is looping forever
> inside the page allocator without triggering the OOM killer.

Well, this shouldn't happen AFAICS because a new worker would be
requested and that would certainly require memory, and that allocation
would trigger the OOM killer. On the other hand I agree that it would be
safer to not depend on memory allocation from within the page allocator.

> Maybe the follow up patch can share the vmstat WQ?

Yes, this would be an option.
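
To make the concern concrete, here is a toy userspace model in C of why a
workqueue with a reserved rescuer is safer for the drain work. It is only a
sketch and not kernel code: every toy_* name is hypothetical, and the model
assumes a single rescuer and treats "memory available" as a plain boolean.
In the kernel, the rescuer is what a workqueue gets when it is created via
alloc_workqueue() with the WQ_MEM_RECLAIM flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_wq {
	int idle_workers;   /* workers currently free to pick up work */
	bool has_rescuer;   /* thread reserved at creation time (assumed) */
	bool rescuer_busy;  /* the single rescuer runs one item at a time */
};

/* Stand-in for spawning a worker: succeeds only if memory can be allocated. */
static bool toy_create_worker(bool memory_available)
{
	return memory_available;
}

/* Return true if a newly queued work item can start executing. */
static bool toy_queue_work(struct toy_wq *wq, bool memory_available)
{
	assert(wq != NULL);

	/* Fast path: an idle worker takes the item. */
	if (wq->idle_workers > 0) {
		wq->idle_workers--;
		return true;
	}
	/* Otherwise a new worker is needed, which requires an allocation. */
	if (toy_create_worker(memory_available))
		return true;
	/* Last resort: the pre-allocated rescuer needs no new memory. */
	if (wq->has_rescuer && !wq->rescuer_busy) {
		wq->rescuer_busy = true;
		return true;
	}
	return false; /* the item waits until memory or a worker frees up */
}

/* All workers stuck and no memory: does the drain work still get to run? */
static bool toy_drain_makes_progress(bool has_rescuer)
{
	struct toy_wq wq = {
		.idle_workers = 0,
		.has_rescuer = has_rescuer,
		.rescuer_busy = false,
	};

	return toy_queue_work(&wq, false);
}
```

In this model toy_drain_makes_progress(false) corresponds to queueing on a
queue without a rescuer while every worker is stuck in the allocator, and
toy_drain_makes_progress(true) to a queue that has one. The model ignores
the OOM killer freeing memory, so it shows the worst case only.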
--
Michal Hocko
SUSE Labs