Re: [PATCH 2/5] kernel.h: Add non_block_start/end()

From: Michal Hocko
Date: Fri Aug 16 2019 - 04:24:34 EST

On Thu 15-08-19 15:15:09, Andrew Morton wrote:
> On Thu, 15 Aug 2019 10:44:29 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > I continue to struggle with this. It introduces a new kernel state
> > > "running preemptibly but must not call schedule()". How does this make
> > > any sense?
> > >
> > > Perhaps a much, much more detailed description of the oom_reaper
> > > situation would help out.
> >
> > The primary point here is that there is a demand of non blockable mmu
> > notifiers to be called when the oom reaper tears down the address space.
> > As the oom reaper is the primary guarantee of the oom handling forward
> > progress it cannot be blocked on anything that might depend on blockable
> > memory allocations. These are not really easy to track because they
> > might be indirect - e.g. notifier blocks on a lock which other context
> > holds while allocating memory or waiting for a flusher that needs memory
> > to perform its work. If such a blocking state happens that we can end up
> > in a silent hang with an unusable machine.
> > Now we hope for reasonable implementations of mmu notifiers (strong
> > words I know ;) and this should be relatively simple and effective catch
> > all tool to detect something suspicious is going on.
> >
> > Does that make the situation more clear?
> Yes, thanks, much. Maybe a code comment along the lines of
> This is on behalf of the oom reaper, specifically when it is
> calling the mmu notifiers. The problem is that if the notifier were
> to block on, for example, mutex_lock() and if the process which holds
> that mutex were to perform a sleeping memory allocation, the oom
> reaper is now blocked on completion of that memory allocation.

reaper is now blocked on completion of that memory allocation
and cannot provide the guarantee of the OOM forward progress.


> btw, do we need task_struct.non_block_count? Perhaps the oom reaper
> thread could set a new PF_NONBLOCK (which would be more general than
> PF_OOM_REAPER). If we run out of PF_ flags, use (current == oom_reaper_th).

Well, I do not have a strong opinion here. A simple check for the value
seems to be trivial. There are quite some holes in task_struct to hide
this counter without increasing the size.
Michal Hocko