Re: [PATCH] mm: warn about allocations which stall for too long

From: Tetsuo Handa
Date: Tue Sep 27 2016 - 08:58:00 EST


Michal Hocko wrote:
> > > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > > mutex in order to serialize warning messages because it is possible that
> > > > multiple allocation requests are stalling?
> > >
> > > we do not use any lock in warn_alloc_failed so why this should be any
> > > different?
> >
> > warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> > !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> > to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> > concurrent memory allocation failure messages make dmesg output
> > unreadable. But __warn_memalloc_stall() is called for only
> > __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> > sleep in order to serialize concurrent memory allocation stall
> > messages.
>
> I still do not see a point. A single line about the warning and locked
> dump_stack sounds sufficient to me.

printk() is a slow operation. It is possible that two allocation requests
start within the time period needed for completing warn_alloc_failed().
It is also possible that multiple concurrent allocations are stalling when
one of them cannot be satisfied. The consequence is that multiple concurrent
timeouts interleave their messages and corrupt the dmesg output.
http://I-love.SAKURA.ne.jp/tmp/serial-20160927-nolock.txt.xz
(Please ignore Oops at do_task_stat(); it is irrelevant to this topic.)

If we guard it with mutex_lock(&oom_lock)/mutex_unlock(&oom_lock),
there is no corruption.
http://I-love.SAKURA.ne.jp/tmp/serial-20160927-lock.txt.xz
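
To illustrate the effect of the lock, here is a rough userspace sketch, not
kernel code. A pthread mutex stands in for oom_lock, and report_lock,
emit_line(), report_stall() and run_and_check() are all hypothetical names
made up for this example. Each thread emits a multi-line report while
holding the lock, so the lines of one report stay contiguous in the log.

```c
#include <pthread.h>
#include <stdio.h>

#define NR_THREADS 4
#define LINES_PER_REPORT 3

/* Userspace stand-in for oom_lock; the name is illustrative only. */
static pthread_mutex_t report_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared "log": records which thread wrote each line, in order. */
static int log_owner[NR_THREADS * LINES_PER_REPORT];
static int log_len;

/* Stand-in for one printk() line; must be called with report_lock held. */
static void emit_line(int id)
{
	log_owner[log_len++] = id;
}

/* Emit a whole multi-line report under the lock so it cannot interleave. */
static void *report_stall(void *arg)
{
	int id = *(int *)arg;

	pthread_mutex_lock(&report_lock);
	for (int i = 0; i < LINES_PER_REPORT; i++)
		emit_line(id);
	pthread_mutex_unlock(&report_lock);
	return NULL;
}

/* Returns 1 if every report occupies consecutive slots in the log. */
static int run_and_check(void)
{
	pthread_t threads[NR_THREADS];
	int ids[NR_THREADS];

	log_len = 0;
	for (int i = 0; i < NR_THREADS; i++) {
		ids[i] = i;
		pthread_create(&threads[i], NULL, report_stall, &ids[i]);
	}
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	for (int i = 0; i < log_len; i += LINES_PER_REPORT)
		for (int j = 1; j < LINES_PER_REPORT; j++)
			if (log_owner[i + j] != log_owner[i])
				return 0;
	return log_len == NR_THREADS * LINES_PER_REPORT;
}
```

The order in which the threads report is not fixed, but each report is
emitted as one uninterrupted group of lines.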

Deferring the warning when trylock() fails would also be possible.
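
A rough userspace sketch of the trylock idea, assuming the caller simply
retries at its next stall check. This is not kernel code: a pthread mutex
stands in for the kernel mutex, and stall_report_lock and try_report_stall()
are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the lock that serializes stall reports. */
static pthread_mutex_t stall_report_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Try to print a stall report without blocking.
 * Returns true if the report was printed, false if another report is in
 * progress and this one should be deferred to the caller's next check.
 */
static bool try_report_stall(unsigned int stalled_ms)
{
	if (pthread_mutex_trylock(&stall_report_lock) != 0)
		return false;	/* lock is busy; defer instead of sleeping */

	printf("allocation stalled for %ums\n", stalled_ms);
	pthread_mutex_unlock(&stall_report_lock);
	return true;
}
```

A caller that gets false would keep its stall timestamp and call
try_report_stall() again later, so the message is delayed rather than lost.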