Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever

From: Michal Hocko
Date: Fri Jun 30 2017 - 12:19:17 EST


On Sat 01-07-17 00:59:56, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 30-06-17 09:14:22, Tetsuo Handa wrote:
> > [...]
> > > Ping? Ping? When are we going to apply this patch or watchdog patch?
> > > This problem occurs with not so insane stress like shown below.
> > > I can't test almost OOM situation because test likely falls into either
> > > printk() v.s. oom_lock lockup problem or this too_many_isolated() problem.
> >
> > So you are saying that the patch fixes this issue. Do I understand you
> > corretly? And you do not see any other negative side effectes with it
> > applied?
>
> I hit this problem using http://lkml.kernel.org/r/20170626130346.26314-1-mhocko@xxxxxxxxxx
> on next-20170628. We won't be able to test whether the patch fixes this issue without
> seeing any other negative side effects without sending this patch to linux-next.git.
> But at least we know that even this patch is sent to linux-next.git, we will still see
> bugs like http://lkml.kernel.org/r/201703031948.CHJ81278.VOHSFFFOOLJQMt@xxxxxxxxxxxxxxxxxxx .

It is really hard to pursue this half solution when there is no clear
indication it helps in your testing. So could you try to test with only
this patch on top of the current linux-next tree (or Linus tree) and see
if you can reproduce the problem?

It is possible that there are other potential problems but we at least
need to know whether it is worth going with the patch now.

[...]
> > Rik, Johannes what do you think? Should we go with the simpler approach
> > for now and think of a better plan longterm?
>
> I don't hurry if we can check using watchdog whether this problem is occurring
> in the real world. I have to test corner cases because watchdog is missing.
>
> Watchdog does not introduce negative side effects, will avoid soft lockups like
> http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@xxxxxxxxxxxxxx ,
> will avoid console_unlock() v.s. oom_lock mutext lockups due to warn_alloc(),
> will catch similar bugs which people are failing to reproduce.

this way of pushing your patch is really annoying. Please do realize
that repeating the same thing all around will not make a patch more
likely to merge. You have proposed something, nobody has nacked it
so it waits for people to actually find it important enough to justify
the additional code. So please stop this.

I really do appreciate your testing because it uncovers corner cases
most people do not test for and we can actually make the code better in
the end.
--
Michal Hocko
SUSE Labs