Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

From: Michal Hocko
Date: Thu Oct 10 2019 - 14:06:32 EST


On Thu 10-10-19 13:48:06, Qian Cai wrote:
> On Thu, 2019-10-10 at 19:30 +0200, Michal Hocko wrote:
> > On Thu 10-10-19 10:47:38, Qian Cai wrote:
> > > On Thu, 2019-10-10 at 16:18 +0200, Michal Hocko wrote:
> > > > On Thu 10-10-19 09:11:52, Qian Cai wrote:
> > > > > On Thu, 2019-10-10 at 12:59 +0200, Michal Hocko wrote:
> > > > > > On Thu 10-10-19 05:01:44, Qian Cai wrote:
> > > > > > >
> > > > > > >
> > > > > > > > On Oct 9, 2019, at 12:23 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > If this was only about the memory offline code then I would agree. But
> > > > > > > > we are talking about any printk from the zone->lock context and that is
> > > > > > > > a bigger deal. Besides that it is quite natural that the printk code
> > > > > > > > should be more universal and allow to be also called from the MM
> > > > > > > > contexts as much as possible. If there is any really strong reason this
> > > > > > > > is not possible then it should be documented at least.
> > > > > > >
> > > > > > > Where is the best place to document this? I am thinking about under
> > > > > > > > the 'struct zone' definition's lock field in mmzone.h.
> > > > > >
> > > > > > I am not sure TBH and I do not think we have reached the state where
> > > > > > this would be the only way forward.
> > > > >
> > > > > How about I revised the changelog to focus on memory offline rather than making
> > > > > a rule that nobody should call printk() with zone->lock held?
> > > >
> > > > If you are to remove the CONFIG_DEBUG_VM printk then I am all for it. I
> > > > am still not convinced that fiddling with dump_page in the isolation
> > > > code is justified though.
> > >
> > > No, dump_page() there has to be fixed together for memory offline to be useful.
> > > What's the other options it has here?
> >
> > I would really prefer to not repeat myself
> > http://lkml.kernel.org/r/20191010074049.GD18412@xxxxxxxxxxxxxx
>
> Care to elaborate what does that mean? I am confused on if you finally agree on
> no printk() while held zone->lock or not. You said "If there is absolutely
> no way around that then we might have to bite a bullet and consider some
> of MM locks a land of no printk." which makes me think you agreed, but your
> stance from the last reply seems you were opposite to it.

I really do mean that the first step is to remove the dependency from
the printk and remove any allocation from the console callbacks. If that
turns out to be infeasible then we have to bite the bullet and think of
a way to drop all printks from all locks that participate in atomic
allocation requests.
--
Michal Hocko
SUSE Labs