Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

From: David Hildenbrand
Date: Thu Oct 10 2019 - 14:59:43 EST


On 10.10.19 20:06, Michal Hocko wrote:
On Thu 10-10-19 13:48:06, Qian Cai wrote:
On Thu, 2019-10-10 at 19:30 +0200, Michal Hocko wrote:
On Thu 10-10-19 10:47:38, Qian Cai wrote:
On Thu, 2019-10-10 at 16:18 +0200, Michal Hocko wrote:
On Thu 10-10-19 09:11:52, Qian Cai wrote:
On Thu, 2019-10-10 at 12:59 +0200, Michal Hocko wrote:
On Thu 10-10-19 05:01:44, Qian Cai wrote:


On Oct 9, 2019, at 12:23 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:

If this was only about the memory offline code then I would agree. But
we are talking about any printk from the zone->lock context and that is
a bigger deal. Besides that, it is quite natural that the printk code
should be more universal and be callable from MM contexts as much as
possible. If there is any really strong reason this is not possible,
then it should at least be documented.

Where is the best place to document this? I am thinking of putting it under
the "struct zone" definition's lock field in mmzone.h.

I am not sure TBH and I do not think we have reached the state where
this would be the only way forward.

How about I revise the changelog to focus on memory offline rather than making
it a rule that nobody should call printk() with zone->lock held?

If you are to remove the CONFIG_DEBUG_VM printk then I am all for it. I
am still not convinced that fiddling with dump_page in the isolation
code is justified though.

No, dump_page() there has to be fixed as well for memory offline to be useful.
What other options are there?

I would really prefer to not repeat myself
http://lkml.kernel.org/r/20191010074049.GD18412@xxxxxxxxxxxxxx

Care to elaborate on what that means? I am confused about whether you finally
agree on no printk() while holding zone->lock or not. You said "If there is
absolutely no way around that then we might have to bite a bullet and consider
some of MM locks a land of no printk." which makes me think you agreed, but your
stance in the last reply suggests you are opposed to it.

I really do mean that the first step is to remove the dependency from
the printk and remove any allocation from the console callbacks. If that
turns out to be infeasible then we have to bite the bullet and think of
a way to drop all printks from under all locks that participate in
atomic allocation requests.


I second that, as well as dropping the useless printk() as Michal mentioned. I would beg you not to uglify the offlining/isolation code with __nolock variants or by dropping locks somewhere down in a function. If everything fails, I would rather see the printk()s gone, or the details returned in a struct to the caller, which can then print them instead.

e.g.,

struct unmovable_page_info {
	const char *reason;
	struct page *page;
	...
};

You should get the idea.

--

Thanks,

David / dhildenb