Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

From: Sergey Senozhatsky
Date: Mon Sep 23 2019 - 21:28:41 EST


On (09/23/19 14:58), Petr Mladek wrote:
>
> If I understand it correctly then this is the re-appearing problem.
> The only systematic solution with the current approach is to
> take port->lock in printk_safe/printk_deferred context.

It probably is.
We have a number of reverse paths. TTY invokes MM under port->lock,
UART invokes TTY under port->lock, MM invokes UART under zone->lock,
UART invokes sched under port->lock, shced invokes UART, UART invokes
UART under port->lock (!), and so on.

> But this is a massive change that almost nobody wants. Instead,
> we want the changes that were discussed on Plumbers.
>
> Now, the question is what to do with existing kernels. There were
> several lockdep reports. And I am a bit lost. Did anyone seen
> real deadlocks or just the lockdep reports?

I think so. Qian Cai mentioned "a hang" in one of his reports
(was it unseeded random()?). I'll send a formal patch maybe,
since there were no objections.

> To be clear. I would feel more comfortable when there are no
> deadlocks. But I also do not want to invest too much time
> into old kernels. All these problems were there for ages.
> We could finally see them because lockdep was enabled in printk()
> thanks to printk_safe.

True.
Everyone was so much happier when printk() used to do

lockdep_off();
call_console_drivers();
lockdep_on();

Now we can have lockdep and RCU checks enabled, yet somehow
printk_safe is still "a terrible thing" :)

-ss