Re: [RFC PATCH v1 20/25] serial: 8250: implement write_atomic

From: John Ogness
Date: Thu Mar 07 2019 - 23:05:45 EST


On 2019-02-27, Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>> Implement a non-sleeping NMI-safe write_atomic console function in
>>>> order to support emergency printk messages.
>>>
>>> OK, it would be safe when prb_lock() is the only lock taken
>>> in the NMI handler.
>>
>> Which is the case. As I wrote to you already [0], NMI contexts are
>> _never_ allowed to do things that rely on waiting forever for other
>> CPUs.
>
> Who says _never_? I agree that it is not reasonable. But the
> history shows that it happens.

Right, which is why it would need to become policy.

The emergency messages (aka write_atomic) impose a new requirement on
the kernel because this callback must be callable from any context. The
console drivers must have some way of synchronizing. The CPU-reentrant
spin lock is the only solution I am aware of.
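
To illustrate what I mean by a CPU-reentrant spin lock, here is a
minimal userspace sketch using C11 atomics. It is not the code from
this series: the names (struct cpu_lock, cpu_lock_acquire,
cpu_lock_release, NO_OWNER) are made up for the example, and the CPU id
is passed in explicitly, whereas kernel code would have to pin the task
to its CPU first. The point is only that the owning CPU can take the
lock again (e.g. from an NMI that interrupted the holder) without
spinning on itself.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Hypothetical userspace stand-in for a CPU-reentrant spin lock. */
struct cpu_lock {
	atomic_int owner;	/* CPU id of the holder, or NO_OWNER */
};

/*
 * Acquire the lock for @cpu and return the previous owner, which the
 * caller must hand back to cpu_lock_release(). If @cpu already owns
 * the lock, return immediately instead of spinning.
 */
static int cpu_lock_acquire(struct cpu_lock *l, int cpu)
{
	for (;;) {
		int prev = atomic_load_explicit(&l->owner,
						memory_order_relaxed);

		if (prev == cpu)
			return prev;	/* nested acquisition, same CPU */

		if (prev == NO_OWNER &&
		    atomic_compare_exchange_weak_explicit(&l->owner, &prev,
				cpu, memory_order_acquire,
				memory_order_relaxed))
			return NO_OWNER;	/* outermost acquisition */

		/* Another CPU holds the lock (or the CAS failed): retry. */
	}
}

/*
 * Restore the owner that was seen on acquisition. Only the outermost
 * release stores NO_OWNER; a nested release stores @cpu again.
 */
static void cpu_lock_release(struct cpu_lock *l, int cpu, int prev)
{
	assert(atomic_load_explicit(&l->owner,
				    memory_order_relaxed) == cpu);
	atomic_store_explicit(&l->owner, prev, memory_order_release);
}
```

Because each release restores the owner value seen at acquisition,
nesting needs no separate counter. The caller only has to keep the
returned value until the matching release.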

> In principle, there is nothing wrong in using spinlock in NMI
> when it is used only in NMI.

The CPU-reentrant spin lock _will_ be used in NMI context and
potentially could be used from any line of NMI code (if, for example, a
panic is triggered). The problem is when you have 2 different spin locks
in NMI context and their ordering cannot be guaranteed. And since I am
introducing an implicit spin lock that potentially could be locked from
any line of code, any explicit use of a spin lock in NMI code would
really be adding a 2nd spin lock and thus deadlock potential.

If the ring buffer were fully lockless, we should be able to have
per-console CPU-reentrant spin locks as long as the ordering is
preserved, which I expect shouldn't be a problem. If any NMI context
needed a spin lock for its own purposes, it would need to use the
CPU-reentrant spin lock of the first console so as to preserve the
ordering in case of a panic.
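
A rough userspace sketch of that ordering rule, assuming two emergency
consoles. Everything here (NR_CONSOLES, console_lock[],
consoles_lock_all, consoles_unlock_all, nmi_section_with_panic) is
hypothetical and only models the idea: emergency output always takes
the per-console locks in ascending order, and NMI code that needs a
lock of its own takes the first console's lock, so a panic inside that
section merely nests on lock 0 and then continues in the usual order.

```c
#include <stdatomic.h>

#define NO_OWNER	(-1)
#define NR_CONSOLES	2	/* hypothetical number of consoles */

struct cpu_lock {
	atomic_int owner;	/* CPU id of the holder, or NO_OWNER */
};

static struct cpu_lock console_lock[NR_CONSOLES] = {
	{ NO_OWNER }, { NO_OWNER },
};

/* Returns the previous owner; returns at once if @cpu already owns it. */
static int cpu_lock_acquire(struct cpu_lock *l, int cpu)
{
	for (;;) {
		int prev = atomic_load_explicit(&l->owner,
						memory_order_relaxed);

		if (prev == cpu)
			return prev;
		if (prev == NO_OWNER &&
		    atomic_compare_exchange_weak_explicit(&l->owner, &prev,
				cpu, memory_order_acquire,
				memory_order_relaxed))
			return NO_OWNER;
	}
}

static void cpu_lock_release(struct cpu_lock *l, int prev)
{
	atomic_store_explicit(&l->owner, prev, memory_order_release);
}

/* Emergency path: always lock consoles in ascending index order. */
static void consoles_lock_all(int cpu, int prev[NR_CONSOLES])
{
	for (int i = 0; i < NR_CONSOLES; i++)
		prev[i] = cpu_lock_acquire(&console_lock[i], cpu);
}

/* Unlock in the reverse order. */
static void consoles_unlock_all(int prev[NR_CONSOLES])
{
	for (int i = NR_CONSOLES - 1; i >= 0; i--)
		cpu_lock_release(&console_lock[i], prev[i]);
}

/*
 * NMI code needing its own lock uses the first console's lock. If it
 * panics while holding it, the emergency path nests on lock 0 and the
 * ordering is unchanged. Returns 1 if the nesting was observed.
 */
static int nmi_section_with_panic(int cpu)
{
	int prev[NR_CONSOLES];
	int outer = cpu_lock_acquire(&console_lock[0], cpu);
	int nested;

	/* ... NMI-private work; pretend a panic is triggered here ... */
	consoles_lock_all(cpu, prev);
	nested = (prev[0] == cpu);
	consoles_unlock_all(prev);

	cpu_lock_release(&console_lock[0], outer);
	return nested;
}
```

Had the NMI section used an unrelated spin lock instead, a second CPU
could hold console_lock[0] while waiting for that lock, and the panic
path on the first CPU would then wait for console_lock[0] forever.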

>>> 2. I am afraid that we need to add some locking between CPUs
>>> to avoid mixing characters from directly printed messages.
>>
>> That is exactly what console_atomic_lock() (actually prb_lock) is!
>
> Sure. But it should not be a common lock for the ring buffer and
> all consoles.

As long as the ring buffer requires a CPU-reentrant spin lock, I expect
that it _must_ be a common lock for all. Consider the situation that the
ring buffer writer code causes a panic. I think it is beneficial if at
least 1 level of printk recursion is supported so that even these
backtraces make it out on the emergency consoles.

If the ring buffer becomes fully lockless, then we could move to
per-console CPU-reentrant spin locks.

John Ogness