On 1/10/2018 10:24 PM, Petr Mladek wrote:
From: Steven Rostedt <rostedt@xxxxxxxxxxx>
From: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.
Here's the design again:
I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.
There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.
In printk() when it tries to write to the consoles, we have:
    if (console_trylock())
        console_unlock();
Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.
When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.
If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.
Then the waiter calls console_unlock() and continues to write to the
consoles.
If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!
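
The handoff described above can be sketched as a small single-threaded
state model. This is only an illustration, not the patch itself: tasks
are plain integer ids, the spin is represented by a return value, and
console_owner_lock is left out because nothing runs concurrently here.
The struct, the enum, and every model_*() name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TASK (-1)

/* Hypothetical model state; not the kernel's data structures. */
struct console_model {
    bool sem_held;       /* console_sem: held or not, no recorded owner */
    int  console_owner;  /* task calling the console drivers, or NO_TASK */
    int  console_waiter; /* task spinning for a handoff, or NO_TASK */
};

enum printk_result {
    PRINTK_GOT_SEM,  /* trylock succeeded; caller flushes the consoles */
    PRINTK_SPINNING, /* caller became the waiter and would now spin */
    PRINTK_LEFT_MSG, /* message is left in the buffer for the holder */
};

static void model_init(struct console_model *m)
{
    m->sem_held = false;
    m->console_owner = NO_TASK;
    m->console_waiter = NO_TASK;
}

/* What printk() does once its message has been stored. */
static enum printk_result model_printk(struct console_model *m, int task)
{
    if (!m->sem_held) {          /* console_trylock() succeeded */
        m->sem_held = true;
        return PRINTK_GOT_SEM;
    }
    /* The new "else": wait only if another task is actively writing
     * to the consoles and nobody is waiting yet. */
    if (m->console_owner != NO_TASK && m->console_owner != task &&
        m->console_waiter == NO_TASK) {
        m->console_waiter = task;
        return PRINTK_SPINNING;
    }
    return PRINTK_LEFT_MSG;
}

/* The semaphore holder marks itself before calling the console drivers. */
static void model_owner_begin(struct console_model *m, int task)
{
    assert(m->sem_held);         /* only the semaphore holder may write */
    m->console_owner = task;
}

/* The owner finished one message. Returns the task that inherits the
 * semaphore, or NO_TASK if the owner should carry on. */
static int model_owner_end(struct console_model *m)
{
    int heir = m->console_waiter;

    m->console_owner = NO_TASK;
    if (heir != NO_TASK)
        m->console_waiter = NO_TASK; /* ends the spin; sem_held stays true */
    return heir;
}

/* Final release by whichever task ends up with nothing left to print. */
static void model_release(struct console_model *m)
{
    m->sem_held = false;
}
```

In this model, a second task calling model_printk() while the first is
between model_owner_begin() and model_owner_end() gets PRINTK_SPINNING,
and model_owner_end() then returns that task's id while sem_held remains
true, which corresponds to the semaphore changing hands without being
released.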
By Petr Mladek about possible new deadlocks:
The thing is that we hand console_sem over only to a printk() call
that would normally call console_unlock() as well. It means that
the transferred ownership should not bring new types of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."
We could look at it from this side. The possible deadlock would
look like:
CPU0                            CPU1

console_unlock()

  console_owner = current;

                                spin_lockA()
                                  printk()
                                    spin = true;
                                    while (...)

    call_console_drivers()
      spin_lockA()
This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.
But if the above is true then the following scenario was
already possible before:
CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
        spin_lockA()
In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.
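
To make the last point concrete, here is a small single-threaded model
of the single-CPU scenario. It is a sketch under assumptions, not kernel
code: lock A is a plain flag that records an attempt to take it twice
instead of spinning forever, and the deferred path is reduced to "queue
now, write after lock A is dropped". All dl_*() names and the struct are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DL_MAX_DEFERRED 8

/* Hypothetical model; lock A is non-recursive, like a spinlock. */
struct dl_model {
    bool lock_a_held;
    bool would_deadlock;   /* lock A was requested while already held */
    const char *deferred[DL_MAX_DEFERRED];
    size_t n_deferred;     /* messages queued for later */
    size_t n_written;      /* messages that reached the console */
};

/* Returns false where a real spinlock would spin forever. */
static bool dl_lock_a(struct dl_model *m)
{
    if (m->lock_a_held) {
        m->would_deadlock = true;
        return false;
    }
    m->lock_a_held = true;
    return true;
}

static void dl_unlock_a(struct dl_model *m)
{
    assert(m->lock_a_held);
    m->lock_a_held = false;
}

/* A console driver that needs lock A, as in the scenario above. */
static void dl_call_console_drivers(struct dl_model *m, const char *msg)
{
    (void)msg;             /* the text itself does not matter here */
    if (!dl_lock_a(m))
        return;
    m->n_written++;
    dl_unlock_a(m);
}

/* Direct path: goes straight to the console drivers. */
static void dl_printk(struct dl_model *m, const char *msg)
{
    dl_call_console_drivers(m, msg);
}

/* Deferred path: only queue the message (dropped if the queue is full). */
static void dl_printk_deferred(struct dl_model *m, const char *msg)
{
    if (m->n_deferred < DL_MAX_DEFERRED)
        m->deferred[m->n_deferred++] = msg;
}

/* Run later, from a context that does not hold lock A. */
static void dl_flush_deferred(struct dl_model *m)
{
    for (size_t i = 0; i < m->n_deferred; i++)
        dl_call_console_drivers(m, m->deferred[i]);
    m->n_deferred = 0;
}
```

Calling dl_printk() between dl_lock_a() and dl_unlock_a() sets
would_deadlock, while dl_printk_deferred() in the same place followed by
dl_flush_deferred() after the unlock does not.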
Hello,
I did not understand what you were doing in the last version. You are
trying to transfer ownership of the semaphore so that the waiter takes
it over. I see.
But, what I mentioned last time is still valid. See below.