Re: [PATCH] panic: release stale console lock to always get the logbuf printed out
From: Vitaly Kuznetsov
Date: Fri Oct 09 2015 - 06:10:03 EST
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:
> On Thu, 08 Oct 2015 11:51:13 +0200 Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
>> > On Wed, 7 Oct 2015 19:02:22 +0200 Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>> >
>> >> In some cases we may end up killing the CPU holding the console lock
>> >> while still having valuable data in logbuf. E.g. I'm observing the
>> >> following:
>> >> - A crash is happening on one CPU and console_unlock() is being called on
>> >> some other.
>> >> - console_unlock() tries to print out the buffer before releasing the lock
>> >> and on slow console it takes time.
>> >> - in the meanwhile crashing CPU does lots of printk()-s with valuable data
>> >> (which go to the logbuf) and sends IPIs to all other CPUs.
>> >> - console_unlock() finishes printing previous chunk and enables interrupts
>> >> before trying to print out the rest, the CPU catches the IPI and never
>> >> releases console lock.
>> >
>> > Why doesn't the lock-owning CPU release the console lock? Because it
>> > was stopped by smp_send_stop() in panic()?
>> >
>> > I don't recall why we stop CPUs in panic(), and of course we didn't
>> > document the reason. I guess it makes sense from the "what else can we
>> > do" point of view, but I wonder if we can just do it later on - that
>> > would fix this problem?
>>
>> We don't know for how long should we wait for the other CPU to finish
>> the output and it can take some time. In case we're rebooting after a
>> short timeout we can still end up with something in the logbuf.
>
> I don't understand what you're saying here.
>
> If we move panic()'s call to smp_send_stop() so it occurs later in
> panic(), won't this result in this CPU's messages being properly
> displayed?

If some other CPU is printing, how long do we need to wait before we
try to stop it? Printing out the buffer can take *any* amount of time,
so we may well reboot the host before it finishes.

> The currently-printing CPU will still be running and all
> the printks will proceed in the normal fashion?

It will keep running until we reboot the host, and we need to make sure
there is nothing left in the buffer when we do that. I see only two
viable options: have the crashing CPU print out the buffer itself before
we reboot (natural serialization), or add some sort of lock-waiting to
make sure the printing CPU has finished its job.
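
To make the first option (natural serialization) a bit more concrete,
here is a rough user-space model of the idea. It is only a sketch and not
kernel code: struct model_console and all of the model_*() helpers are
hypothetical names made up for illustration, and the "lock" is a plain
flag, on the assumption that the other CPUs are already stopped and
nothing else can contend for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_LOGBUF_SLOTS 8

/* Hypothetical stand-in for the console lock plus the logbuf state. */
struct model_console {
	bool locked;		/* lock held, possibly by a stopped CPU */
	size_t next_to_print;	/* first record not yet on the console */
	size_t next_to_write;	/* one past the last stored record */
	const char *records[MODEL_LOGBUF_SLOTS];
};

/* Store a record; like printk(), this works even while the lock is held. */
static bool model_log(struct model_console *c, const char *msg)
{
	assert(c != NULL && msg != NULL);
	if (c->next_to_write == MODEL_LOGBUF_SLOTS)
		return false;	/* model buffer is full, record dropped */
	c->records[c->next_to_write++] = msg;
	return true;
}

/* Take the lock if it is free; never waits. */
static bool model_trylock(struct model_console *c)
{
	assert(c != NULL);
	if (c->locked)
		return false;
	c->locked = true;
	return true;
}

/* Print every pending record, then release the lock. */
static size_t model_unlock(struct model_console *c)
{
	size_t printed = 0;

	assert(c != NULL);
	while (c->next_to_print < c->next_to_write) {
		puts(c->records[c->next_to_print++]);
		printed++;
	}
	c->locked = false;
	return printed;
}

/*
 * Crashing-CPU path: the trylock result is deliberately ignored. If the
 * lock is stale (its owner was stopped), we print and release it anyway.
 */
static size_t model_flush_on_panic(struct model_console *c)
{
	(void)model_trylock(c);
	return model_unlock(c);
}
```

In this model, calling model_flush_on_panic() after the other CPUs have
been stopped drains whatever is pending regardless of who held the lock,
so there is no need to guess how long to wait for the printing CPU.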
--
Vitaly