Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

From: Jan Kara
Date: Thu Oct 22 2015 - 08:18:32 EST


On Thu 22-10-15 11:28:47, David Woodhouse wrote:
> On Thu, 2015-10-22 at 11:16 +0100, David Howells wrote:
> > printk() currently discards earlier messages to make space for new messages
> > arriving. This has the distinct downside that if the kernel starts
> > churning out messages because of some initial incident, the report of the
> > initial incident is likely to be lost under a blizzard of:
> >
> > ** NNN printk messages dropped **
> >
> > messages from console_unlock().
> >
> > The first message generated (typically an oops) is usually the most
> > important - the one you want to solve first - so we really want to see
> > that.
>
> But wait... didn't I watch you muttering on IRC about the actual bug
> you were trying to catch here... and didn't you have a *serial* console
> hooked up?
>
> What broke such that serial console stopped giving you *every* message?
>
> Serial console was always *synchronous*.

Not for the last 10 years or even more... When only a single CPU is
calling printk, you are right. But when two CPUs happen to enter the printk
code at the same time, the first one does the printing and the second
printk returns immediately after appending its message to the kernel log
buffer. That message appears on the console when the first CPU gets to it.
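
To make that handoff concrete, here is a toy single-threaded model in C.
It is only a sketch, not the kernel's implementation: struct toy_log,
toy_printk() and toy_console_unlock() are hypothetical names invented for
illustration, the "console lock" is a plain flag rather than a real lock,
and buffer overflow is not handled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_SLOTS 8

/* Hypothetical toy model of the log buffer; not the kernel's data structure. */
struct toy_log {
	const char *slot[LOG_SLOTS];	/* buffered messages, oldest first */
	size_t head;			/* next message to print */
	size_t tail;			/* next free slot */
	bool console_locked;		/* true while some CPU owns the console */
	size_t printed;			/* total messages written to the console */
};

/*
 * Append a message, then try to become the printing CPU.
 * Returns true if the caller now owns the console and must flush,
 * false if another CPU already owns it (the caller returns immediately
 * and its message stays in the buffer for the owner to print).
 */
static bool toy_printk(struct toy_log *log, const char *msg)
{
	/* Assumption of this sketch: the buffer never fills up. */
	assert(log->tail - log->head < LOG_SLOTS);

	log->slot[log->tail % LOG_SLOTS] = msg;
	log->tail++;

	if (log->console_locked)
		return false;

	log->console_locked = true;
	return true;
}

/*
 * Called by the console owner: print everything buffered so far,
 * including messages appended by other CPUs, then release the console.
 * Returns the number of messages printed by this call.
 */
static size_t toy_console_unlock(struct toy_log *log)
{
	size_t n = 0;

	while (log->head != log->tail) {
		fputs(log->slot[log->head % LOG_SLOTS], stdout);
		fputc('\n', stdout);
		log->head++;
		n++;
	}

	log->printed += n;
	log->console_locked = false;
	return n;
}
```

If "CPU 0" calls toy_printk() and gets true, and "CPU 1" calls it before
CPU 0 has flushed, CPU 1 gets false and returns with nothing on the
console yet. Both messages only come out when CPU 0 runs
toy_console_unlock(), so the second caller cannot assume its message hit
the wire before its printk returned.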

> We could do stuff like...
>
> printk("Going to do foo...\n");
> outb(foo, baz);
> printk("Did foo and the machine didn't catch fire! Now bar\n");
> outb(bar, baz);
> printk("Done\n");
>
> And with a serial console I could know *precisely* the point at which
> the machine locked up.
>
> And I could enable the silly debugging levels on things like JFFS2 and
> be sure that with a serial console I could catch *every* printk
> reliably - which led to a number of cases where people would reproduce
> a bug with a serial console and debugging, mail me a huge log file, and
> get a patch back in reply.
>
> We *need* to have a mode where serial console is actually *reliable*,
> and we can know that the message has been sent out the port before the
> printk() call returns.
>
> What happened to it? And how do we fix it?

Hard to fix since you'd easily get RCU stalls and softlockup messages on
systems with lots of CPUs and heavy printk traffic...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR