Re: [PATCH V2] output the cpu number when printking.

From: Yanmin Zhang
Date: Mon Dec 24 2012 - 20:07:55 EST


On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
> On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
> > From: "he, bo" <bo.he@xxxxxxxxx>
> >
> > We often hit kernel panic issues on SMP machines because processes race
> > on multiple cpu. By adding a new parameter printk.cpu, kernel prints
> > cpu number at printk information line. Itâs useful to debug what cpus
> > are racing.
>
> How useful is this really for normal developers?
It's very useful to debug race conditions under SMP environment.
We applied the patch to our Android build image on our smartphones.

> We complained when
> this option was proposed by the ARM developers who were, for the first
> time, handling more than one processor and the issues involved with
> that. You are enabling this as a default option, for all developers,
> and almost no one will ever need it.
At least, I am a developer to use it every day. :). I need debug various
weird issues and tried many debug tools.

Anyway, we would change the default to _disable_.

>
> So, if you really want this, don't enable this by default. Also, go
> back and read the old thread about this option and why it was rejeted
> previously.
Thanks for your kind reminder. I googled it. Is the link
like https://lkml.org/lkml/2012/8/3/121?
Or http://lkml.indiana.edu/hypermail/linux/kernel/1208.0/02097.html?

Above links raise a good question that we might be able to use
ftrace (trace_printk) instead of to extend printk.
1) There are indeed lots of debug methods/tools in kernel because of many
excellent developers' contributions. However, the most frequently-used tool
is still printk.
2) ftrace (I use it often) is good when system mostly doesn't crash.
If system crashes, we couldn't get ftrace info sometimes.
When we debug some hard issues on a multi-core device, the system often
crashes when Graphics driver accesses registers incorrectly. There is no chance
for kernel to dump ftrace or panic info. With printk, we find that's because
2 cpu are racing. One cpu turns off Graphics while another cpu is accessing it.
Of course, we also applied the Android persistent ram patch, so after rebooting,
we still could see the printk buffer of previous booting.
3) Current drivers usually have many dev_dbg statements. We could enable
dynamic debug control to get such info quickly. With the printk cpu number
patch, we quickly enhance them to dump info about potential races among cpu.
With ftrace, we need change all these dev_dbg to trace_printk and that's not
the original objective of driver developers.

Mostly, the patch is to reuse all current printk statements in kernel with
a little overhead.

Thanks for your kind comments.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/