Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

From: Andi Kleen
Date: Mon Oct 26 2009 - 17:55:52 EST


On Mon, Oct 26, 2009 at 11:03:59AM -0700, Mike Travis wrote:
>
>
> Andi Kleen wrote:
>> Mike Travis <travis@xxxxxxx> writes:
>>
>>> With a large number of processors in a system there is an excessive amount
>>> of messages sent to the system console. It's estimated that with 4096
>>> processors in a system, and the console baudrate set to 56K, the startup
>>> messages will take about 84 minutes to clear the serial port.
>>>
>>> This patch adds (for SGI UV only) a kernel boot option,
>>> "limit_console_output" (or 'lco' for short), which when set temporarily
>>> reduces the console loglevel during system startup. This allows
>>> informative messages to still be seen on the console without producing
>>> excessive amounts of repetitive messages.
>>>
>>> Note that all the messages are still available in the kernel log buffer.
>>
>> I've run into the same problem (the kernel log being flooded on systems with a
>> large number of CPU threads). It's definitely not a UV-only problem. Making such
>> an option UV-only is definitely not the right approach; if anything, it needs to
>> be for everyone.
>
> I could use something like the MAXSMP config option to enable it...?

No, it's a problem long before MAXSMP sizes.
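A back-of-the-envelope sketch of that point, scaling the 84-minute estimate
quoted from the patch description (4096 CPUs at 56K baud) linearly with the
CPU count. The linear scaling is an assumption, not a measurement; real
per-CPU boot output varies with configuration.

```c
/* Figures quoted in the patch description: ~84 minutes for 4096 CPUs. */
#define QUOTED_CPUS	4096
#define QUOTED_SECONDS	(84 * 60)

/*
 * Estimated serial console time, in seconds, for `cpus` CPUs, assuming
 * (hypothetically) that boot output grows linearly with the CPU count.
 */
static double console_seconds(int cpus)
{
	return (double)cpus * QUOTED_SECONDS / QUOTED_CPUS;
}
```

Under that assumption, console_seconds(256) is 315 s (over five minutes)
and console_seconds(1024) is 1260 s (21 minutes), well below MAXSMP-sized
machines.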

>>
>> Frankly a lot of these messages made sense for debugging at some point,
>> but really don't anymore and should just be removed.
>
> That they still go to the kernel log buffer means the messages are still
> available for debugging system problems. KDB has a kernel print option if
> you end up there before being able to use 'dmesg'.

Again, they should just be re-evaluated and either converted to pr_debug()
or removed completely.
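To make the pr_debug() suggestion concrete: in the kernel, pr_debug() only
produces output when DEBUG is defined for that file (or, with
CONFIG_DYNAMIC_DEBUG, when the call site is enabled at runtime). The
userspace sketch below imitates only the non-dynamic case; the
demo_pr_debug() macro, the emitted-line counter and the report_cpu_cache()
caller are all hypothetical, made up for illustration.

```c
#include <stdio.h>

/* Counts messages actually emitted, so the effect can be observed. */
static int debug_lines_emitted;

#ifdef DEBUG
#define demo_pr_debug(...) \
	do { debug_lines_emitted++; fprintf(stderr, __VA_ARGS__); } while (0)
#else
/*
 * Similar in spirit to the kernel's no_printk(): the compiler still
 * type-checks the arguments, but nothing is printed.
 */
#define demo_pr_debug(...) \
	do { if (0) { debug_lines_emitted++; fprintf(stderr, __VA_ARGS__); } } while (0)
#endif

/* Hypothetical per-CPU boot message demoted from printk to debug level. */
static void report_cpu_cache(int cpu, int l2_kb)
{
	demo_pr_debug("CPU%d: L2 cache %dK\n", cpu, l2_kb);
}
```

Built without -DDEBUG, the message costs nothing at boot; a developer
chasing a problem in that file can still turn it back on.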

>
>>
>> Also, I don't like the default being on. It would be better to evaluate whether
>> these various messages are really useful and, if they are not, just remove them.
>
> I believe most distros already do that by setting the loglevel argument
> (but I could be wrong, since I haven't looked at many of them).

Even just spamming dmesg is a problem, and loglevel doesn't fix that.

>
>>
>> For example, do we really need the scheduler debug messages by default?
>
> This was the most painful message at NASA (which has a 2K-CPU system). It took
> well over an hour for these scheduler messages to print, just because we wanted
> to get some other DEBUG prints.

They should be just removed.

>>
>> Or do we really need to print the cache information for each CPU at boot? The
>> information is in sysfs anyway and rarely changes (I added this originally on
>> 64-bit, but in hindsight it was a bad idea).
>
> I was attempting not to decide whether each message was pertinent, only if it
> was redundant.

You should decide, or at least ask whoever added it.

("How many bugs did you fix with that message last year?" If the answer
is < 10 or so, remove it)
>
>>
>> I don't think it makes much sense to print more than 2-3 lines for each CPU boot,
>> for example.
>
> That would still be 4 to 12 thousand lines of information which, as you say,
> is available by other means.

A simple checkpoint for debugging is not available by other means.

The cache, mce etc. information is.

For the checkpoint problem on CPU boot, it might be reasonable to write
the checkpoints into a special buffer and only print it when the other
CPU does not come up (i.e. when the BP, the boot processor, detects a
timeout).

With that a single line of per CPU output should be feasible without
losing any debuggability.
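A minimal userspace sketch of that checkpoint-buffer idea, assuming a
small fixed-size ring of checkpoints per CPU. All names here (checkpoint(),
dump_checkpoints(), report_cpu_boot(), the sizes) are hypothetical; the
`came_up` flag stands in for the BP's timeout check, and real kernel code
would also need to handle the booting CPU and the BP touching the buffer
concurrently.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CPUS	8
#define CKPT_SLOTS	4	/* checkpoints kept per CPU; oldest overwritten */
#define CKPT_LEN	64

/* Per-CPU ring of boot checkpoints; nothing goes to the console by default. */
struct cpu_checkpoints {
	char msg[CKPT_SLOTS][CKPT_LEN];
	unsigned int next;	/* total checkpoints recorded so far */
};

static struct cpu_checkpoints ckpt[MAX_CPUS];

/* Booting CPU, at each milestone: record silently (cpu < MAX_CPUS). */
static void checkpoint(int cpu, const char *what)
{
	struct cpu_checkpoints *c = &ckpt[cpu];

	snprintf(c->msg[c->next % CKPT_SLOTS], CKPT_LEN, "%s", what);
	c->next++;
}

/* BP, after a timeout: replay the retained checkpoints, oldest first. */
static int dump_checkpoints(int cpu, FILE *out)
{
	struct cpu_checkpoints *c = &ckpt[cpu];
	unsigned int first = c->next > CKPT_SLOTS ? c->next - CKPT_SLOTS : 0;
	unsigned int i;

	for (i = first; i < c->next; i++)
		fprintf(out, "CPU%d checkpoint %u: %s\n", cpu, i,
			c->msg[i % CKPT_SLOTS]);
	return (int)(c->next - first);
}

/* BP side: one line if the CPU came up, its checkpoint history if not. */
static int report_cpu_boot(int cpu, int came_up, FILE *out)
{
	if (came_up) {
		fprintf(out, "CPU%d up\n", cpu);
		return 1;
	}
	fprintf(out, "CPU%d did not respond, last checkpoints:\n", cpu);
	return 1 + dump_checkpoints(cpu, out);
}
```

A healthy CPU costs one console line; only a CPU that hangs produces its
recent history, which is where the debugging value is.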

In fact debuggability could be improved by putting the output
at better strategic points instead of the ad-hoc way it is currently.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.