Re: Kernel messages, I18N, etc. (Was Re: [PATCH] modify console_loglevel from commandline)

Robert Minichino (rmini@joni.pasture.net)
Thu, 10 Dec 1998 17:11:59 -0500 (EST)


> Nothing short of massive grunge work from my view.
> There are at least a couple dozen ways of spelling "printk"
> scattered about the kernel. Some of them actually use
> the KERN_lvl macros, but very few really do.
> Many of the wrappers around the "printk()" invocations
> imply KERN_DEBUG, but they do not add that to the format strings.
> Usually, those debug messages are compiled out unless
> some locally defined variation of "#define DEBUG" is defined.
> Basically, you must go read the subsystem code if you want
> to figure out how to enable that subsystem's debug events.

I did enough digging on my own to figure this out quite some time ago;
it's nothing that a good session of find and grep can't fix. :)

> The one I did years ago was to implement a consistent three level
> throttling mechanism that worked the same way across all subsystems.
> Gradually, the subsystems were going to be switched to it.
> The trick was that every event needed tagging of both severity
> level and subsystem-of-origin. No trivial solution jumps out
> at me.

Elegant doesn't mean easy. :)

> > I think presentation-modifying tags to printk() might be nice too. That
> > way printk() could present messages in an easier to read (prettier?)
> > format. I wouldn't mind the following:
> >
> > [[nor would I. Remember there are around 20,000 printk's....]]

Yes, but I'm crazy. Remember, I'm writing this fully aware of the count,
breadth, and scope of the printk() usage in the kernel. ;)

> > On the topic of kernel internationalization, there is a simple solution
> > (IMO), but it would result in an increase in kernel size for the
> > non-english speaking users. Have printk() take some sort of checksum of
> > the message, attempt to look it up in a translation table (which, by
> > default, wouldn't be there), and if a translation is found,

> And if a character in the format string changes (some of which are
> computed at compile time)...It is also very inconvenient to
> build, by hand, a table of formatting strings that are associated with
> a check sum.

The first step in doing this would be to find all the internationalizable
printk()s and use a special (unique) macro invocation to tag them for later
script processing. A suite of scripts would need to be developed to help
find strings that could be internationalized, change the format string,
update the checksum value in the I18N tables, etc. (Updating the checksums
wouldn't be urgent in a developmental kernel, as the message would simply
be printed out in English.)
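
To make the tagging-plus-scripts idea concrete, here is a minimal sketch of
the scanning step. Everything named in it is an assumption for illustration:
the tag macro I18N_MSG() is hypothetical (no macro name has been chosen), and
CRC32 merely stands in for whatever checksum would actually be used.

```python
import re
import zlib

# Hypothetical tag macro; no real macro name has been agreed on.
TAG_RE = re.compile(r'I18N_MSG\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_tagged_formats(c_source):
    """Return the format strings wrapped in the (hypothetical) I18N_MSG() tag."""
    return [m.group(1) for m in TAG_RE.finditer(c_source)]


def checksum(fmt):
    """CRC32 of the format string; a stand-in for the real checksum."""
    return zlib.crc32(fmt.encode("utf-8"))


def build_table(c_source):
    """Map checksum -> original format string, for translators to fill in."""
    return {checksum(fmt): fmt for fmt in extract_tagged_formats(c_source)}


# A made-up source fragment: two tagged messages and one untagged one.
SAMPLE = r'''
printk(I18N_MSG("hda: %d sectors\n"), n);
printk(KERN_DEBUG "untagged debug noise\n");
printk(I18N_MSG("Memory: %dk available\n"), k);
'''

table = build_table(SAMPLE)
for key, fmt in sorted(table.items()):
    print("%08x %s" % (key, fmt))
```

Only the tagged strings end up in the table. The sketch deliberately ignores
concatenated string literals and format strings computed at compile time,
which is exactly where the hand work would come in.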

> That is precisely my proposal, except that to make the
> format strings findable, they have to have some
> regularized formatting around them. My program only
> finds about 95% of them. Any proper solution needs to be 100%.
> That is where hand work comes in. That is where I balk unless
> there is some reason to believe the work would get accepted.

Likewise. I'm willing to do any amount of work to see this through, but
I'm not going to do it if no one else wants to see it done. I'm happy
enough being able to tune the bootup console_loglevel in that case :)

> First, errors and critical events have to be identified.
> There are an awful lot of very innocuous looking printk's
> that lead straight up to a call to "halt()". :(

For sure. Either way you look at it there needs to be a pass over every
line of source code, by hand. Ick. :) But I'm willing to put my code where
my mouth is, so I don't think this is a big issue.

> > identification. A script could be provided to convert messages back into
> > English for this list, the maintainer, etc. Left/right, right/left, and
> > alternating issues should be dealt with in the console, with the default
> > possibly set by an I18N option.
>
> Translations of a formatted string is a very hard problem.
> You really cannot tell which parts came from the format string
> and which were formatted values. One of the reasons I advocate
> a binary log *in addition to* the traditional text one.

I don't see how the binary log would help over a text one, assuming the
kernel syslog messages are output tagged with message identifiers.
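
A rough sketch of what I mean: once a logged line carries a message
identifier, the format string is known, so the formatted values can be
separated from the fixed text again. The "#17" prefix, the catalogue layout,
and the German string are all made up for illustration, and the sketch only
handles %d and %s with the arguments in the same order in both languages.

```python
import re

# Hypothetical catalogue: message id -> format string per language.
CATALOGUE = {
    17: {"en": "%s: %d sectors", "de": "%s: %d Sektoren"},
}

SPEC_RE = re.compile(r"%[ds]")


def format_to_regex(fmt):
    """Turn a format string with %d/%s into a regex capturing the values."""
    parts = []
    pos = 0
    for m in SPEC_RE.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))
        parts.append(r"(-?\d+)" if m.group() == "%d" else r"(\S+)")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def to_english(line, lang):
    """Re-render a '#<id> text' log line in English using the catalogue."""
    msg_id, text = line.split(" ", 1)
    entry = CATALOGUE[int(msg_id.lstrip("#"))]
    values = format_to_regex(entry[lang]).match(text).groups()
    # The captured values are strings, so render every slot with %s.
    english = SPEC_RE.sub("%s", entry["en"])
    return english % values


print(to_english("#17 hda: 1024 Sektoren", "de"))
```

Without the identifier, a script would have to guess which catalogue entry a
line came from; with it, the lookup is a plain dictionary access.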

> > Some sort of extremely simple, logical, unintrusive, easy-to-remember
> > printk() guidelines would be quite useful IMO. It seems as if there's no
> > consistent usage of it through the kernel, thus no consistent way to
> > control the output. Just because it's text doesn't mean it HAS to be
> > ugly. ;) And all the pedants will be happy (MB, not Mb, mb, mB) too.
>
> Amen.

I think this is an absolute starting point, if there is any interest
whatsoever. No more being told I have 64 millibits of RAM. :)

> > I'm willing to do it or at least start it, but is it worth the effort? I
> > think so. Any comments anyone?
>
> It is worth the effort if there is reasonable expectation of adoption.

Exactly. To rephrase my original question more clearly: "Is there
reasonable expectation of adoption?" I can devote time almost immediately
to this if there is enough interest from maintainers or users, and
preferably both. I do believe a prudent plan would be as follows:

1) Draw up aforementioned printk() guidelines, toss around for reactions.
2) Find all printk()s in kernel, tag them with macros based on type of
message; tidy up messages according to guidelines.
3) Write scripts to parse said printk()s, manipulate files of them, etc.
4) 'Fix' vsprintf.c to allow positional parameters.
5) Implement additional printk() functionality: pretty formatting, unlogged
messages (do copyright messages need to be logged? ;) ), etc.
6) Draw up translation table API (simple).
7) Attempt internationalization to 'redneck'. Why redneck? The kernel
messages are already in English, and I would assume that the majority of
English-speaking individuals can understand redneck with little difficulty.
It is also different enough from 'proper' English that we can tell which
messages are not being printed from the I18N table.
8) At this point we have a clearly defined way to implement kernel output in
the future. Various internationalization attempts start. User-level
utilities need to be written to translate logged messages back into English
for mailing-list, newsgroup, etc. postings.

Steps 1 and 6 are best done by drawing up a draft document and throwing it
around for input, modification, additional insights, etc. Steps 2 and 3
could be done by any number of people, so long as there is coordination to
prevent duplication of labour. Steps 4 and 5 are quick tasks. Steps 7 and 8
can be done by any individual with enough spare time, or a close-knit group.
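
To illustrate why step 4 matters: a translation may need the arguments in a
different order than the original message, and positional parameters (the
POSIX printf "%N$" notation) allow that without touching the call site. The
following is a toy model of the behaviour, not of vsprintf.c itself; it only
handles %N$d and %N$s, and the "reordered" string is just a stand-in for a
translation with a different word order.

```python
import re

# Matches %N$d and %N$s, where N is a 1-based argument index.
POSITIONAL_RE = re.compile(r"%(\d+)\$([ds])")


def format_positional(fmt, *args):
    """Expand %N$d / %N$s by picking argument N instead of the next one."""
    def expand(match):
        value = args[int(match.group(1)) - 1]
        if match.group(2) == "d":
            return str(int(value))
        return str(value)
    return POSITIONAL_RE.sub(expand, fmt)


original = "%1$s: %2$d errors on %3$s"
reordered = "on %3$s, %1$s saw %2$d errors"  # stand-in for a translation

# The same argument list serves both format strings.
print(format_positional(original, "eth0", 3, "line 7"))
print(format_positional(reordered, "eth0", 3, "line 7"))
```

With sequential %s/%d only, the second string could not be produced without
changing the order of arguments in the printk() call.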

I see this as a worthwhile project that conflicts with few other
development tasks. These changes could be effected against a stable
kernel without compromising its stability. These two factors mean that
the completion of these changes can be done more or less independently of
any development/stable kernel releases. I for one would like to see this
happen, and I'm eager to start. But I won't until I have at least some
assurance that these changes will be adopted in some mainstream kernel
release (stable or developmental, it doesn't matter) in the future.

Advantages of adoption: more elegant means of kernel output, recognizing
that not all kernel output is important; presentation of kernel messages
upon bootup to the console in a tidy, consistent, professional format,
thereby improving the image of the kernel; increased acceptance by
individuals that prefer kernel messages be in their native tongue, without
inconveniencing developers either when adding messages or when reading
bug reports; formalization of kernel message format to improve consistency
and possibly give clearer output.

Disadvantages: simple, yet tedious task; possibility of annoyed
maintainers WRT internationalized bug reports; internationalized kernels
are larger; userland utilities that read logfiles may be confused.

The first disadvantage cannot be avoided, but the task is easily
parallelizable as illustrated above. The second can be alleviated by
shipping the kernel with a bug-reporting script, or having syslog log
both English and internationalized messages (idea!). Actually, the
bilingual logging idea would solve the logfile reader problem too. Hmm...
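
A small sketch of the bilingual idea, under the assumption that the
translation table is keyed directly by the English format string (a checksum
key would work the same way) and that both strings take their arguments in
the same order. The function name and the German string are made up.

```python
def bilingual(fmt, args, translations):
    """Return the lines to log: English always, plus a translation if any."""
    lines = [fmt % args]
    translated = translations.get(fmt)
    if translated is not None:
        lines.append(translated % args)
    return lines


# Hypothetical one-entry translation table.
TRANSLATIONS = {"%s: link down": "%s: Verbindung getrennt"}

for line in bilingual("%s: link down", ("eth0",), TRANSLATIONS):
    print(line)
```

Since the English line is always present, maintainers and existing logfile
readers see what they see today, and untranslated messages simply produce a
single line.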

I'm very eager to hear input about this; please respond if you have even a
shred of interest in this happening (or not happening), or are willing to
lend a hand. Any comments on (dis)advantages I may have overlooked?

--
Robert Minichino
Chief Engineer
Denarius Enterprises, Inc.
http://www.denarius.com/
