Re: Internationalizing Linux

John R. Lenton (lenton@famaf.fis.uncor.edu)
Sat, 05 Dec 1998 13:14:16 -0300


I'll try and summarize (and add a few extra thoughts):

+ Nobody wants to have to understand yet another [dozen]
language[s] just to help Linux.

This works both ways: an English-speaking programmer
shouldn't have to try and decipher a bug report in Spanish,
and a programmer from the Maldives shouldn't have to learn
English to contribute. This has been the case so far,
but it isn't really the way to go. Or is it? That is to
say, either we keep the status quo, and people who already
know English will use Linux and people who don't won't, or
we change stuff around and gain a wider target. This is

+ For Linux to be widely accepted beyond the US and EU,
there should be a way to translate the
different messages into something intelligible
by the targeted end user.

There are three ways of doing this: on the fly using
LANG= and some trick on klog, a plain translated klog,
or in the kernel. The first is the most flexible (per
user) but also the slowest and most difficult. The
first two don't offer translation of boot messages
or panics. The last two don't offer customization per
user but only per machine.
If we do things right, said end user will be able to
explain to his sysadmin what's going on (using a report
form for the system or however), the sysadmin will then
take the *code* of the error message and look it up in
English, and report that error. This could even be done
by the end user if (s)he is the sysadmin at the same
time. This is

o There needs to be some standard message identifier if
this is going to go at all.

What's more, the only thing in the kernel could be the
identifier, and the scripts that translate should cope
with this. This would take off an extra [ ]k of the
kernel (there's some argument as to how much, probably
because it depends what's in the kernel in the 1st
place).
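
A minimal sketch of the identifier-only idea, in Python: the kernel emits nothing but a code, and a userspace script picks a catalogue from LANG and looks the code up. The CATALOGUES table, both function names, and the fallback to English are assumptions made for illustration; only the code kP:JUA:Az0Q and its two message texts are taken from the example later in this post.

```python
# Hypothetical per-language catalogues, keyed by message code.
CATALOGUES = {
    "en": {"kP:JUA:Az0Q":
           "Kernel panic: JUA: Unable to make EVENT with DEVICE"},
    "es": {"kP:JUA:Az0Q":
           "El kernel entró en pánico: JUA: Imposible hacer café con "
           "la tostadora!"},
}


def language_from_lang(lang):
    """Return the language part of a LANG value such as 'es_AR.ISO-8859-1'."""
    if not lang or lang in ("C", "POSIX"):
        return "en"  # assumption: untranslated locales read the English text
    return lang.split(".")[0].split("_")[0].split("@")[0].lower()


def translate(code, lang=None):
    """Look a code up in the user's language, then English, then give up."""
    for candidate in (language_from_lang(lang), "en"):
        message = CATALOGUES.get(candidate, {}).get(code)
        if message is not None:
            return "[%s] %s" % (code, message)
    return "[%s]" % code  # unknown code: show the bare identifier


if __name__ == "__main__":
    print(translate("kP:JUA:Az0Q", "es_AR"))
    print(translate("kP:JUA:Az0Q", "ru_RU"))
```

The code is always printed next to the text, so a report in any language still carries something the next reader can look up.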

However, there is one big disadvantage:

- Many maintainers have said they will not be bothered
with this. "It'll be a maintainer's nightmare"
seems to be the main reaction.

It needn't be. The maintainer or programmer or
developer who actually writes the important stuff that
gets the job done would just write the messages as a
unique code (based on...?), and include in a separate
file the messages in his own language. Whatever that
language is, there will always be a guy/girl who knows it
and English and Linux, and who can translate it into
English, and then the code can be internationalized.
Or into Spanish, then English. This isn't literature so
we won't be losing cadenzas on each translation :).

[Are you going to have bad dreams because of this? I
hope not.]

Let me give you an example: José has Linux on his
machine, and one day it boots up pretty much the same as
usual, except that it suddenly says

[kP:JUA:Az0Q]
El kernel entró en pánico: JUA: Imposible hacer café con
la tostadora!

So he runs to the sysadmin/guru/whatnot, who
a) recognizes the error and solves it;
b) doesn't recognize the error, looks up
kP:JUA:Az0Q, sees it's a "Kernel panic: JUA: Unable to
make EVENT with DEVICE", and solves it;
c) as in b), but doesn't solve it; rather, he asks the
guys at linux-kernel what should be done, giving i)
the message and ii) the message code. So Pitr could take
the code, translate it into Russian, remember he had a
similar problem, and describe his solution. (Although
Pitr would recognize the error from the code, because he
compiled his kernel with no language messages, and he
remembers getting [kp:JUA:Az0Q]:Toast:CDBurner)
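
For what it's worth, here is a rough Python sketch of how a script could turn a bare "[code]:arg:arg" line, like the one Pitr sees, back into readable text. The line format, the TEMPLATES table and the expand function are my assumptions; the post only shows one such line and one English message, and I use the code as it appears on José's screen.

```python
import re

# Hypothetical catalogue entry: the message text plus the placeholder
# names, in the order the kernel is assumed to emit their values.
TEMPLATES = {
    "kP:JUA:Az0Q": ("Kernel panic: JUA: Unable to make EVENT with DEVICE",
                    ("EVENT", "DEVICE")),
}

# "[code]" followed by zero or more ":value" fields.
RAW_LINE = re.compile(r"^\[([^\]]+)\]((?::[^:]*)*)$")


def expand(raw):
    """Return the readable message for a raw line, or the line unchanged."""
    match = RAW_LINE.match(raw)
    if match is None:
        return raw
    code, tail = match.group(1), match.group(2)
    values = tail.split(":")[1:] if tail else []
    entry = TEMPLATES.get(code)
    if entry is None:
        return raw
    text, names = entry
    if len(values) != len(names):
        return raw  # argument count mismatch: do not guess
    for name, value in zip(names, values):
        # Naive substitution, good enough for a sketch.
        text = text.replace(name, value, 1)
    return "[%s] %s" % (code, text)


if __name__ == "__main__":
    print(expand("[kP:JUA:Az0Q]:Toast:CDBurner"))
```

Anything the script cannot match is passed through untouched, so an out-of-date catalogue degrades to showing the bare code rather than a wrong message.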

Also note that the script to translate from codes to
messages could be version-specific (because it goes with
the source), but that's probably not a good idea.

Anything I left out?

Yes: A very first-ever way of making the unique
identifiers:

-Start off with a char for the flavour of the message.
In the example I used "kp" for kernel panic, but I don't
think we have all that many flavours so just a "p"
would've been enough.
-Next, a standardized acronym for a part of the kernel
(as in VFS, MM, or such), and initials for the person
whose fault it is (lt, for example).
-Next a sequential code for each developer, or a code
that the developer chooses, possibly wrt the file the
message comes from, etc. All this could be done safely
with 4 or maybe at most 5 more chars, using numbers,
uppercase and lowercase ASCII (to make sure everybody
sees the same thing for now).
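
To put a number on "4 or maybe at most 5 more chars": with digits plus uppercase and lowercase ASCII letters there are 62 symbols per position. A quick check in Python (the names ALPHABET and capacity are just mine):

```python
import string

# Digits, uppercase and lowercase ASCII letters: 10 + 26 + 26 = 62 symbols.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def capacity(length):
    """Number of distinct codes of exactly `length` characters."""
    return len(ALPHABET) ** length


if __name__ == "__main__":
    for length in (4, 5):
        print(length, capacity(length))
```

That is 14,776,336 codes with 4 characters and 916,132,832 with 5, per developer, so running out is not the problem; keeping the codes meaningful is.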

So, for example, the 4th message from the top of
drivers/cdrom/mcd.c by Martin Harriss (I believe) could
be something like D:CD:mh:mcd4, and the :'s are optional.
I.e. it could be DCDmhmcd4, and mh could have something
like 64 printk's in that file without having to change
to DCDmhMcd#, DCDmhmCD#, etc. Of course if there's
a Martin Humperblink... maybe we could use 3 letters
for names? Odds are there will be more than 1 person
with a pair of letters and a pair of letters with no
person, but that's life. Or statistics.
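
A small Python sketch of gluing such an identifier together, with and without the optional colons. The function name make_id and the validation rule are assumptions on my part; the field order and the D:CD:mh:mcd4 example are the ones above.

```python
def make_id(flavour, subsystem, initials, local, colons=True):
    """Join the four fields into one message identifier.

    Every field must be non-empty and consist of ASCII letters and
    digits only, so that everybody sees the same thing.
    """
    parts = (flavour, subsystem, initials, local)
    for part in parts:
        if not (part.isascii() and part.isalnum()):
            raise ValueError("bad identifier field: %r" % (part,))
    separator = ":" if colons else ""
    return separator.join(parts)


if __name__ == "__main__":
    print(make_id("D", "CD", "mh", "mcd4"))                # D:CD:mh:mcd4
    print(make_id("D", "CD", "mh", "mcd4", colons=False))  # DCDmhmcd4
```

One thing the sketch makes visible: without the colons the identifier can only be split back into its fields if the field widths are fixed, since the subsystem acronym may be two or three letters (MM, VFS). Keeping the colons, or fixing the widths, avoids that.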

This isn't the most compact way of assigning unique
identifiers, but we probably don't want that anyway: if
this gets going, there will always be people like Pitr
who know the codes by heart, but if done this way it
won't be necessary because you'll *know* where the
message is from.

Phew! This is much too long. I hope you're still awake.

Cheers,

John.
