Re: Japanese (or other language) postings

From: Jim Dennis (jimd@linuxcare.com)
Date: Fri Jul 21 2000 - 12:45:22 EST


Excerpt From owner-linux-kernel-digest@vger.rutgers.edu:

> From: "Johan Kullstam" <kullstam@ne.mediaone.net>
> Date: 18 Jul 2000 17:33:37 -0400
 
> "Andrew van der Stock" <ajv@greebo.net> writes:
 
>> If you can...... read the following, your mailer (MUA) is
>> broken.

 My mailer is tolerant of those characters. However, they
 are gibberish given my current character set. I've replaced
 them with "." to prevent their further propagation.
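
 The replacement described above can be sketched in a few lines of
 Python. This is only an illustration, not what my mailer actually
 does: the function name mask_non_ascii and the sample bytes are
 made up for the example, and it assumes the message is available
 as raw bytes. Control characters below 0x20 pass through untouched.

```python
def mask_non_ascii(raw: bytes, placeholder: str = ".") -> str:
    """Replace every byte outside 7-bit ASCII with a placeholder.

    Hypothetical helper for illustration only; bytes 0x00-0x7F are
    kept as-is, bytes 0x80-0xFF each become one placeholder character.
    """
    return "".join(chr(b) if b < 0x80 else placeholder for b in raw)


if __name__ == "__main__":
    # Invented sample: four high-bit bytes embedded in ASCII text.
    sample = b"If you can \x82\xa0\x82\xa2 read the following"
    print(mask_non_ascii(sample))
```

 Masking byte by byte, rather than decoding first, avoids having to
 guess which character set the sender used.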

 My inability to "read" those characters is not my MUA's fault.
 
> 1) i can *see* it. however i cannot *read* it. perhaps i am not
> fluent in baud barf.
> 2) octal 200-240 are not a valid char codes no matter how you slice it
> -- even if you are microsoft. they are, in particular, illegal in
> html.

 Agreed.
 
>> ............Andrew.........
 
>> With the original poster effectively saying that Linux (and I mean
>> the kernel, not just a distro) will not be internationalized due to
>> one or two spammers is wrong. International helpers (and generally
>> they are volunteers) have posted here before, and we should never
>> cut ourselves off from 70% of the world who do not speak English.
 
>> ASCII's time is coming to a close, and confusing two issues (spam vs
>> ignoring all .jp/double byte mails) is simply wrong (and skirting
>> damn close to racism).
 
> this is skirting damned close to a gratuitous godwin and the topic has
> nothing whatsoever to do with racism.

 I believe that Linus has specified that the kernel code, comments
 and primary documentation will be in English. Since his native
 languages are Swedish and Finnish (IIRC) I'd hardly consider this
 to have been a "racist" choice.

 Presumably Linus chose English from among his adopted languages,
 for practical reasons. Perhaps he felt that the Linux kernel was
 best served by choosing one language and he decided that English
 would make it available to the broadest possible audience. Perhaps
 he felt that the use of English would likely garner the broadest
 contribution by other developers. Perhaps he just preferred to
 avoid the cognitive dissonance that would result from a mixture of
 English-derived keywords (particularly those that are defined and
 reserved by C) with identifiers and comments in other languages.

 Regardless of his reasons, he chose English. It is a reasonable
 choice.

 It is my understanding that this mailing list is conducted in
 English by the consensus of its participants. Since the kernel is
 in English (C with English-derived identifiers), it is logical that
 the mailing list and most other "official" business of the kernel
 development team would follow suit.

 There are other customs on this mailing list which follow common
 conventions of traditional UNIX and related technical mailing lists
 and newsgroups. Those discourage the use of character sets other
 than 7-bit ASCII and encourage line lengths of ~72 characters or
 fewer. They also discourage "sigs" of more than ~4 lines, etc.
 There are some other customs and guidelines on this list, though
 none are formalized.
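
 As a rough illustration, the three customs named above (7-bit
 ASCII, lines of about 72 columns, sigs of about 4 lines) could be
 checked mechanically. The sketch below is hypothetical: the name
 check_post, the exact limits, and the assumption that a signature
 starts at the last line consisting of "--" are mine, not any
 formal rule of this list.

```python
from typing import List

MAX_COLUMNS = 72    # assumed limit, from "~72 characters"
MAX_SIG_LINES = 4   # assumed limit, from "~4 lines"


def check_post(body: str) -> List[str]:
    """Return a list of complaints about a message body (hypothetical)."""
    problems = []
    lines = body.splitlines()

    for number, line in enumerate(lines, start=1):
        if not line.isascii():
            problems.append(f"line {number}: non-ASCII character")
        if len(line) > MAX_COLUMNS:
            problems.append(f"line {number}: {len(line)} columns")

    # Treat everything after the last "--" line as the signature.
    sig_starts = [i for i, line in enumerate(lines) if line.rstrip() == "--"]
    if sig_starts:
        sig_length = len(lines) - sig_starts[-1] - 1
        if sig_length > MAX_SIG_LINES:
            problems.append(f"signature: {sig_length} lines")

    return problems


if __name__ == "__main__":
    for complaint in check_post("A short, plain line.\n" + "x" * 80):
        print(complaint)
```

 A check like this only flags departures from the customs; whether
 to act on them remains a matter for the list's participants.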

 To claim that this "verges on racism" is essentially to mandate
 the curse of Babel. It's impractical.
 
>> I'm not going into which MUA is best as I've
>> decided what works for me
 
> you might try one which terminates lines after 70 or characters.

 Clearly the correspondent here is unwilling to bow to
 the traditions and conventions of this list: staying within
 7-bit clean ASCII, writing in English, and keeping line
 lengths under 72 columns.

 It may be an artifact of imperialism and "obsolete" technologies,
 but 7-bit ASCII and accommodation for 80-column terminal
 widths are the customs of this (and most traditional UNIX/Linux
 and related) mailing lists and newsgroups.
 
>> and you'll be a partisan of another
>> choice, and there should always be that choice. The real issue is
>> that many MUA's are simply and unspeakably difficult wrt HTML (which
>> is how double byte people must communicate with any chance of it
>> being read somewhere else) and how they handle languages in
>> general. Spam is a different issue, and one that must be addressed
>> but separately to languages. This is an English language forum, and
>> substantially that should not change, but cutting off particular
>> language speakers just to avoid spam or faulty or substandard
>> mailers smacks of avoiding the real issue.
 
>> What happens if a Hebrew, Hindi or Russian language spammer (or more
>> likely, the English-language spammers that I've nearly universally
>> had in the 11 years I've had e-mail) mails here on a regular basis?
>> The problem of spam is separate to I18N, and should always be
>> considered so.
 
> however, afaict *all* of the japanese encoded text has been spam. or
> maybe not, since i cannot read it.
 
>> Otherwise, lkml will alienate international
>> developers. As Linus speaks Finnish as his first language
 
 I don't think that the choice of English has alienated any
 significant number of developers. The fact is that Linux is
 very widespread in Japan and throughout Asia. (FreeBSD used
 to have a bit of an edge in Japan; I think its "market share"
 has declined vs. Linux's over the last two years, though both
 are growing much faster than any other OS in that market.)

 lkml is not here to cater to 'international developers.' It
 exists to serve Linux kernel developers. Since Linux has
 standardized on English/C, that implies a requirement that
 "international developers" (such as Linus, Alan Cox,
 Alexey Kuznetsov, etc.) be tolerant of English.

 Perusing the CREDITS files should convince anyone that
 Linux has not "alienated" international developers. Linux
 probably has a broader base of support, from a larger range
 of countries, than any other software project in history.
 
> afaik linus speaks swedish as his first language.
 
>> (which requires different characters than ASCII* supports in any
>> case) I shouldn't think this being controversial. Hopefully, the
>> mono-linguists amongst us will pipe down and let Linux be truly
>> universal.

 I was happy to see that GTK+ has recently released an upgrade
 with BiDi (bi-directional I18N) support and that Linux now
 supports Arabic. Presumably Hebrew and other BiDi languages
 will be added in the near future. This support should eventually
 permeate the GNOME apps.

 However, that is all in user space. As far as I know there
 has been no need to modify the kernel to support its
 I18N (other than support for Unicode in filenames?)

 What other kernel related I18N problems are we facing on
 the path to "world domination?" I have no idea how one
 would encode BiDi into filenames (for example).

 On the other hand, has anyone here read Bruce Schneier's
 (author of _Applied_Cryptography_, founder of Counterpane Systems)
 recent comments on the security implications of Unicode's
 complexity?

 Essentially he says that Unicode is sufficiently complex that
 software supporting it cannot be audited for security robustness
 to any reasonable degree of confidence.

    Conspiracy nuts might find a correlation in the coincidence
    of that comment with this topic here: a Microsoft-product-using
    correspondent advocating the adoption of I18N support on a core
    Linux mailing list and in the Linux kernel in general ;).

--
Jim Dennis         Technical Research Analyst            Linuxcare, Inc.
             jimd@linuxcare.com, http://www.linuxcare.com/
             415 505-9306                415 701-7457 fax
                 Linuxcare: Support for the Revolution
 
