Re: Japanese (or other language) postings

From: Mike A. Harris (mharris@meteng.on.ca)
Date: Tue Jul 18 2000 - 08:02:00 EST


On Tue, 18 Jul 2000, Andrew van der Stock wrote:

>Date: Tue, 18 Jul 2000 21:05:00 +1000
>From: Andrew van der Stock <ajv@greebo.net>
>To: linux-kernel@vger.rutgers.edu
>Subject: Japanese (or other language) postings
>
>If you can’t read the following, your mailer (MUA) is
>broken. End of story.

I can read that. It looks like random shit though. ;o)

>ワタシワAndrewデス。

Hmm.. Looks like you've got some line noise there. PINE works
fine with it though.

>With the original poster effectively saying that Linux (and I
>mean the kernel, not just a distro) will not be
>internationalized due to one or two spammers is wrong.

Huh? I don't remember seeing such a posting... I don't see why
the kernel itself needs to be internationalized at all
though. Not error messages and internal stuff. Filesystems, and
stuff that hits userland, like NLS, etc.. sure, no problem. That
is needed IMHO. Not boot messages and such though. Otherwise a
lot of error reports would be quoting 50 foreign languages and
Linus would have to learn a lot more languages. One language to
support is enough.

>International helpers (and generally they are volunteers) have
>posted here before, and we should never cut ourselves off from
>70% of the world who do not speak English.

If they speak even broken English, they should post in English,
especially if they want to reach the widest audience. It has
nothing to do with language superiority, it has to do with
communication at the highest common denominator. Many people
speak one language ONLY. Should they be forced to read postings
in 50 other languages as well? Create other lists for that like
"linux-kernel-jp", "linux-kernel-de", etc.. just to name a
few. If every mailing list allowed posting in all languages
you'd end up with chaos.

>ASCII's time is coming to a close, and confusing two issues

Says who? ASCII is the standard code computers use to
communicate with on most levels where text is involved. On the
majority of systems anyways. This isn't going to change any time
soon IMHO, although there are certainly other codes in use.

>(spam vs ignoring all .jp/double byte mails) is simply wrong

I ignore what I cannot read. I am not going to install an SGML
parser, tex parser, HTML parser, UNICODE parser, Kanji parser ->
English translator, and 50 other format converters to all of my
email software just for the benefit of reading someone else's
postings. I read English, and I join English mailing lists,
etc.. If someone can't follow the rules, I'm not obligated to
read their mail or respond, and I'm free to filter it so it isn't
ever seen. SPAM especially.

>(and skirting damn close to racism).

Not IMHO. Anyone of any race, color, creed, country is free to
post messages here, or to me personally, anytime. I've got
nothing at all against anyone of any race/color, etc.. I do
however have a problem with people sending messages in foreign
charsets that are unreadable by 99.99999999% of all
posters. ESPECIALLY if it is SPAM. I don't care what race they
are. If an English person posts shit in SGML, I'd be just as
ticked off. SAME THING.

>I'm not going into which MUA is best as I've decided what works
>for me and you'll be a partisan of another choice, and there
>should always be that choice.

Well that I agree with. EVERYONE is entitled to their own
software choice, regardless of what anyone else may think.

>The real issue is that many MUA's are simply and unspeakably
>difficult wrt HTML (which is how double byte people must
>communicate with any chance of it being read somewhere else)

Certainly. Email is ASCII text. HTML is at best a file
attachment in another format. An HTML email is really only an
email with no body of text and an HTML file attachment. An email
reader that doesn't grok HTML is one that only follows email
standards. One that parses the file attachments and knows about
HTML is a bonus plan for the person receiving HTML. Expecting
everyone to use a reader that can grok every possible file
attachment type is just ridiculous when ALL MAIL READER SOFTWARE
CAN READ ASCII TEXT.

>and how they handle languages in general. Spam is a different
>issue, and one that must be addressed but separately to
>languages.

Agreed.

>This is an English language forum, and substantially that
>should not change,

Hmm... I got from your above comments that you thought any
language should be ok.... guess I read you wrong... Either that
or you're contradicting yourself.. Sorry if I've misunderstood
you in any way.

>but cutting off particular language speakers just to avoid spam
>or faulty or substandard mailers smacks of avoiding the real
>issue.

Cutting off a TLD, yes. Cutting off a content type encoding,
no. I can't read Japanese, Hebrew, Hindi, or any other
language. Filtering messages that say they are in those
languages, is thus useful to me. What use is it if my mailer CAN
display Kanji, if I can't freaking read Japanese????????? So WHY
SHOULD I KEEP IT? Delete it and be called racist? Ridiculous.
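
As a rough illustration of filtering on the declared encoding
rather than on the sender's TLD, here is a minimal Python sketch.
The allow-list READABLE_CHARSETS and the helper should_filter are
made up for this example and are not part of any mailer; the only
library calls used are from Python's standard email package.

```python
from email import message_from_string

# Hypothetical allow-list: charsets this particular reader can make sense of.
READABLE_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}


def should_filter(raw_message: str) -> bool:
    """Return True if the message declares a charset outside the allow-list."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns the lowercased charset parameter of the
    # Content-Type header, or the given default when none is declared.
    charset = msg.get_content_charset("us-ascii")
    return charset not in READABLE_CHARSETS


# Two toy messages that differ only in the charset they declare.
english = "Content-Type: text/plain; charset=us-ascii\n\nHello, list.\n"
japanese = "Content-Type: text/plain; charset=iso-2022-jp\n\n(body)\n"

if __name__ == "__main__":
    print("english filtered: ", should_filter(english))
    print("japanese filtered:", should_filter(japanese))
```

Note that the check looks only at what the message says about
itself. It does not look at where the message came from, which is
the distinction being drawn above.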

>What happens if a Hebrew, Hindi or Russian language spammer (or
>more likely, the English-language spammers that I've nearly
>universally had in the 11 years I've had e-mail) mails here on
>a regular basis? The problem of spam is separate to I18N, and
>should always be considered so.

Certainly. English SPAM doesn't include random characters that
kill your mailer and erase your hard disk though. Kanji does.

Maybe I can come up with my own English encoding format. One
that does not use ASCII at all. Perhaps I'll come up with my own
syllabic encoding using lots of ^$%#*@(# characters. I'll make
it so that it crashes Netscape and Internet Exploder. Woo hoo.

Then I'll post English messages encoded in #$(*%@#lish encoding
to foreign mailing lists that are in other languages - without
finding out what language the list is in first. Hope nobody's
mailer crashes.

How about we just all post our email in Adobe PDF format, or
Postscript? That is a widely used standard, no?

>Otherwise, lkml will alienate international developers.

Judging by the current spread of developers, I don't see any hint
of that ever occurring. I can't pronounce half the people's names
on this list, and most of those are foreign. I'm damned glad
that they are on this list though, and coding for Linux. I'm
also happy they took the time to learn English.

Funny thing: Non-native English speakers are often hesitant to
post messages in English, fearing that their English is too bad
and they will get ridiculed or feel embarrassed. I've yet to see
someone whose English was not understandable though.

My experience conversing with people who learned English only
recently is that they speak/write quite well within a year of
starting.

Anyone posting to a mailing list should first of all learn what
language the list is in, or if it is for open postings in any
language. That is called netiquette. IMHO.

>As Linus speaks Finnish as his first language (which requires
>different characters than ASCII* supports in any case) I
>shouldn't think this being controversial. Hopefully, the
>mono-linguists amongst us will pipe down and let Linux be truly
>universal. Hyvää päivää,

Linus is decent enough to post in English though. Most likely
because he knows that will reach the widest audience, and is the
language of "business" so to speak. If he posted in Finnish
though, it would fit under the "I can do what I want because I
started your damned operating system, and run the show here so
there!" rule. ;o)

>Andrew van der Stock, ajv@greebo.net http://www.greebo.net
>SAGE-AU President http://www.sage-au.org.au
>
>* ps remember that ASCII is 7 bits (remember VMS?). Characters
>above 129 are defined by whoever defines the platform. Try
        ^
       127

>getting a Mac and a PC to agree where ë or ö appear in this
>space without specifying a particular NLS or using Unicode.

Yep. If someone posts in English, even with a different charset,
their posting is normally quite readable. Only odd characters,
usually in names for example, show up funny. Only rarely do
foreign-language postings not show up right for me. I'm using a
pretty stock English Linux configuration too.

No problem. An oe character here, ae there, it is
understandable. If it is #$@#$&*^@#%(@#&%(@&&@#%@#()$*@#)$
0#($*)@#*)%*@#*%@#$*%_@#$(* though, it is NOT understandable.
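
To make the point about the upper half of the 8-bit range
concrete, here is a small Python sketch. It encodes the two
letters from the P.S. under three legacy code pages that ship
with Python. The choice of mac_roman, cp437 and latin_1 as
stand-ins for "a Mac" and "a PC" is an assumption made for this
example, as are the helper names.

```python
# Legacy 8-bit code pages: classic Mac, DOS-era PC, and ISO 8859-1.
CODECS = ["mac_roman", "cp437", "latin_1"]


def byte_positions(ch: str) -> dict:
    """Map each codec name to the single byte value it assigns to ch."""
    return {codec: ch.encode(codec)[0] for codec in CODECS}


def fits_in_ascii(ch: str) -> bool:
    """Return True if ch has a slot in 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # U+00EB is e-diaeresis, U+00F6 is o-diaeresis.
    for ch in "\u00eb\u00f6":
        slots = {name: hex(b) for name, b in byte_positions(ch).items()}
        print(f"U+{ord(ch):04X}", "ascii:", fits_in_ascii(ch), slots)
```

Each code page puts the same letter at a different byte value
above 127, so a reader has to be told which one the sender used
before those bytes mean anything.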

The particular message that crashed my PINE, was in fact NOT
following standards. There were illegal characters in the header
lines. The only thing PINE did wrong was not filter illegal
characters out. I wonder if the PINE devel team is aware of
this... I'm sure someone will come up with a nasty exploit for
it though...
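
The kind of check described above could look roughly like the
Python sketch below. The post does not say which characters were
in the offending headers, so the sketch assumes "illegal" means
any byte outside printable 7-bit ASCII (apart from the tab used
in folded header lines). The function name illegal_header_bytes
and both sample messages are invented for this example; this is
not how PINE works.

```python
def illegal_header_bytes(raw_message: bytes) -> list:
    """Return (line_number, byte_value) pairs for suspicious header bytes.

    Assumption: "illegal" means anything outside printable 7-bit ASCII,
    except the tab character used in folded header lines.
    """
    # The header block ends at the first empty line; accept CRLF or bare LF.
    header_block = raw_message.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    problems = []
    for lineno, line in enumerate(header_block.split(b"\n"), start=1):
        for byte in line:
            if byte != 0x09 and not 0x20 <= byte <= 0x7E:
                problems.append((lineno, byte))
    return problems


# A clean message (8-bit byte only in the body) and one with raw
# escape and 8-bit bytes in the Subject line.
clean = b"Subject: hello\r\nFrom: a@example.org\r\n\r\nbody \xff\r\n"
dirty = b"Subject: \x1b$B\x8c\r\nFrom: a@example.org\r\n\r\nbody\r\n"

if __name__ == "__main__":
    print("clean:", illegal_header_bytes(clean))
    print("dirty:", illegal_header_bytes(dirty))
```

A mailer could run something like this before it tries to display
the headers, and either drop or escape whatever it flags.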

Well, back to the grind...

TTYL

-- 
Mike A. Harris                                     Linux advocate     
Computer Consultant                                  GNU advocate  
Capslock Consulting                          Open Source advocate

... Our continuing mission: To seek out knowledge of C, to explore strange UNIX commands, and to boldly code where no one has man page 4.



