Re: [Patch] Support UTF-8 scripts

From: H. Peter Anvin
Date: Thu Sep 15 2005 - 13:26:31 EST


Martin v. Löwis wrote:

> Says who? In UTF-8, it is not used to indicate a byte order; instead,
> it is used to indicate the fact that the file is UTF-8, like a magic.
> That's why I prefer to call it "UTF-8 signature".
>
> The Unicode consortium thinks that the BOM can be used in UTF-8:
>
> http://www.unicode.org/faq/utf_bom.html#29
>
> The UTF-8 signature is very useful, and I would prefer if it would
> be used instead of format-specific encoding declarations.


On Unix, it's a hideously bad idea. Unix inherently assumes that text streams can be merged, split, and modified, so a marker that is only meaningful at the very start of a file will sooner or later end up in the middle of one. Unless you can guarantee that EVERY program can handle a BOM EVERYWHERE in a stream, it's broken.

In other words, it's broken.
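
To make the merge problem concrete, here is a minimal sketch in Python. The `cat` helper and the two sample byte strings are made up for illustration; the helper only assumes that merging is plain byte concatenation, which is what cat(1) does. `codecs.BOM_UTF8` and the `utf-8-sig` codec are from the Python standard library.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def cat(*chunks: bytes) -> bytes:
    """Hypothetical stand-in for cat(1): plain byte-wise concatenation."""
    return b"".join(chunks)


# Two sample "files", each saved with a UTF-8 signature (illustrative data).
a = BOM + "#!/bin/sh\n".encode("utf-8")
b = BOM + "echo hello\n".encode("utf-8")

merged = cat(a, b)

# Both signatures survive the merge, so one now sits mid-stream.
bom_count = merged.count(BOM)

# A BOM-aware decoder strips only the leading signature; the second one
# comes through as a U+FEFF character inside the text.
text = merged.decode("utf-8-sig")
stray_inside = "\ufeff" in text

# A tool that only checks the first two bytes for "#!" no longer sees them.
looks_like_script = a.startswith(b"#!")
```

Even a decoder that knows about the signature removes it only at the start, so the second one leaks into the data, and anything that matches on leading bytes stops recognizing the file.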

-hpa
