Re: [Patch] Support UTF-8 scripts

From: H. Peter Anvin
Date: Wed Aug 31 2005 - 18:28:09 EST


Followup to: <42FDE286.40707@xxxxxxxxxxx>
By author: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin@xxxxxxxxxxx>
In newsgroup: linux.dev.kernel
>
> This patch adds support for UTF-8 signatures (aka BOM, byte order
> mark) to binfmt_script. Files that start with EF BF FF # ! are now
> recognized as scripts (in addition to files starting with # !).
>
> With such support, creating scripts that reliably carry non-ASCII
> characters is simplified. Editors and the script interpreter can
> easily agree on what the encoding of the script is, and the
> interpreter can then render strings appropriately. Currently,
> Python supports source files that start with the UTF-8 signature;
> the approach would naturally extend to Perl to enhance/replace
> the "use utf8" pragma. Likewise, Tcl could use the UTF-8 signature
> to reliably identify UTF-8 source code (instead of assuming
> [encoding system] for source code).
>

BOM should not be used in UTF-8. In fact, it shouldn't be used at
all.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/