[Patch] Support UTF-8 scripts
From: "Martin v. Löwis"
Date: Sat Aug 13 2005 - 07:08:56 EST
This patch adds support for UTF-8 signatures (aka BOM, byte order
mark) to binfmt_script. Files that start with EF BB BF # ! are now
recognized as scripts (in addition to files starting with # !).
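
For illustration only, the check described above could be sketched in
user space roughly as follows. The helper name script_prefix_len is
hypothetical and is not part of the patch; the patch itself performs
the equivalent comparison inline on bprm->buf.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: return the number of
 * leading bytes to skip before the interpreter path, or 0 if buf does
 * not start with "#!" (optionally preceded by the UTF-8 signature
 * EF BB BF).
 */
static size_t script_prefix_len(const char *buf, size_t len)
{
	/* Compare as unsigned bytes so 0xEF etc. match regardless of
	 * whether plain char is signed on this platform. */
	const unsigned char *p = (const unsigned char *)buf;

	if (len >= 2 && p[0] == '#' && p[1] == '!')
		return 2;
	if (len >= 5 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF &&
	    p[3] == '#' && p[4] == '!')
		return 5;
	return 0;
}
```

A return value of 2 or 5 corresponds to the bprm->buf+2 and bprm->buf+5
offsets used in the patch; 0 corresponds to the -ENOEXEC case.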
With such support, creating scripts that reliably carry non-ASCII
characters is simplified. Editors and the script interpreter can
easily agree on what the encoding of the script is, and the
interpreter can then render strings appropriately. Currently,
Python supports source files that start with the UTF-8 signature;
the approach would naturally extend to Perl to enhance/replace
the "use utf8" pragma. Likewise, Tcl could use the UTF-8 signature
to reliably identify UTF-8 source code (instead of assuming
[encoding system] for source code).
Please find the patch attached below.
Regards,
Martin
Signed-off-by: Martin v. Löwis <martin@xxxxxxxxxxx>
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -1,7 +1,7 @@
/*
* linux/fs/binfmt_script.c
*
- * Copyright (C) 1996 Martin von Löwis
+ * Copyright (C) 1996, 2005 Martin von Löwis
* original #!-checking implemented by tytso.
*/
@@ -23,7 +23,16 @@ static int load_script(struct linux_binp
char interp[BINPRM_BUF_SIZE];
int retval;
- if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || (bprm->sh_bang))
+ /* It is a recursive invocation. */
+ if (bprm->sh_bang)
+ return -ENOEXEC;
+
+ /* It starts neither with #!, nor with #! preceded by
+ the UTF-8 signature. */
+ if (!(((bprm->buf[0] == '#') && (bprm->buf[1] == '!'))
+ || ((bprm->buf[0] == '\xef') && (bprm->buf[1] == '\xbb')
+ && (bprm->buf[2] == '\xbf') && (bprm->buf[3] == '#')
+ && (bprm->buf[4] == '!'))))
return -ENOEXEC;
/*
* This section does the #! interpretation.
@@ -46,7 +55,8 @@ static int load_script(struct linux_binp
else
break;
}
- for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+ cp = (bprm->buf[0] == '\xef') ? bprm->buf + 5 : bprm->buf + 2;
+ while ((*cp == ' ') || (*cp == '\t')) cp++;
if (*cp == '\0')
return -ENOEXEC; /* No interpreter name found */
i_name = cp;