[RFC] Filesystem name storage (Was: A Great Idea (tm) about reimplementing NLS.)
From: Kyle Moffett
Date: Wed Jun 15 2005 - 23:01:20 EST
On Jun 15, 2005, at 21:55:04, Patrick McFarland wrote:
> On Wednesday 15 June 2005 05:13 am, Denis Vlasenko wrote:
>> I do not understand how this is going to look from userspace
>> perspective. Can you give examples how this will work?
>
> IMHO, he means that the userspace would only see Unicode filenames,
> and the userspace could only give Unicode names back to the kernel.
> The kernel, using this global NLS layer, would translate back and
> forth, and the userland wouldn't know about it.
>
> It's basically the only sane way to approach the problem of getting
> the entire Linux community to convert to Unicode.

Would the following system for filenames resolve most of the issues
people are raising?

First, load charset tables into the kernel. These would be stored in
files in userspace and could be easily updated, renamed, deleted, etc.
Each such table would always be a translation from Unicode <=> Charset.
A kernel with this system built in would natively understand "raw",
"utf8", "utf16", and "utf32"; anything else would need a loaded
charset table.
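
To make the table idea concrete, here is a rough userspace sketch in Python (not kernel code). Every name in it (CharsetTable, load_table, decode, encode) is hypothetical, it only models single-byte charsets, and the little-endian byte order for "utf16"/"utf32" is my assumption. "raw" is left out because it bypasses translation entirely.

```python
# Hypothetical sketch of "Unicode <=> Charset" tables; not part of the proposal.

# Charsets understood without a loaded table. The little-endian byte
# order for utf16/utf32 is an assumption made for this sketch.
BUILTIN = {"utf8": "utf-8", "utf16": "utf-16-le", "utf32": "utf-32-le"}


class CharsetTable:
    """A loaded table for a single-byte charset: byte value <=> code point."""

    def __init__(self, name, byte_to_unicode):
        self.name = name
        self.to_unicode = dict(byte_to_unicode)
        # Build the reverse direction so the table works both ways.
        self.from_unicode = {cp: b for b, cp in self.to_unicode.items()}


LOADED = {}  # tables that have been "loaded into the kernel"


def load_table(table):
    LOADED[table.name] = table


def decode(charset, data):
    """Charset bytes -> Unicode string."""
    if charset in BUILTIN:
        return data.decode(BUILTIN[charset])
    table = LOADED[charset]  # KeyError if no table was loaded
    return "".join(chr(table.to_unicode[b]) for b in data)


def encode(charset, text):
    """Unicode string -> charset bytes."""
    if charset in BUILTIN:
        return text.encode(BUILTIN[charset])
    table = LOADED[charset]  # KeyError if no table was loaded
    return bytes(table.from_unicode[ord(ch)] for ch in text)


# ISO 8859-1 maps byte N to code point U+00NN, so its table is the identity.
load_table(CharsetTable("iso8859-1", {b: b for b in range(256)}))
```

With that table loaded, decode("iso8859-1", ...) and encode("iso8859-1", ...) round-trip any Latin-1 name, while "utf8" works with no table at all.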

The following mount options would be available:

nls_raw=(0|1) [default 1]:
    This would cause Linux to pass all chars through unmolested. This
    mode works well on multiuser systems where users want to use their
    own NLS tools, or where the whole system uses UTF-8, including the
    filesystems. This is backwards compatible with the way Linux
    currently presents most (all?) filesystems. If the options
    "nls_disk" or "nls_user" are used, then this option is forced to
    be zero.

nls_disk=<string-charset>:
    This specifies the underlying charset which should be used on the
    disk or filesystem itself. This may be "negotiate" for any
    filesystems which support NLS *and* can identify which charset is
    in use. Built-in options are "utf8", "utf16", and "utf32".
    Defaults to "negotiate" if available, otherwise "utf8", but only
    defaults if "nls_raw" is 0.

nls_user=<string-charset>:
    This specifies the charset which should be presented to the user.
    This may be used to allow backwards compatibility (e.g., a program
    wants ISO8859-1, but the admin wants the underlying filesystem to
    use UTF-8). Built-in options are "utf8", "utf16", and "utf32".
    Defaults to "utf8" if "nls_raw" is 0.
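
The conversion path implied by nls_disk and nls_user goes through Unicode in both directions. A minimal Python sketch, assuming Python's own codecs stand in for the charset tables and using the hypothetical helper names disk_to_user and user_to_disk:

```python
# Hypothetical sketch of the disk <=> user conversion path.
# Python codec names ("utf-8", "latin-1") stand in for the proposed
# nls_disk / nls_user charset names.

def disk_to_user(raw, nls_disk="utf-8", nls_user="latin-1"):
    """On-disk name bytes -> bytes presented to userspace."""
    text = raw.decode(nls_disk)  # disk charset -> Unicode
    # Characters the user charset cannot represent become '?'.
    return text.encode(nls_user, errors="replace")


def user_to_disk(raw, nls_disk="utf-8", nls_user="latin-1"):
    """Name bytes supplied by userspace -> bytes stored on disk."""
    text = raw.decode(nls_user)  # user charset -> Unicode
    return text.encode(nls_disk)  # Unicode -> disk charset
```

A legacy program that hands in a Latin-1 name would get UTF-8 stored on disk, and would see Latin-1 again when it lists the directory.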

The end result is that specifying either nls_disk or nls_user will
turn on automatic NLS conversion, with the unspecified nls_ option
taking its default (utf8, or "negotiate" for nls_disk where the
filesystem supports it).
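
The way the three options interact could be sketched like this in Python. The function name resolve_nls_options and the fs_can_negotiate flag are hypothetical; the rules are the ones described for nls_raw, nls_disk, and nls_user.

```python
# Hypothetical sketch of how the proposed mount options would resolve.

def resolve_nls_options(opts, fs_can_negotiate=False):
    """Return the effective settings for a dict of admin-supplied options.

    fs_can_negotiate says whether the filesystem supports NLS and can
    identify its own charset (an assumption of this sketch).
    """
    disk = opts.get("nls_disk")
    user = opts.get("nls_user")

    # nls_raw defaults to 1, but is forced to 0 as soon as either
    # nls_disk or nls_user is given.
    raw = int(opts.get("nls_raw", 1))
    if disk is not None or user is not None:
        raw = 0

    if raw:
        # Raw mode: names pass through untouched, no charsets apply.
        return {"nls_raw": 1, "nls_disk": None, "nls_user": None}

    # Conversion is on: fill in whichever option was left unspecified.
    if disk is None:
        disk = "negotiate" if fs_can_negotiate else "utf8"
    if user is None:
        user = "utf8"
    return {"nls_raw": 0, "nls_disk": disk, "nls_user": user}
```

For example, passing only nls_user=iso8859-1 would yield nls_raw=0 and nls_disk=utf8 on a filesystem that cannot negotiate.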

If these options are used on bind mounts, they should override the
underlying filesystem's mount options (instead of stacking). This will
allow the admin to specify:

# mount -t ext3 -o nls_disk=utf8,nls_user=utf8 /dev/hdb /mnt
# mount --bind -o nls_disk=utf8,nls_user=iso8859-1 /mnt/mail /var/spool/mail

if he/she wants to provide backwards compatibility with a legacy mail
spooling program.

Note: A part of each translation table would be an entry for
"Unspecified character", such that any Unicode character not mapped in
the table could be translated to a sane default, such as '?'. If names
collide under such translation, the kernel would need a way to keep
track of the collisions (appended numbers?) and properly re-resolve
them when asked.
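
One possible shape for the fallback and the collision handling, as a Python sketch. The names translate and present_directory and the "~N" suffix format are all hypothetical, and Python's "latin-1" codec stands in for a loaded ISO8859-1 table.

```python
# Hypothetical sketch: '?' fallback plus numbered collision suffixes.

def translate(name, charset="latin-1", fallback="?"):
    """Replace every character the user charset cannot represent."""
    out = []
    for ch in name:
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(fallback)  # the "Unspecified character" entry
    return "".join(out)


def present_directory(disk_names, charset="latin-1"):
    """Map each presented name back to the on-disk name it stands for.

    Names that collide after translation get "~1", "~2", ... appended,
    so a later lookup can be re-resolved to the right file.
    """
    shown = {}
    # Sort so the same directory always gets the same suffixes.
    for disk_name in sorted(disk_names):
        base = translate(disk_name, charset)
        candidate, n = base, 1
        while candidate in shown:
            candidate = f"{base}~{n}"
            n += 1
        shown[candidate] = disk_name
    return shown
```

The weak spot is stability: creating or deleting a colliding file can shift which suffix an existing file gets, so a real implementation would need something more persistent than sort order.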
Cheers,
Kyle Moffett
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$
r !y?(-)
------END GEEK CODE BLOCK------