Re: UTF-8 filenames
From: Norman Diamond
Date: Sun Feb 22 2004 - 07:33:43 EST
kernel@xxxxxxxxxxxx wrote:
> So then, just about everyone agrees that if you've got a filename with
> non-ASCII characters, you should pass it to creat() as UTF-8. You have
> to pass it as something; individual encodings like BIG5 and EUC-JP
> are unacceptable, and UCS-4's benefits over UTF-8 (simplicity and, in
> VERY rare cases, storage size reductions) aren't worth the stuff it
> breaks. Correct?
Correct, except for the following cases. Unix users have been creating
filenames encoded in EUC-JP or SJIS for more than 20 years (yes, sadly, some
Unix systems used SJIS). I don't know how long BIG5 and Korean filenames
have been supported in Unix, but it's probably not much different. Consider
converting all your ASCII filenames to UTF-16. Let everyone share the
short-term pain for the long-term gain. When you get everyone to agree on
UTF-16, it will be ugly, but it will be equal for everyone.
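A quick illustration of why legacy names are such a mess: the kernel stores
a filename as an opaque byte string, so one name yields different bytes
under each encoding, and nothing in the bytes says which encoding was used.
A minimal Python sketch (the name 日本.txt and the helper encodings_of are
made up for illustration):

```python
def encodings_of(name, encodings=("euc_jp", "shift_jis", "utf-8")):
    """Return the raw bytes `name` would have on disk under each encoding.

    The kernel only ever sees these bytes; it has no record of which
    encoding produced them.
    """
    return {enc: name.encode(enc) for enc in encodings}


if __name__ == "__main__":
    # Illustrative filename; any non-ASCII name shows the same effect.
    for enc, raw in encodings_of("日本.txt").items():
        print(f"{enc:10} {raw.hex(' ')}")
```

The three byte strings differ, so a file created under one locale will
typically show up as mojibake (or be unreadable) under another.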
By the way, another subthread mentioned that stty puts some stuff in the
kernel that could be done in user space. The same is true of IME support
in Unix systems: stty options specify the encoding of an IME's output (e.g.
EUC-JP or SJIS, which then gets forwarded as input to shells, applications,
etc.) and whether a single backspace (or whatever the character-deletion
character is) deletes an entire input character instead of just a single
byte. I keep forgetting to check whether Linux has the same stty options.
I haven't needed to set them with stty, because if I need to use a
different locale I just open a new terminal emulator window using that
locale.
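The byte-versus-character distinction can be sketched in a few lines. This
is a minimal sketch that assumes UTF-8 input (the EUC-JP and SJIS cases
above need their own, encoding-specific rules); erase_byte and erase_char
are hypothetical names, not a kernel or stty interface. UTF-8 continuation
bytes all have the form 10xxxxxx, so a character-aware erase walks back to
the lead byte and drops the whole sequence:

```python
def erase_byte(buf: bytes) -> bytes:
    """Naive erase: drop exactly one byte, possibly splitting a character."""
    return buf[:-1]


def erase_char(buf: bytes) -> bytes:
    """Character-aware erase, ASSUMING the buffer holds UTF-8.

    Skip back over continuation bytes (0b10xxxxxx) to the lead byte of
    the last character, then drop the whole multibyte sequence.
    """
    if not buf:
        return buf
    i = len(buf) - 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]


if __name__ == "__main__":
    line = "ls 日本".encode("utf-8")
    print(erase_byte(line))  # leaves a dangling partial character
    print(erase_char(line).decode("utf-8"))
```

With byte-wise erase, one backspace leaves the user with a broken partial
character that the shell or application then receives as input.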
I don't even have time to follow all of this thread, so if anyone has
questions, CC me personally. I don't know if I'll have time to answer
either, but I'll try.
> As I see it, there's no way for the kernel to deal with all the legacy
> filenames out there. There's no way the kernel can magically fix them.
That's true. Some options of mount and some options of stty can be moved to
user space, but they will always need to be available.
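Since the kernel can't fix legacy names, conversion has to happen in user
space, by someone who knows which encoding a given directory was written
in. A minimal sketch, assuming the legacy encoding is supplied by the
caller; convert_name and convert_dir are hypothetical helpers, and the
"already valid UTF-8 means leave it alone" rule is only a heuristic, since
some legacy byte sequences also happen to be valid UTF-8:

```python
import os


def convert_name(raw: bytes, legacy: str = "euc_jp"):
    """Return the UTF-8 form of a legacy-encoded filename, or None.

    HEURISTIC (an assumption, not a reliable rule): a name that already
    decodes as UTF-8 is left alone; otherwise it is decoded with the
    caller-supplied legacy encoding. Names invalid in both are skipped.
    """
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(legacy).encode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid in the legacy encoding either; leave it


def convert_dir(path: bytes, legacy: str = "euc_jp", dry_run: bool = True):
    """Rename legacy-encoded entries in one directory to UTF-8.

    Uses bytes paths so the raw filenames are seen unmodified.
    With dry_run=True it only reports what it would do.
    """
    for raw in os.listdir(path):
        new = convert_name(raw, legacy)
        if new is None:
            continue
        src = os.path.join(path, raw)
        dst = os.path.join(path, new)
        if os.path.exists(dst):
            print("skip, target exists:", dst)
            continue
        print(f"{src!r} -> {dst!r}")
        if not dry_run:
            os.rename(src, dst)
```

It defaults to a dry run because a wrong guess about the source encoding
just renames files into a different kind of garbage.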
By the way, in Windows 98 it's really neat to share a disk folder across
the network and let clients with different code pages create files. The
host where the folder is stored can't even delete some of the files that
get created.