Re: UTF-8 and case-insensitivity
From: Daniel Newby
Date: Wed Feb 18 2004 - 22:00:52 EST
Linus Torvalds wrote:
> So some variation of the interface
>
>     int magic_open(
>         /* Input arguments */
>         const char *pathname,
>         unsigned long flags,
>         mode_t mode,
What about making the pathname hold the alternative cases for each
character, not just an exact string? If Samba wanted to open
"A File.txt", it would do
magic_open( "[a|A][ ][f|F][i|I][l|L][e|E][.][t|T][x|X][t|T]", ... )
The syntax shown is conceptual; the actual code would use binary
packing. Characters would be variable length to support UTF-8 and
the like.
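
To make the idea concrete, here is a minimal sketch of how such a pattern could be matched against one directory entry. It works on the conceptual text syntax, since the binary packing is not specified. The name pattern_matches is hypothetical, the text syntax has no escape for "[", "|" or "]", and the matcher takes the first alternative that fits without backtracking, so it does not handle alternatives where one is a byte prefix of another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not an existing kernel function.
 * Returns 1 if name matches pattern, 0 if it does not,
 * and -1 if the pattern is malformed.
 */
int pattern_matches(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);

    while (*pattern != '\0') {
        if (*pattern != '[')
            return -1;              /* every position must be a [...] group */
        const char *alt = pattern + 1;
        const char *end = strchr(alt, ']');
        if (end == NULL)
            return -1;              /* unterminated group */

        size_t matched = 0;
        for (;;) {
            /* One alternative runs up to the next '|' or the closing ']'. */
            const char *sep = memchr(alt, '|', (size_t)(end - alt));
            const char *alt_end = sep ? sep : end;
            size_t len = (size_t)(alt_end - alt);

            /* Alternatives are byte strings, so multi-byte UTF-8 works. */
            if (len > 0 && strncmp(name, alt, len) == 0) {
                matched = len;
                break;
            }
            if (sep == NULL)
                break;              /* no alternatives left */
            alt = sep + 1;
        }
        if (matched == 0)
            return 0;               /* nothing fit at this position */

        name += matched;
        pattern = end + 1;
    }
    return *name == '\0';           /* the whole name must be consumed */
}
```

With the pattern for "A File.txt", this accepts "a file.TXT" as well as the exact spelling, and rejects names that differ in anything other than case.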
Userland would be responsible for making a useful pathname. If it
tried something like "[aL|P|#][m|m]", the kernel would cheerfully
use it. The only sanity checking would be that special characters
like "/" and ":" cannot have alternatives.
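
The one sanity check described above could look roughly like the sketch below. It again uses the conceptual text syntax, and pattern_is_sane is a hypothetical name. It assumes that a group may contain "/" or ":" only when that group has no "|" in it, and that anything else, however odd, is accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check, not an existing kernel function.
 * Returns 1 if the pattern is acceptable, 0 if it is malformed or
 * gives alternatives to a special character ('/' or ':').
 */
int pattern_is_sane(const char *pattern)
{
    assert(pattern != NULL);

    while (*pattern != '\0') {
        if (*pattern != '[')
            return 0;                       /* not a [...] group */
        const char *body = pattern + 1;
        const char *end = strchr(body, ']');
        if (end == NULL)
            return 0;                       /* unterminated group */

        size_t len = (size_t)(end - body);
        int has_alternatives = memchr(body, '|', len) != NULL;
        int has_special = memchr(body, '/', len) != NULL ||
                          memchr(body, ':', len) != NULL;

        /* "[/]" is fine; "[/|x]" would let userland fake a separator. */
        if (has_alternatives && has_special)
            return 0;

        pattern = end + 1;
    }
    return 1;
}
```

Under these assumptions "[aL|P|#][m|m]" passes, because the check only protects the special characters and does not judge whether the alternatives make sense.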
Pros:
1. Filesystem names are looked up in kernel mode, where it might be
efficient. (Less grossly slow at least.)
2. But the kernel doesn't care about encodings and character sets.
3. No new kernel infrastructure needed. (I hope?) The existing
case-sensitive system calls don't take a performance hit.
4. The kernel can detect name collisions and decide what to do
based on a flag.
5. Lookup tables are totally in userland and outside locks. Each
app can use the table it finds appropriate.
6. A naughty app can't deadlock the filesystem.
7. Case-insensitive calls can be atomic, if you're willing to pay
the performance price. It's straightforward for magic_creat() to
refuse to create collisions.
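
Points 4 and 7 can be illustrated with a small directory scan. This is only a sketch: find_entry, MAGIC_FAIL_ON_COLLISION and ascii_fold_match are hypothetical names, the directory is modeled as a plain array of strings, and POSIX strcasecmp (ASCII-only folding) stands in for the alternatives matcher. Locking, which is what would make the real operation atomic, is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <strings.h>    /* strcasecmp, used only as a stand-in matcher */

#define MAGIC_FAIL_ON_COLLISION 0x1UL   /* hypothetical flag */

/* A matcher returns nonzero when the entry satisfies the request. */
typedef int (*name_match_fn)(const char *wanted, const char *entry);

/* Stand-in matcher: ASCII case folding only. */
int ascii_fold_match(const char *wanted, const char *entry)
{
    return strcasecmp(wanted, entry) == 0;
}

/*
 * Returns the index of the matching entry, -ENOENT if nothing matches,
 * or -EEXIST if the flag is set and more than one entry matches.
 */
long find_entry(const char *const *entries, size_t n, const char *wanted,
                name_match_fn match, unsigned long flags)
{
    assert(entries != NULL && wanted != NULL && match != NULL);

    long first = -ENOENT;
    for (size_t i = 0; i < n; i++) {
        if (!match(wanted, entries[i]))
            continue;
        if (!(flags & MAGIC_FAIL_ON_COLLISION))
            return (long)i;         /* cheap path: first match wins */
        if (first >= 0)
            return -EEXIST;         /* second match: report the collision */
        first = (long)i;            /* remember it and keep scanning */
    }
    return first;
}
```

The performance price is visible in the loop: without the flag the scan stops at the first match, while collision detection has to look at every entry. A magic_creat() along these lines would run the same scan and refuse to create the file whenever any entry matches.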
Cons:
1. Looking up multiple alternatives is hairy. (Not that the other
approaches are much prettier.)
2. Massive filenames would get turned into something *really*
massive (five times as many bytes for a simple packing). Does this
break anything?
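
For a rough feel of the growth in point 2, the sketch below computes the packed size under one hypothetical layout: a count byte per position, then a length byte plus the raw bytes for each alternative. The layout and the name packed_size_ascii are assumptions made here, not the packing the post has in mind, although this layout does give five bytes for an ASCII letter with two cases. Only ASCII input is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical layout per character position:
 *   [count] then, for each alternative, [length][bytes...]
 * An ASCII letter has two alternatives (lower and upper case),
 * anything else has one.
 */
size_t packed_size_ascii(const char *name)
{
    assert(name != NULL);

    size_t total = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        size_t alternatives = isalpha(*p) ? 2 : 1;
        /* 1 count byte + (1 length byte + 1 data byte) per alternative */
        total += 1 + alternatives * (1 + 1);
    }
    return total;
}
```

Under this layout the 10-byte name "A File.txt" packs to 46 bytes, and an all-letter name of 255 bytes (NAME_MAX on Linux) would pack to 1275 bytes.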
-- Daniel Newby