Re: UTF-8 and case-insensitivity

From: Daniel Newby
Date: Wed Feb 18 2004 - 22:00:52 EST


Linus Torvalds wrote:
> So some variation of the interface
>
>     int magic_open(
>         /* Input arguments */
>         const char *pathname,
>         unsigned long flags,
>         mode_t mode,

What about making the pathname hold the alternative cases for each character, not just an exact string? If Samba wanted to open
"A File.txt", it would do

magic_open( "[a|A][ ][f|F][i|I][l|L][e|E][.][t|T][x|X][t|T]", ... )

The syntax shown is conceptual; the actual code would use binary packing. Characters would be variable length to support UTF-8 and the like.

Userland would be responsible for making a useful pathname. If it tried something like "[aL|P|#][m|m]", the kernel would cheerfully use it. The only sanity checking would be that special characters like "/" and ":" cannot have alternatives.
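
To make the matching rule concrete, here is a minimal userland sketch that checks one directory entry name against a pattern written in the conceptual "[x|y]" syntax above. It is only an illustration under stated assumptions: alt_match() is a hypothetical helper, the textual syntax stands in for the binary packing, and there is no escaping for '[', '|' or ']'. Alternatives are compared bytewise, so multi-byte UTF-8 alternatives work, and the matcher backtracks because alternatives at one position may differ in length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Match `name` against a pattern such as "[a|A][b|B]".
 * Returns 1 on match, 0 on mismatch, -1 if a malformed group is reached.
 * Hypothetical helper: the bracket syntax is the conceptual form from the
 * text, not a real kernel interface.
 */
static int alt_match(const char *pat, const char *name)
{
    assert(pat != NULL && name != NULL);

    if (*pat == '\0')
        return *name == '\0';      /* pattern consumed: name must be too */
    if (*pat != '[')
        return -1;

    const char *end = strchr(pat, ']');
    if (end == NULL)
        return -1;

    /* Sanity check from the text: "/" and ":" cannot have alternatives. */
    size_t glen = (size_t)(end - (pat + 1));
    if (memchr(pat + 1, '|', glen) != NULL &&
        (memchr(pat + 1, '/', glen) != NULL ||
         memchr(pat + 1, ':', glen) != NULL))
        return -1;

    /* Try each alternative as a prefix of the remaining name. */
    const char *alt = pat + 1;
    for (;;) {
        const char *sep = memchr(alt, '|', (size_t)(end - alt));
        size_t len = (size_t)((sep ? sep : end) - alt);

        if (len > 0 && strncmp(name, alt, len) == 0) {
            int r = alt_match(end + 1, name + len);
            if (r != 0)
                return r;          /* matched, or malformed further on */
        }
        if (sep == NULL)
            break;
        alt = sep + 1;
    }
    return 0;
}
```

With the pattern for "A File.txt", alt_match() accepts "A File.txt" and "a file.TXT" and rejects "A Fike.txt". A kernel version would walk the directory and apply the same test to each entry, but how that fits into the real lookup path is not something this sketch tries to answer.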

Pros:

1. Filesystem names are looked up in kernel mode, where it might be efficient. (Less grossly slow at least.)

2. But the kernel doesn't care about encodings and character sets.

3. No new kernel infrastructure needed. (I hope?) The case-insensitive system calls don't take a performance hit.

4. The kernel can detect name collisions and decide what to do based on a flag.

5. Lookup tables are totally in userland and outside locks. Each app can use the table it finds appropriate.

6. A naughty app can't deadlock the filesystem.

7. Case-insensitive calls can be atomic, if you're willing to pay the performance price. It's straightforward for magic_creat() to refuse to create collisions.
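
Points 4 and 7 can be sketched as a lookup that counts matching directory entries and lets a flag decide what happens when more than one matches. Everything named here is hypothetical (the policy enum, the return codes, lookup_with_policy() itself), and ascii_fold_equal() is a stand-in matcher that only folds ASCII case; the proposal would match against the per-character alternatives instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical collision policies selected by a flag. */
enum collision_policy {
    COLLIDE_FIRST,   /* return the first matching entry */
    COLLIDE_FAIL     /* refuse if more than one entry matches */
};

#define LOOKUP_NOT_FOUND  (-1)
#define LOOKUP_AMBIGUOUS  (-2)

/* Stand-in matcher: ASCII-only case folding, for illustration only. */
static int ascii_fold_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end together */
}

/*
 * Scan `entries` for names matching `wanted`.
 * Returns the index of the chosen entry, LOOKUP_NOT_FOUND, or
 * LOOKUP_AMBIGUOUS when the policy forbids collisions.
 */
static int lookup_with_policy(const char *const *entries, size_t n,
                              const char *wanted,
                              enum collision_policy policy)
{
    assert(entries != NULL && wanted != NULL);

    int found = LOOKUP_NOT_FOUND;
    for (size_t i = 0; i < n; i++) {
        if (!ascii_fold_equal(entries[i], wanted))
            continue;
        if (found == LOOKUP_NOT_FOUND) {
            found = (int)i;
            if (policy == COLLIDE_FIRST)
                return found;          /* first hit wins */
        } else {
            return LOOKUP_AMBIGUOUS;   /* second hit: collision */
        }
    }
    return found;
}
```

A magic_creat() along these lines would run the same scan first and refuse to create the file if anything already matches. Doing that atomically means holding the directory lock across the whole scan, which is the performance price mentioned in point 7.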

Cons:

1. Looking up multiple alternatives is hairy. (Not that the other approaches are much prettier.)

2. Massive filenames would get turned into something *really* massive (five times as many bytes for a simple packing). Does this break anything?
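
One way to arrive at the factor of five, assuming a particular layout that the text above does not actually specify: each position stores one count byte, then each alternative stores one length byte followed by its bytes. An ASCII letter with two cases then costs 1 + 2 * (1 + 1) = 5 bytes. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size of one packed position under the assumed layout:
 *   [count byte] then, per alternative, [length byte][bytes...]
 */
static size_t packed_position_size(const size_t *alt_lens, size_t n_alts)
{
    assert(alt_lens != NULL || n_alts == 0);

    size_t total = 1;                  /* count byte */
    for (size_t i = 0; i < n_alts; i++)
        total += 1 + alt_lens[i];      /* length byte plus payload */
    return total;
}

/* Worst case for an ASCII name: every byte has an upper and a lower case. */
static size_t packed_ascii_name_size(size_t name_len)
{
    const size_t two_cases[2] = { 1, 1 };
    return name_len * packed_position_size(two_cases, 2);
}
```

Under that assumption a 255-byte name (NAME_MAX on Linux) packs to 1275 bytes, and a 4096-byte path to 20480 bytes, so anything that sizes buffers by the usual path limits would need a second look.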

-- Daniel Newby

