Re: UTF-8 and case-insensitivity
From: H. Peter Anvin
Date: Wed Feb 18 2004 - 00:36:09 EST
Followup to: <16434.56190.639555.554525@xxxxxxxxx>
By author: tridge@xxxxxxxxx
In newsgroup: linux.dev.kernel
>
> In Samba that's not sparse enough that it's worth saving the single
> mmap of 128k to encode it sparsely in memory, but in UCS-4 land you
> would obviously use a sparse mapping, and that mapping table would
> probably be just a few k in size. If you allow for extents then I
> expect you could encode it in a couple of hundred bytes.
>
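A minimal sketch of the extent idea, assuming each extent is a contiguous run of code points that all shift by the same delta. The EXTENTS list and the to_lower() helper are hypothetical names, and only a handful of well-known ranges are listed; a real table would be generated from the Unicode character database and would need another extent kind for alternating upper/lower pairs.

```python
import bisect

# Hypothetical extents: (first code point, last code point, delta to add).
# Illustrative subset only, not a complete case table.
EXTENTS = [
    (0x0041, 0x005A, 32),   # Latin capital A..Z
    (0x00C0, 0x00D6, 32),   # Latin-1 capitals before the multiplication sign
    (0x00D8, 0x00DE, 32),   # Latin-1 capitals after the multiplication sign
    (0x0391, 0x03A1, 32),   # Greek capital Alpha..Rho
    (0x0410, 0x042F, 32),   # Cyrillic capital A..Ya
]
STARTS = [first for first, _, _ in EXTENTS]


def to_lower(cp):
    """Map a code point to lower case via binary search over the extents."""
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, delta = EXTENTS[i]
        if first <= cp <= last:
            return cp + delta
    # Code points outside every extent map to themselves.
    return cp


print(hex(to_lower(0x0041)))   # 0x61
print(hex(to_lower(0x00D7)))   # 0xd7, the multiplication sign is unchanged
```

Each extent here is three small integers, so the memory cost grows with the number of runs rather than with the size of the code space.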
If all you care about is the UTF-16-compatible range, you only need
1088K entries in your table (17 planes of 64K code points each); that
is small enough to keep comfortably in userspace.
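The arithmetic behind the 1088K figure, as a quick check. The 4-byte entry width is an assumption made here for illustration; the message does not say how wide an entry would be.

```python
PLANES = 17                  # plane 0 (BMP) plus 16 supplementary planes
PER_PLANE = 0x10000          # 65536 code points per plane
ENTRY_BYTES = 4              # assumed entry width, enough for any code point

entries = PLANES * PER_PLANE

print(entries)                         # 1114112, i.e. U+0000..U+10FFFF
print(entries // 1024)                 # 1088
print(entries * ENTRY_BYTES // 1024)   # 4352 (KiB for the whole flat table)
```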
> (I experimented with using a sparse mapping in Samba, and it was a
> slight loss on the machine I was testing on compared to just doing the
> mmap, so I went with the mmap. Maybe someone else can do a better
> sparse encoding than I did and actually get a win due to better cache
> behaviour.)
The thing is, you're probably only touching small parts of your table,
so the kernel and the CPU cache work quite well on the large table as
it is.
Wouldn't work in kernel space, though.
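A rough userspace sketch of the flat-table approach, assuming one little-endian 32-bit entry per code point in a file that is mapped read-only. The file layout and the build_table(), open_table() and to_lower() helpers are hypothetical; the table is filled from Python's own str.lower() purely to have some data to look up. Because the file is mapped rather than read, only the pages a lookup actually touches need to be brought into memory.

```python
import mmap
import os
import struct
import tempfile

ENTRIES = 0x110000            # code points U+0000..U+10FFFF
ENTRY = struct.Struct("<I")   # assumed layout: one little-endian u32 per entry


def build_table(path):
    """Write a flat lower-case table, one entry per code point."""
    buf = bytearray(ENTRIES * ENTRY.size)
    for cp in range(ENTRIES):
        low = chr(cp).lower()
        # Keep only one-to-one mappings; anything else maps to itself.
        target = ord(low) if len(low) == 1 else cp
        ENTRY.pack_into(buf, cp * ENTRY.size, target)
    with open(path, "wb") as f:
        f.write(buf)


def open_table(path):
    """Map the table read-only; the mapping outlives the file object."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def to_lower(table, cp):
    """Look up one entry directly by its offset in the mapping."""
    return ENTRY.unpack_from(table, cp * ENTRY.size)[0]


path = os.path.join(tempfile.mkdtemp(), "lower.tbl")
build_table(path)
table = open_table(path)
print(hex(to_lower(table, ord("A"))))   # 0x61
```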
-hpa