Re: OT: character encodings (was: Linux 2.6.20-rc4)

From: David Woodhouse
Date: Sun Jan 07 2007 - 10:14:02 EST


On Sun, 2007-01-07 at 14:06 +0100, Tilman Schmidt wrote:
> Russell King wrote:
> > Welcome to the mess which the UTF-8 charset creates.

Utter bollocks.

> The problem of different character encodings coexisting on the same
> platform, and the resulting occasional messing-up, far predates Unicode.
> I distinctly remember one case of being bitten by this myself in 1977
> when Unicode wasn't even on the horizon yet, and I don't think that was
> the first time.

Indeed. If you take arbitrary content and send it out to the world
labelled as ISO8859-1, of _course_ you're likely to be corrupting it.
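
A minimal Python sketch of that kind of corruption. It assumes the content is UTF-8 text, and the sample string is purely illustrative. The UTF-8 bytes are decoded as ISO-8859-1, which produces the familiar mojibake. ISO-8859-1 assigns a character to every byte value, so the bytes themselves survive, and the damage can be undone only by someone who knows what the real encoding was.

```python
# Illustrative sample text (hypothetical); any non-ASCII string would do.
original = "naïve café"

# The sender stores the text as UTF-8 bytes...
utf8_bytes = original.encode("utf-8")

# ...but labels it ISO-8859-1, so the receiver decodes each byte as one character.
mislabelled = utf8_bytes.decode("iso-8859-1")
print(mislabelled)  # naÃ¯ve cafÃ©

# ISO-8859-1 maps all 256 byte values, so re-encoding restores the original
# bytes, but only someone who knows the true encoding can recover the text.
recovered = mislabelled.encode("iso-8859-1").decode("utf-8")
print(recovered == original)  # True
```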

Far from being the cause of the problem, UTF-8 actually offers the
chance of a _solution_. Because once the Luddites catch up, it'll
largely eliminate the need for using the multitude of legacy character
sets and converting between them -- and the problem of mislabelling will
fairly much go away.

--
dwmw2
