Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: H. Peter Anvin
Date: Tue Feb 17 2004 - 22:09:49 EST


Followup to: <20040217163613.GA23499@xxxxxxxxxxxxxxxxxx>
By author: Jamie Lokier <jamie@xxxxxxxxxxxxx>
In newsgroup: linux.dev.kernel
>
> Linus Torvalds wrote:
> > Which flies in the face of "Be strict in what you generate, be liberal in
> > what you accept". A lot of the functions are _not_ willing to be liberal
> > in what they accept. Which sometimes just makes the problem worse, for no
> > good reason.
>
> Unicode specifies that a program claiming to read UTF-8 _must_ reject
> malformed UTF-8.
>
> Ok, we can just ignore Unicode. :)
>
> But the reason they cite is security: when applications allow
> malformed UTF-8 through, there's plenty of scope for security holes
> due to multiple encodings of "/" and "." and "\0".
>
> This is a real problem: plenty of those Windows worms that attack web
> servers get in by using multiple-escaped funny characters and
> malformed UTF-8 to get past security checks for ".." and such.
>

Actually, the kernel is 100% compliant in that respect.

The only byte sequences the kernel interprets are:

00
2E
2E 2E
2F

... and it correctly rejects (in the sense that it doesn't alias) any
other possible byte stream that a naïvely incorrect UTF-8 decoder
could interpret as those same sequences.
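
To illustrate the point, here is a rough user-space sketch in Python. It
is not kernel code: OVERLONG, naive_decode_2byte, split_path and
is_dot_or_dotdot are hypothetical names made up for this example, and the
splitting logic only assumes that path walking compares raw bytes. It
shows that the overlong two-byte forms of "/", "." and NUL never contain
the bytes 2F, 2E or 00, so a byte-wise comparison cannot mistake them for
the real thing, whereas a decoder that skips the overlong check would.

```python
# Overlong (malformed) two-byte encodings that a lax decoder would map onto
# the characters that matter for path handling.
OVERLONG = {
    "/": b"\xc0\xaf",
    ".": b"\xc0\xae",
    "\0": b"\xc0\x80",
}

# The individual byte values that make up 00, 2E, 2E 2E and 2F.
SPECIAL_BYTES = {0x00, 0x2E, 0x2F}


def naive_decode_2byte(seq):
    """Decode a two-byte sequence WITHOUT the overlong check (deliberately wrong)."""
    lead, cont = seq
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


def split_path(path):
    """Split on the single byte 0x2F only, as a byte-oriented path walk would."""
    return [component for component in path.split(b"/") if component]


def is_dot_or_dotdot(component):
    """True only for the exact byte strings 2E and 2E 2E."""
    return component in (b".", b"..")


if __name__ == "__main__":
    for char, seq in OVERLONG.items():
        aliased = naive_decode_2byte(seq) == char
        has_special = bool(SPECIAL_BYTES & set(seq))
        print(seq.hex(), "naive decoder aliases:", aliased,
              "contains special byte:", has_special)
    # An overlong ".." stays an ordinary (if odd) file name.
    print(split_path(b"a/\xc0\xae\xc0\xae/b"))
```

Well-formed multi-byte UTF-8 is safe for the same reason: every byte of
a multi-byte sequence has the high bit set, so none of them can equal 00,
2E or 2F.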

-hpa