strlcpy() notes (was Re: [GIT PULL] smb3 client fixes)

From: Linus Torvalds
Date: Sun Aug 21 2022 - 14:14:00 EST


[ Adding Catalin Marinas because of that final note at the end of this
email, and because of the potential sub-page faults on arm64 ]

On Sat, Aug 20, 2022 at 3:34 PM Steve French <smfrench@xxxxxxxxx> wrote:
>
> - trivial strlcpy removal

Note that while I encourage this, in that thread I forgot to mention a
very subtle difference between strlcpy and strscpy.

Now strlcpy() is a completely broken interface, since it fundamentally
cannot work on untrusted source strings (which is the alleged main
reason to use it!). It's more than "the return value is broken", it's
literally "since strscpy is designed to return the length of the
source string, it can walk arbitrarily far past the end of it, if it
doesn't have a terminating NUL character".

BOOM! SIGSEGV in user space, random kernel oopses in the kernel.

So strlcpy() is completely broken-by-design, and should never be used
for anything but trusted strings, which imnsho makes it largely
pointless. It is an inefficient way to copy trusted strings into
limited destination buffers. That's all it is good for, and that's
simply not a good reason to exist.

In the kernel implementation, it is also racy wrt the data it copies,
and the return value may not even match the actual returned data in
case the source buffer changes under you.

That second thing is fixable, but it's just not worth fixing since the
first issue is fundamental.

So the source buffer has to be not only properly NUL-terminated, it
also has to be stable. Of the two, the first one is a bigger issue,
the second is not worth even worrying about at that point.

End result: strlcpy() does need to go.

But it's worth mentioning that 'strscpy()' not only fixes that "source
overrun" issue and the data race issue (the returned length is
guaranteed to actually match the returned buffer), it also has another
subtle semantic change to fix a performance issue.

strlcpy() is a broken interface not just because of the broken
interface - even if the actual design problems don't matter for you
from a correctness standpoint because your string is properly
terminated AND stable, it can also *perform* badly.

Why? Again, strlcpy() will walk the whole source string, even if you
just asked it to copy the first few bytes of it. For that same
original "this interface is completely broken" reason.

"strscpy()" doesn't do that - the maximum length also limits how far
strscpy will walk in the source string. Which is very much what you
want when you may not trust the source - you just have a buffer size,
you don't know the size of the string (and if you did know it, you'd
never use either strscpy or strlcpy, of course).

BUT.

And this is where the subtle semantic difference comes in.

"strscpy()" goes one step further, since it could start from a clean
slate with no legacy semantics. The function is defined to do copies
in "chunks" (typically one whole word at a time).

In particular, the *destination* is also written not one byte at a
time, but one "chunk" at a time. It will not overrun the destination
buffer (as defined by the size argument), but it basically does *not*
guarantee that it won't write to bytes after the final terminating NUL
character.

In other words, strscpy() may - and in the default implementation
*will* - potentially write more than one terminating NUL character in
an effort to avoid the whole "byte at a time" loop at the end.

Now, no sane user will ever care - the destination buffer is by
definition overwritten, after all. But this does mean that if you do
things like this:

unsigned char dest[5];

strscpy(dest, "hi", 8);

then strscpy() may actually overrun your five-byte destination buffer,
even though your source string is only three bytes in size.

You basically told it that it can access up to 8 bytes (from both
source and destination), and that is what it does (if the chunk size
is 8 - as it would be on x86-64).

So the above will basically do a single word-size load, find a zero
byte in it, clear the subsequent bytes, and do a single word-sized
store.

Efficient, yes. Also very different from what the other 'str*cpy()"
functions do.

So when you tell strscpy() that you have a maximum of N bytes, then
strscpy() really does assume that

(a) it can always read up to N bytes from the source

(b) it can always write up to N bytes (zero-padded) to the destination

Both of those assumption are quite reasonable and sane on their own,
but they are *different*.

In contrast, "strncpy()" basically has the rule that

(a) it can read up to the terminating NUL but at *most* N bytes from the source

(b) it will *always* write exactly N bytes to the destination

and "strlcpy()" has the rules

(a) it can read up to the terminating NULL with *no* regard to N from
the source

(b) it can write up to N bytes to the destination, but stop at the NUL

See? Very different rules, although the differences don't matter most
of the time (except for strlcpy() rule (a), which has that "no regard
for N at all" issue).

Why do I mention this? It doesn't matter for 99% of all uses, but
there are some gotchas:

(1) sometimes "source buffer" and "destination buffer" sizes are
simply different.

The strscpy() assumption is simply that you gave the smaller of the
two buffers, but that's *different* from the traditional meaning,
which is that N is basically the destination buffer size.

So I despise strlcpy(), but I think strscpy() is kind of broken too.
For the generic case, it really should have two separate buffer sizes.

(2) if you expect the destination buffer contents to be untouched
past the terminating NUL character, you're simply out of luck

The strscpy() assumption is that it can arbitrarily write to the
destination buffer.

So the best way to think of "strscpy()" is really as a "optimized
memcpy for strings". That's almost exactly how it acts. It will do a
memcpy(), but stop when it notices that it has copied a NUL character.

(It's a *bit* better than that, in that it will not copy random data
from beyond the string for security purposes - when it stops copying,
it *will* zeropad any excess, it won't actually copy possibly
sensitive data from past the source string, but if you think of it as
a string-optimized memcpy, you really have the right mental model).

It's also worth pointing out that the kernel implementation of
'strscpy()' will not do the chunk-sized accesses across an unaligned
page boundary. So it won't actually take a page fault past the
terminating NUL character, but if you pass it an 'N' that is bigger
than the source buffer, and you have sub-page faults in the kernel, we
might need to do some further work in this are. Catalin?

Of course, the easy solution is "don't pass a bigger N than the source
buffer size", but if you come from a 'strlcpy()' model where N only
affects the destination buffer and the NUL in the source is a cap on
the source, this may cause problems.

Linus