Re: [PATCH] string: Improve the generic strlcpy() implementation

From: Linus Torvalds
Date: Wed Oct 07 2015 - 05:23:00 EST


On Wed, Oct 7, 2015 at 10:04 AM, Rasmus Villemoes
<linux@xxxxxxxxxxxxxxxxxx> wrote:
>
> Well, if truncation has happened the return value is different
> (larger). But assuming the output buffer is large enough, the 'compute
> strlen, then do copying, potentially copying a nul byte which wasn't
> there moments before' is pretty obvious:
[snip]

So I really refuse to worry about the snprintf() family of functions
wrt this race. I don't think it was hugely important for strlcpy()
either - more of a "quality of implementation" issue rather than
anything fundamental - but for snprintf and friends it's an almost
unavoidable issue because of how snprintf works.

Saying that 'strlcpy()' and 'snprintf("%s")' are equivalent is true
only in the loosest sense. Yes, they return the same return value.
Yes, the result string should be the same. But the two are completely
different despite that.

snprintf() has to handle all the *other* cases than just "%s",
including right-justification, string precision handling, etc etc. It
is effectively impossible to do without doing "strlen()" on the source
of the string beforehand. As a result, snprintf() is fundamentally
always going to be racy wrt the string changing during the call.

So the simple end result is that we shouldn't worry about it, and if
you are doing snprintf() on a changing string, you should just be
aware of it. We *do* actually do that, for things like "current->comm"
that really can change while being printed out. We just don't care
deeply, and have in fact been removing locks in this area, because the
end result is still guaranteed to be NUL-terminated etc.

Can we get odd truncated printouts in the (very very very unlikely)
case that the string is being changed? Yes. We just don't care.

With strlcpy(), the situation is different at least in the sense that
we *can* write a source-modification-safe version. Of course, the end
result will still be undefined, but at least the resulting string
length in the destination can be made to not disagree violently with
the return value.

Do we care? Probably not. If you do strlcpy() on strings that change
without using locking, it's either a serialization bug, or you really
don't care very deeply about the end result anyway (ie it's something
like the "current->comm" issue). But just from a quality of
implementation standpoint, I think it would be good to just do the
RightThing(tm) anyway.

That's particularly true since we should be able to do it trivially by
just implementing strlcpy() using strscpy() plus the overflow fixup.
But let's wait with that until people are happy about the state of
strscpy. There's absolutely no rush, and in fact the one thing I
absolutely wanted to avoid was to have the introduction of strscpy()
resulting in pointless churn elsewhere, so let's do the *opposite* of
rushing into this, and just say :"ok, some day we should do this, just
in case"

> Here's a few pseudo-datapoints. About half the strlcpy instances (just
> from lazy grepping) has a string literal as src, and those only have
> about 1/8th chance of being aligned.

Hmm. I think gcc actually tends to align string literals - at least on
architectures where unaligned accesses tend to be more expensive and
we do the whole SLOW_UNALIGNEd handling etc.

I don't think gcc does it on x86, but on x86 we don't much care.
Somebody on ppc or ARM might want to check.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/