Re: [RFC] strcpys(): New function for copying strings safely

From: Alejandro Colomar (man-pages)
Date: Mon Jun 28 2021 - 07:32:33 EST

Hello David,

On 6/28/21 10:15 AM, David Laight wrote:
From: Alejandro Colomar
Sent: 27 June 2021 20:47

On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:

It is designed so that usage requires the minimum number of lines of
code for complete usage (including error handling checks):

// When we already checked that 'size' is >= 1
// and truncation is not an issue:

strcpys_np(size, dest, src, NULL);

Also, given how unlikely this case is, I have in my code:

warn_unused_result is such a PITA it should never have been invented.
At least no one seems to have been stupid enough to apply it to fprintf()
(you need to do fflush() before looking for errors).

Okay. After some thought, I think removing it is better.

In most cases strcpy() is safe - but if the inputs are 'corrupt'
horrid things happen - so you need truncation for safety.
But you really may not care whether the data got truncated.

Yes, in most cases, expecially with literal strings ("xxx") and known buffers, strcpy() is completely safe.

And yes, if the input is untrusted (may be corrupt or not), you need to be prepared for truncation. You don't need to truncate, hey, maybe the input is well-formed, but you need a function that may truncate if needed.

For that, we have strncpy(3), strlcpy(3bsd), and strscpy (kernel).

strncpy(3) is not for strings; it is for fixed length buffers. It has high performance penalties, and yet doesn't guarantee a C-string. DISCARDED.

strlcpy(3bsd) still requires that the input is a C-string, so it can't be used with untrusted strings. It really is broken, as Linus said. Its only valid use is if the input string is known (maybe a literal), but the output buffer may be smaller than the input string. DISCARDED.

strscpy (kernel) is great. BUT it is not supported by glibc. The reason for not including it (actually Ulrich used it for not including strlcpy, but it also applies to strscpy) is that it doesn't report an error when the string is truncated. I think that strscpy adds value to the current strcpy variants, and should be added to glibc.

And then, coming back to truncation, yes, in some cases, truncation is wanted, and not an error, for which strscpy is great (so please add it to glibc :), but then there are other cases where truncation is not wanted, but yet we deal with untrusted strings. That is the case that I think Ulrich mentioned that wasn't covered by strlcpy (and isn't yet covered by strscpy). And the reason to add this function.

Let's say a user introduces a username; it's an untrusted string, we need to process it at least with strscpy(). But if it is corrupted (too long), we don't want to silently store a truncated username; we want to inform the user, so that he introduces a new (hopefully valid) username. For that, strcpys() as defined in the previous email, is necessary.

The other use is where you want a sequence of:
len *= str_copy(dest + len, dest_len - len, src);
But don't really want to test for overflow after each call.

This is a legitimate use of strscpy(). Please, add it to glibc, as there is no libc function to do that.

In this case returning the buffer length (including the added
'\0' on truncation works quite nicely.
No chance of writing outside the buffer, safe truncation and
a simple 'len == dest_len' check for overflow.

OTOH there are already too many string copy functions.

There are, but because the standard ones don't serve all purposes correctly, so people need to develop their own. If libc provided the necessary functions, less custom string copy functions would be necessary, as Christoph said a long time ago, which is a good thing.

As I see it, there are the following, each of which has its valid usage:

known input && known buffer
not for C strings; only for fixed width buffers of characters!
known input && unknown buffer
Given that performance-wise it's similar to strscpy(),
it should probably always be replaced by strscpy().
unknown input && truncation is silently ignored
Except for performance-critical applications,
this call may replace strcpy(3), to add some extra safety.
unknown input && truncation may be an error (or not).
This call can replace strscpy() in most cases,
simplifying usage.
The only case I can see that strscpy() is superior
is for chains of strscpy(), where I'd only use strcpys()
in the last one (if any strscpy() in the chain has been
trunncated, so will the last strcpys() (unless it's "")).

For the reasons above, I propose both strscpy() and strcpys() for inclusion in glibc. Both trivial sources are in the previous email.




BTW, I did some research in the kernel and glibc sources to see how are str*cpy functions being used there:

user@sqli:~/src/gnu/glibc$ grep -rn 'strcpy(' * | sort -R | head -n 7
string/tester.c:1126: (void) strcpy(one, "abcd");
resolv/nss_dns/dns-host.c:532: strcpy(qp, "");
dirent/tst-fdopendir.c:70: strcpy(fname, "/tmp/dXXXXXX");
string/tester.c:1006: cp = strcpy(one, "abc");
timezone/zic.c:459: return result ? strcpy(result, str) : result;
timezone/zic.c:2886: strcpy(startbuf, zp->z_format);
resolv/compat-gethnamaddr.c:393: strcpy(bp, qname);

- string/tester.c:1126: (void) strcpy(one, "abcd");

- resolv/nss_dns/dns-host.c:532: strcpy(qp, "");

- dirent/tst-fdopendir.c:70: strcpy(fname, "/tmp/dXXXXXX");

- string/tester.c:1006: cp = strcpy(one, "abc");

- timezone/zic.c:459: return result ? strcpy(result, str) : result;

- timezone/zic.c:2886: strcpy(startbuf, zp->z_format);
I couldn't find a guarantee that zp->z_format is NUL-terminated.
Possible buffer overrun.
I couldn't find a guarantee that zp->z_format fits in startbuf.
Possible buffer overflow.
This call could be replaced, at least, by strlcpy. The calls to strchr()
previous to that strcpy() may also benefit from using strnchr().

- resolv/compat-gethnamaddr.c:393: strcpy(bp, qname);

user@sqli:~/src/linux/linux$ grep -rn 'strcpy(' * | sort -R | head -n 3
drivers/platform/x86/wmi.c:346: strcpy(method, "WQ");
sound/pci/atiixp.c:1305: strcpy(pcm->name, "ATI IXP IEC958 (AC97)");
sound/pci/intel8x0.c:1439: strcpy(name, "Intel ICH");

user@sqli:~/src/linux/linux$ grep -rn 'strlcpy(' * | sort -R | head -n 3
tools/perf/builtin-buildid-cache.c:124: strlcpy(from_dir, filename, sizeof(from_dir));
drivers/usb/gadget/function/u_ether.c:148: strlcpy(p->version, UETH__VERSION, sizeof(p->version));
drivers/scsi/qla4xxx/ql4_mbx.c:1615: strlcpy(username, chap_table->name, QL4_CHAP_MAX_NAME_LEN);

user@sqli:~/src/linux/linux$ grep -rn 'strscpy(' * | sort -R | head -n 3
sound/core/pcm.c:732: strscpy(pcm->id, id, sizeof(pcm->id));
sound/usb/hiface/pcm.c:597: strscpy(pcm->name, "USB-SPDIF Audio", sizeof(pcm->name));
crypto/crypto_user_stat.c:54: strscpy(rcipher.type, "cipher", sizeof(rcipher.type));

- drivers/platform/x86/wmi.c:346: strcpy(method, "WQ");

- sound/pci/atiixp.c:1305: strcpy(pcm->name, "ATI IXP IEC958 (AC97)");

- sound/pci/intel8x0.c:1439: strcpy(name, "Intel ICH");
The sprintf() 2 lines above is also okay.

- tools/perf/builtin-buildid-cache.c:124: strlcpy(from_dir, filename, sizeof(from_dir));
Okay. BUT no libc support!

- drivers/usb/gadget/function/u_ether.c:148: strlcpy(p->version, UETH__VERSION, sizeof(p->version));
Okay. BUT no libc support!
However, in the next line, it's not obvious where dev->gadget->name comes from.
eth_get_drvinfo() is a callback (it's stored in a pointer), so I can't follow it.
If it were an untrusted string (not NUL-terminated), it would be a bug.

- drivers/scsi/qla4xxx/ql4_mbx.c:1615: strlcpy(username, chap_table->name, QL4_CHAP_MAX_NAME_LEN);
Okay. BUT no libc support!

- sound/core/pcm.c:732: strscpy(pcm->id, id, sizeof(pcm->id));
Okay. BUT no libc support!

- sound/usb/hiface/pcm.c:597: strscpy(pcm->name, "USB-SPDIF Audio", sizeof(pcm->name));
Okay. BUT no libc support!

- crypto/crypto_user_stat.c:54: strscpy(rcipher.type, "cipher", sizeof(rcipher.type));
Okay. BUT no libc support!

None of the above checked for truncation, so I assume it's not an issue there.

Alejandro Colomar
Linux man-pages comaintainer;