[PATCH] string: Improve the generic strlcpy() implementation

From: Ingo Molnar
Date: Mon Oct 05 2015 - 04:56:50 EST


The current strlcpy() implementation has two weaknesses:

1)

There's a race:

	size_t strlcpy(char *dest, const char *src, size_t size)
	{
		size_t ret = strlen(src);

		if (size) {
			size_t len = (ret >= size) ? size - 1 : ret;
			memcpy(dest, src, len);
			dest[len] = '\0';
		}
		return ret;
	}

If another CPU or an interrupt shortens the source string after the strlen(), but
before the copy is complete, then we copy past the new NUL byte of the source
buffer, including fragments of the earlier source string's tail. The target buffer
will still be properly NUL terminated, but it will hold a shorter string than the
returned 'ret' source buffer length (despite there being no truncation).

The s390 arch implementation has the same race AFAICS.

This may cause bugs if the return value is subsequently assumed to be equal to the
destination string's length. (While in reality the string is shorter.)

The race is not automatically lethal, because it's guaranteed that the destination
is NUL terminated at the returned length (due to the overlong copy we did), so if
the string is memcpy()-ed using that length, then it will still result in a
weirdly padded but valid string.

But if any subsequent use of the return value relies on it being equal to a
subsequent call of strlen(dest), then that use might lead to bugs. I.e. our
implementation of strlcpy() is indeed racy and not robust.
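
To make the window concrete, here is a minimal userspace sketch that replays the
race deterministically. It is not kernel code: old_strlcpy_with_hook(),
race_hook_t and shorten_to_two() are hypothetical names, and the hook merely
stands in for "another CPU or an interrupt" running between the strlen() and the
memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook standing in for "another CPU or an interrupt". */
typedef void (*race_hook_t)(char *src);

/* The old two-pass strlcpy(), with a hook between strlen() and memcpy(). */
static size_t old_strlcpy_with_hook(char *dest, char *src, size_t size,
				    race_hook_t hook)
{
	size_t ret = strlen(src);	/* pass 1: measure */

	if (hook)
		hook(src);		/* the source changes here */

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);	/* pass 2: copy with the stale length */
		dest[len] = '\0';
	}
	return ret;
}

/* Shorten the source to "ab" by planting a NUL at index 2. */
static void shorten_to_two(char *src)
{
	src[2] = '\0';
}

static void demo_race(void)
{
	char src[16] = "abcdefgh";
	char dest[16];
	size_t ret;

	ret = old_strlcpy_with_hook(dest, src, sizeof(dest), shorten_to_two);

	assert(ret == 8);		/* stale length of the old source */
	assert(strlen(dest) == 2);	/* the string that actually arrived */
	/* The old tail sits behind the new NUL, terminated at dest[8]. */
	assert(memcmp(dest, "ab\0defgh", 9) == 0);
}
```

With the hook in place the function returns 8 while strlen(dest) is 2, and no
truncation was involved, since the destination has room for 16 bytes.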

But we can fix this race by iterating over the string in a single pass that
determines the length and copies the string at the same time. Like strscpy(), but
with strlcpy() semantics.
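
The single-pass idea can be sketched byte by byte in plain userspace C. This is
only an illustration under that simplification: strlcpy_single_pass() is a
hypothetical name, and the word-at-a-time fast path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: byte-wise, single-pass copy with strlcpy() semantics. */
static size_t strlcpy_single_pass(char *dest, const char *src, size_t dest_size)
{
	size_t len = 0;

	/* Copy and measure in the same loop: each source byte is read once. */
	while (len < dest_size) {
		char c = src[len];

		dest[len] = c;
		if (!c)
			return len;	/* dest was terminated by the copy itself */
		len++;
	}

	/* Truncated: terminate dest, then keep counting the source. */
	if (dest_size)
		dest[dest_size - 1] = '\0';
	while (src[len])
		len++;

	return len;
}

static void demo_single_pass(void)
{
	char small[4];
	char big[8];

	/* Truncation: returns the source length, dest holds "hel". */
	assert(strlcpy_single_pass(small, "hello", sizeof(small)) == 5);
	assert(strcmp(small, "hel") == 0);

	/* No truncation: the return value equals strlen(dest). */
	assert(strlcpy_single_pass(big, "hi", sizeof(big)) == 2);
	assert(strlen(big) == 2);
}
```

Because the length is derived from the bytes that were actually copied, the
return value and strlen(dest) cannot disagree in the non-truncated case.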

The new implementation uses word-by-word iteration over the strings if possible,
so this will also make strlcpy() faster.
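
For readers unfamiliar with the word-at-a-time approach, the sketch below shows
the underlying zero-byte test in userspace C. It is an assumption-laden
illustration, not the kernel's has_zero()/find_zero() helpers: word_has_zero()
and strlen_by_words() are hypothetical names, a 64-bit word is assumed, and the
buffer must be readable in whole 8-byte chunks up to and including the chunk
holding the NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES	UINT64_C(0x0101010101010101)
#define HIGHS	UINT64_C(0x8080808080808080)

/* Returns 1 iff at least one byte of v is zero (the classic bit trick). */
static int word_has_zero(uint64_t v)
{
	return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Hypothetical helper: strlen() that tests eight bytes per iteration. */
static size_t strlen_by_words(const char *s)
{
	size_t len = 0;
	uint64_t w;

	for (;;) {
		memcpy(&w, s + len, sizeof(w));	/* alignment-safe load */
		if (word_has_zero(w))
			break;
		len += sizeof(w);
	}
	while (s[len])		/* locate the NUL inside the final word */
		len++;
	return len;
}

static void demo_words(void)
{
	char buf[32] = "word-at-a-time";	/* zero padded to 32 bytes */

	assert(!word_has_zero(UINT64_C(0x1122334455667788)));
	assert(word_has_zero(UINT64_C(0x1122330055667788)));
	assert(strlen_by_words(buf) == strlen(buf));
}
```

The loop tests eight bytes per iteration and only falls back to byte steps inside
the last word, which is where the speedup over a pure byte loop comes from.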

2)

Another problem is that strlcpy() will also happily do bad stuff if we pass
it a negative size (which, size_t being unsigned, arrives as a huge positive
value). Instead of that we will from now on print a (one time) warning and
return safely.
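
One detail matters when writing such a check: a size_t can never compare below
zero, so a test like 'size < 0' is always false. A minimal sketch of a check that
can actually fire is shown below. The helper name size_looks_negative() and the
INT_MAX threshold are assumptions made for illustration; casting to a signed type
before comparing would be another option.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: a "negative" size converted to size_t shows up
 * as an implausibly large value, so compare against an upper bound.
 */
static int size_looks_negative(size_t size)
{
	return size > (size_t)INT_MAX;
}

static void demo_sizes(void)
{
	int avail = -1;		/* e.g. the result of a botched subtraction */

	assert(size_looks_negative((size_t)avail));
	assert(!size_looks_negative(16));
	assert(!size_looks_negative(0));
}
```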

Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
lib/string.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 78 insertions(+), 8 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 8dbb7b1eab50..e0cfca299606 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -129,23 +129,93 @@ EXPORT_SYMBOL(strncpy);
  * strlcpy - Copy a C-string into a sized buffer
  * @dest: Where to copy the string to
  * @src: Where to copy the string from
- * @size: size of destination buffer
+ * @dest_size: size of destination buffer
  *
  * Compatible with *BSD: the result is always a valid
  * NUL-terminated string that fits in the buffer (unless,
  * of course, the buffer size is zero). It does not pad
  * out the result like strncpy() does.
  */
-size_t strlcpy(char *dest, const char *src, size_t size)
+size_t strlcpy(char *dest, const char *src, size_t dest_size)
 {
-	size_t ret = strlen(src);
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
+	size_t dest_left = dest_size;
+	size_t dest_aligned_left = dest_left;
+	long src_len = 0;
+
+	/* Underflow check (dest_size is unsigned, so test it as signed): */
+	if (unlikely((ssize_t)dest_size < 0)) {
+		WARN_ONCE(1, "strlcpy(): dest_size < 0 underflow!");
+		return strlen(src);
+	}
+
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	/*
+	 * If src is unaligned, don't cross a page boundary,
+	 * since we don't know if the next page is mapped.
+	 */
+	if ((long)src & (sizeof(long) - 1)) {
+		size_t limit = PAGE_SIZE - ((long)src & (PAGE_SIZE - 1));
+		if (limit < dest_aligned_left)
+			dest_aligned_left = limit;
+	}
+#else
+	/* If src or dest is unaligned, don't do word-at-a-time. */
+	if (((long) dest | (long) src) & (sizeof(long) - 1))
+		dest_aligned_left = 0;
+#endif
+
+	/* First do the word-at-a-time copy of the aligned portion (if any): */
+	while (dest_aligned_left >= sizeof(unsigned long)) {
+		unsigned long c, data;
 
-	if (size) {
-		size_t len = (ret >= size) ? size - 1 : ret;
-		memcpy(dest, src, len);
-		dest[len] = '\0';
+		c = *(unsigned long *)(src+src_len);
+		*(unsigned long *)(dest+src_len) = c;
+
+		if (has_zero(c, &data, &constants)) {
+			data = prep_zero_mask(c, data, &constants);
+			data = create_zero_mask(data);
+			/* The target string was terminated by the above word copy */
+			return src_len + find_zero(data);
+		}
+		src_len += sizeof(unsigned long);
+		dest_left -= sizeof(unsigned long);
+		dest_aligned_left -= sizeof(unsigned long);
 	}
-	return ret;
+
+	/*
+	 * We get here either for tails smaller than word size, or
+	 * unaligned strings. Copy byte by byte and return the
+	 * length of the source string if we find its end:
+	 */
+	while (dest_left) {
+		char c;
+
+		c = src[src_len];
+		dest[src_len] = c;
+		if (!c)
+			/* The target string was terminated by the above byte copy */
+			return src_len;
+		src_len++;
+		dest_left--;
+	}
+
+	/*
+	 * We get here if the source string is larger than the destination buffer.
+	 *
+	 * The strlcpy() semantics require us to return the length of the
+	 * source string - so we have to continue until we find its end.
+	 *
+	 * We first zero-terminate the (truncated, hence not yet terminated)
+	 * target string.
+	 */
+	if (dest_size)
+		dest[dest_size-1] = '\0';
+
+	while (src[src_len])
+		src_len++;
+
+	return src_len;
 }
 EXPORT_SYMBOL(strlcpy);
 #endif