Re: [PATCH v3] libbpf: fix UAF in strset__add_str()
From: Andrii Nakryiko
Date: Mon May 18 2026 - 13:39:43 EST
On Sun, May 17, 2026 at 10:05 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote:
>
> strset_add_str_mem() might reallocate the strset data buffer in order to
> accommodate the provided string 's'. However, if 's' points to a string
> already present in the buffer, it becomes dangling after the realloc.
> This leads to a use-after-free when attempting to memcpy() the string
> into the new buffer.
>
> One scenario that triggers this problematic path is when resolve_btfids
> attempts to patch kfunc prototypes using existing BTF parameter names:
>
> | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> | Segmentation fault (core dumped)
>
> Compiling resolve_btfids with fsanitize=address generates a detailed
> report of the UAF:
>
> | =================================================================
> | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> | READ of size 5 at 0x7f4c4a500bd4 thread T0
> | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> |
> | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> | freed by thread T0 here:
> | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> |
> | previously allocated by thread T0 here:
> | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
>
> While resolve_btfids could be refactored to avoid this call path, let's
> instead fix this issue at the source in strset__add_str() and avoid
> similar scenarios.
>
> Let's check if set->strs_data was reallocated and whether 's' points to
> an internal string within the old strset buffer. In such case, 's' is
> reconstructed to point to the new buffer.
>
> While already here, also fix strset__find_str() which suffers from the
> same problem by factoring out the common operations into a new helper
> function strset_str_append().
>
> Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Suggested-by: Mykyta Yatsenko <yatsenko@xxxxxxxx>
> Signed-off-by: Carlos Llamas <cmllamas@xxxxxxxxxx>
> ---
> v3:
> Switch to 's' reconstruction approach suggested by Andrii.
> Adjusted names and commit log accordingly.
>
> v2:
> Implemented the fix in strset__offset() helper as suggested by Mykyta.
> Added support to handle "substrings" of existing ones.
> Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
> https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@xxxxxxxxxx/
>
> v1:
> https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@xxxxxxxxxx/
> tools/lib/bpf/strset.c | 58 +++++++++++++++++++++++++++---------------
> 1 file changed, 37 insertions(+), 21 deletions(-)
>
> diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> index 2464bcbd04e0..d229961ff2fc 100644
> --- a/tools/lib/bpf/strset.c
> +++ b/tools/lib/bpf/strset.c
> @@ -107,6 +107,37 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> set->strs_data_len, set->strs_data_max_len, add_sz);
> }
>
> +static long strset_str_append(struct strset *set, const char *s)
> +{
> + const char *old_data = strset__data(set);
sashiko suggests to use uintptr_t for this detection to avoid
undefined behavior, let's do that then)?
but other than that and a nit below, looks good, thanks!
pw-bot: cr
> + long len = strlen(s) + 1;
> + void *p;
> +
> + /* Hashmap keys are always offsets within set->strs_data, so to even
> + * look up some string from the "outside", we need to first append it
> + * at the end, so that it can be addressed with an offset. Luckily,
> + * until set->strs_data_len is incremented, that string is just a piece
> + * of garbage for the rest of the code, so no harm, no foul. On the
> + * other hand, if the string is unique, it's already appended and
> + * ready to be used, only a simple set->strs_data_len increment away.
> + */
> + p = strset_add_str_mem(set, len);
> + if (!p)
> + return -ENOMEM;
> +
> + /* The set->strs_data might have reallocated and if 's' pointed
> + * to an internal string within the old buffer, then it became
> + * dangling and needs to be reconstructed before the copy.
> + */
> + if (old_data && old_data != strset__data(set) &&
> + s >= old_data && s < old_data + set->strs_data_len)
> + s = strset__data(set) + (s - old_data);
nit: strset__data() is more of an "external api", note how we access
strs_data_len directly through field. So let's use s->strs_data access
consistently.
> +
> + memcpy(p, s, len);
> +
> + return len;
> +}
> +
> /* Find string offset that corresponds to a given string *s*.
> * Returns:
> * - >0 offset into string data, if string is found;
> @@ -116,16 +147,12 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> int strset__find_str(struct strset *set, const char *s)
> {
> long old_off, new_off, len;
> - void *p;
>
> - /* see strset__add_str() for why we do this */
> - len = strlen(s) + 1;
> - p = strset_add_str_mem(set, len);
> - if (!p)
> - return -ENOMEM;
> + len = strset_str_append(set, s);
> + if (len < 0)
> + return len;
>
> new_off = set->strs_data_len;
> - memcpy(p, s, len);
>
> if (hashmap__find(set->strs_hash, new_off, &old_off))
> return old_off;
> @@ -142,24 +169,13 @@ int strset__find_str(struct strset *set, const char *s)
> int strset__add_str(struct strset *set, const char *s)
> {
> long old_off, new_off, len;
> - void *p;
> int err;
>
> - /* Hashmap keys are always offsets within set->strs_data, so to even
> - * look up some string from the "outside", we need to first append it
> - * at the end, so that it can be addressed with an offset. Luckily,
> - * until set->strs_data_len is incremented, that string is just a piece
> - * of garbage for the rest of the code, so no harm, no foul. On the
> - * other hand, if the string is unique, it's already appended and
> - * ready to be used, only a simple set->strs_data_len increment away.
> - */
> - len = strlen(s) + 1;
> - p = strset_add_str_mem(set, len);
> - if (!p)
> - return -ENOMEM;
> + len = strset_str_append(set, s);
> + if (len < 0)
> + return len;
>
> new_off = set->strs_data_len;
> - memcpy(p, s, len);
>
> /* Now attempt to add the string, but only if the string with the same
> * contents doesn't exist already (HASHMAP_ADD strategy). If such
> --
> 2.54.0.563.g4f69b47b94-goog
>