Re: [PATCH v2] libbpf: fix UAF in strset__add_str()

From: Andrii Nakryiko

Date: Fri May 15 2026 - 18:12:46 EST

On Thu, May 14, 2026 at 9:48 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote:
>
> strset_add_str_mem() might reallocate the strset data buffer in order to
> accommodate the provided string 's'. However, if 's' points to a string
> already present in the buffer, it becomes dangling after the realloc.
> This leads to a use-after-free when attempting to memcpy() the string
> into the new buffer.
>
> One scenario that triggers this problematic path is when resolve_btfids
> attempts to patch kfunc prototypes using existing BTF parameter names:
>
> | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> | Segmentation fault (core dumped)
>
> Compiling resolve_btfids with fsanitize=address generates a detailed
> report of the UAF:
>
> | =================================================================
> | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> | READ of size 5 at 0x7f4c4a500bd4 thread T0
> | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> |
> | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> | freed by thread T0 here:
> | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> |
> | previously allocated by thread T0 here:
> | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
>
> While resolve_btfids could be refactored to avoid this call path, let's
> instead fix this issue at the source in strset__add_str() and avoid
> similar scenarios. Let's simply check whether 's' is already within the
> strset data buffer boundaries, and skip appending it at the end of the
> buffer again.
>
> While already here, also fix strset__find_str() which suffers from the
> same problem by factoring out the common operations into a new helper
> function strset__offset() as suggested by Mykyta.
>
> Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> Suggested-by: Mykyta Yatsenko <yatsenko@xxxxxxxx>
> Signed-off-by: Carlos Llamas <cmllamas@xxxxxxxxxx>
> ---
> v2:
> Implemented the fix in strset__offset() helper as suggested by Mykyta.
> Added support to handle "substrings" of existing ones.
> Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
>
> v1:
> https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@xxxxxxxxxx/
>
> tools/lib/bpf/strset.c | 70 ++++++++++++++++++++++++++----------------
> 1 file changed, 43 insertions(+), 27 deletions(-)
>
> diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> index 2464bcbd04e0..e50da238914d 100644
> --- a/tools/lib/bpf/strset.c
> +++ b/tools/lib/bpf/strset.c
> @@ -107,25 +107,53 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> set->strs_data_len, set->strs_data_max_len, add_sz);
> }
>
> -/* Find string offset that corresponds to a given string *s*.
> - * Returns:
> - * - >0 offset into string data, if string is found;
> - * - -ENOENT, if string is not in the string data;
> - * - <0, on any other error.
> +/* Returns the offset of the string in set, and populates *grow* with the
> + * length which set needs to grow by to include *s*.
> */
> -int strset__find_str(struct strset *set, const char *s)
> +static long strset__offset(struct strset *set, const char *s, long *grow)

tbh, it seems like

> {
> - long old_off, new_off, len;
> + const char *strs = strset__data(set);
> + long len;
> void *p;
>
> - /* see strset__add_str() for why we do this */
> + /* Check whether 's' is already within the buffer */
> + if (strs && s >= strs && s < strs + set->strs_data_len) {
> + *grow = 0;
> + return s - strs;
> + }
> +
> + /* Hashmap keys are always offsets within set->strs_data, so to even
> + * look up some string from the "outside", we need to first append it
> + * at the end, so that it can be addressed with an offset. Luckily,
> + * until set->strs_data_len is incremented, that string is just a piece
> + * of garbage for the rest of the code, so no harm, no foul. On the
> + * other hand, if the string is unique, it's already appended and
> + * ready to be used, only a simple set->strs_data_len increment away.
> + */
> len = strlen(s) + 1;
> p = strset_add_str_mem(set, len);
> if (!p)
> return -ENOMEM;
>
> - new_off = set->strs_data_len;
> memcpy(p, s, len);
> + *grow = len;
> +
> + return set->strs_data_len;
> +}
> +
> +/* Find string offset that corresponds to a given string *s*.
> + * Returns:
> + * - >0 offset into string data, if string is found;
> + * - -ENOENT, if string is not in the string data;
> + * - <0, on any other error.
> + */
> +int strset__find_str(struct strset *set, const char *s)
> +{
> + long old_off, new_off, grow;
> +
> + new_off = strset__offset(set, s, &grow);
> + if (new_off < 0)
> + return new_off;
>

I'm not a fan of bypassing strs_hash, tbh... see below

> if (hashmap__find(set->strs_hash, new_off, &old_off))
> return old_off;
> @@ -141,25 +169,12 @@ int strset__find_str(struct strset *set, const char *s)
> */
> int strset__add_str(struct strset *set, const char *s)
> {
> - long old_off, new_off, len;
> - void *p;
> + long old_off, new_off, grow;
> int err;
>
> - /* Hashmap keys are always offsets within set->strs_data, so to even
> - * look up some string from the "outside", we need to first append it
> - * at the end, so that it can be addressed with an offset. Luckily,
> - * until set->strs_data_len is incremented, that string is just a piece
> - * of garbage for the rest of the code, so no harm, no foul. On the
> - * other hand, if the string is unique, it's already appended and
> - * ready to be used, only a simple set->strs_data_len increment away.
> - */
> - len = strlen(s) + 1;
> - p = strset_add_str_mem(set, len);
> - if (!p)
> - return -ENOMEM;
> -
> - new_off = set->strs_data_len;
> - memcpy(p, s, len);
> + new_off = strset__offset(set, s, &grow);
> + if (new_off < 0)
> + return new_off;

I agree that this is a usability problem, but I'd handle it by
recalculating s pointer to correct one if it happens to be coming from
invalidated strs_data. Something along the lines of:

const char *old_data = set->strs_data;
size_t old_data_len = set->strs_data_len;

p = strset_add_str_mem(...);
if (!p)
return -ENOMEM;

if (p != old_data && s >= old_data && s < old_data + old_data_len)
s = p + (s - old_data);

At this point s will be correct even if it was invalidated.

pw-bot: cr

>
> /* Now attempt to add the string, but only if the string with the same
> * contents doesn't exist already (HASHMAP_ADD strategy). If such
> @@ -172,6 +187,7 @@ int strset__add_str(struct strset *set, const char *s)
> if (err)
> return err;
>
> - set->strs_data_len += len; /* new unique string, adjust data length */
> + set->strs_data_len += grow; /* new unique string, adjust data length */
> +
> return new_off;
> }
> --
> 2.54.0.563.g4f69b47b94-goog
>