Re: [PATCH 4/4] samples: bpf: add mthp_ext

From: Vernon Yang

Date: Wed May 06 2026 - 09:52:14 EST

Forwarding sashiko review

On Mon, May 4, 2026 at 1:57 AM <sashiko-bot@xxxxxxxxxx> wrote:
>
> Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
> - [High] Infinite loop in `trigger_scan()` when `read()` returns an error
> - [Medium] Integer overflow in `FROM_MB()` macro disables fallback logic for values >= 4096
> - [Medium] Spurious mTHP fallback during initial cgroup scan due to zero-initialized baseline
> - [Medium] Failure in `setup_psi_trigger()` masks the error code and exits successfully
> --
>
> commit d30597e7a4d6ca7d5532535074e0c66c2c92f486
> Author: Vernon Yang <yanglincheng@xxxxxxxxxx>
>
> samples: bpf: add mthp_ext
>
> This commit adds a new BPF sample program, mthp_ext, to adjust mTHP sizes
> based on cgroup memory pressure and usage. It monitors the cgroup directory
> and automatically falls back to 4KB sizes during high memory pressure or
> low usage.
>
> > diff --git a/samples/bpf/mthp_ext.h b/samples/bpf/mthp_ext.h
> > new file mode 100644
> > index 000000000000..33dc01bcebd3
> > --- /dev/null
> > +++ b/samples/bpf/mthp_ext.h
> > @@ -0,0 +1,30 @@
> [ ... ]
> > +#define min(a, b) ((a) < (b) ? a : b)
> > +#define FROM_MB(s) (s * 1024 * 1024)
>
> Does this macro suffer from a 32-bit integer overflow if s is 4096 or
> larger?
>
> If s is an unsigned int like configs.min_mem, 4096 * 1048576 would wrap to 0,
> which could silently disable the fallback logic.

Good catch, I will fix that in the next version. Thanks!

> > +#define TO_MB(s) (s / 1024 / 1024)
> > +
> > +struct config_local {
> > + unsigned long threshold;
> > + unsigned long interval;
> > + unsigned int init_order;
> > + unsigned int min_mem;
> [ ... ]
> > diff --git a/samples/bpf/mthp_ext.bpf.c b/samples/bpf/mthp_ext.bpf.c
> > new file mode 100644
> > index 000000000000..bbee3e9f679c
> > --- /dev/null
> > +++ b/samples/bpf/mthp_ext.bpf.c
> [ ... ]
> > +SEC("iter.s/cgroup")
> > +int cgroup_scan(struct bpf_iter__cgroup *ctx)
> > +{
> > + struct cgroup *cgrp = ctx->cgroup;
> > + struct mem_cgroup *memcg;
> > + struct mem_info *info;
> > + struct alert_event *e;
> > + unsigned long curr_stall;
> > + unsigned long curr_mem;
> > + unsigned long delta;
> > +
> > + if (!cgrp)
> > + return 1;
> > +
> > + if (!cgroup_has_tasks(cgrp))
> > + return 0;
> > +
> > + info = bpf_cgrp_storage_get(&cgrp_storage, cgrp, 0,
> > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > + if (!info)
> > + return 0;
> > +
> > + memcg = bpf_get_mem_cgroup(&cgrp->self);
> > + if (!memcg)
> > + return 0;
> > +
> > + bpf_cgroup_flush_stats(cgrp);
> > + curr_stall = bpf_cgroup_stall(cgrp, PSI_MEM_FULL);
> > + delta = curr_stall - info->stall;
>
> During the initial scan of a cgroup, info->stall will be zero-initialized.
> Since curr_stall represents cumulative stall time, won't delta be a very
> large number?
>
> This seems like it could incorrectly trigger the 4KB fallback during the
> first interval regardless of current memory pressure.

Good catch, This is a Bug exactly. I will fix that in the next version. Thanks!

> Copy from cover-letter:
> When the redis is under no memory pressure, RPS drops by 10.3%
> (from 1.4M to 1.2M, compared to always).

After fixed, when there is no memory pressure on Redis, RPS equal to
1.4M, which means mthp_ext is a positive optimization in all
scenarios.

> > + bpf_mem_cgroup_flush_stats(memcg);
> > + curr_mem = bpf_mem_cgroup_page_state(memcg, NR_ANON_MAPPED) +
> > + bpf_mem_cgroup_page_state(memcg, NR_SHMEM);
> > + if (curr_mem < FROM_MB(configs.min_mem) || delta >= configs.threshold)
> > + info->order = 0;
> > + else
> > + info->order = PMD_ORDER;
> [ ... ]
> > diff --git a/samples/bpf/mthp_ext.c b/samples/bpf/mthp_ext.c
> > new file mode 100644
> > index 000000000000..0e064bad136f
> > --- /dev/null
> > +++ b/samples/bpf/mthp_ext.c
> [ ... ]
> > +static int trigger_scan(struct bpf_link *iter_link)
> > +{
> > + char buf[256];
> > + int fd;
> > +
> > + fd = bpf_iter_create(bpf_link__fd(iter_link));
> > + if (fd < 0) {
> > + fprintf(stderr, "ERROR: bpf_iter_create failed: %s\n",
> > + strerror(errno));
> > + return -1;
> > + }
> > +
> > + /* Read to trigger the iter program execution */
> > + while (read(fd, buf, sizeof(buf)))
> > + ;
>
> If read() returns an error (like -1 for EINTR), will this result in an infinite
> busy-wait loop since -1 evaluates to true in C?
>
> Could this be updated to check for > 0 instead?

Yes, I will fix that in the next version. Thanks!

> > +
> > + close(fd);
> > + return 0;
> > +}
> [ ... ]
> > +int main(int argc, char **argv)
> > +{
> [ ... ]
> > + psi_fd = setup_psi_trigger(root_path, "some", threshold * 1000,
> > + interval * 1000);
> > + if (psi_fd < 0) {
> > + fprintf(stderr, "ERROR: PSI trigger setup failed\n");
> > + goto psi_setup_fail;
> > + }
> > +
> > + monitor_thread(psi_fd, &skel->bss->configs, iter_link, rb);
> > +
> > + close(psi_fd);
> > +psi_setup_fail:
> > + ring_buffer__free(rb);
> > +rb_fail:
> > + bpf_link__destroy(iter_link);
> > +exit_fixed:
> > +attach_iter_fail:
> > + bpf_link__destroy(ops_link);
> > +attach_opts_fail:
> > +load_skel_fail:
> > + mthp_ext__destroy(skel);
> > +open_skel_fail:
> > + close(root_fd);
> > +
> > + printf("\nExiting...\n");
> > +
> > + return err;
> > +}
>
> If setup_psi_trigger() fails, the code jumps to psi_setup_fail. However, err
> was previously set to 0 from a successful BPF operation.
>
> Does this cause the program to log an error but exit with a success status
> code?

Yes, I will fix that in the next version. Thanks!

> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260503165024.1526680-1-vernon2gm@xxxxxxxxx?part=4