Re: [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent

From: Vernon Yang

Date: Thu May 07 2026 - 08:54:48 EST

On Thu, May 7, 2026 at 11:35 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Mon, May 4, 2026 at 12:52 AM Vernon Yang <vernon2gm@xxxxxxxxx> wrote:
> >
> > From: Vernon Yang <yanglincheng@xxxxxxxxxx>
> >
> > Hi all,
> >
> > Background
> > ==========
> >
> > As is well known, a system can simultaneously run multiple different
> > scenarios. However, THP is not beneficial in every scenario; it is
> > best suited to memory-intensive applications that are not sensitive
> > to tail latency. For example, Redis, which is sensitive to tail latency,
> > is not suitable for THP. But in practice, due to Redis issues, the
> > entire THP functionality is often turned off, preventing other scenarios
> > from benefiting from it.
> >
> > There are also some embedded scenarios (e.g. Android) that directly use
> > 2MB THP, where the granularity is too large. Therefore, we introduced
> > mTHP in v6.8, which supports multiple-size THP. In practice, however, we
> > still globally fix a single mTHP size and are unable to automatically
> > select different mTHP sizes based on different scenarios.
> >
> > Testing showed that:
> >
> > - When the system has a lot of free memory, it is normal for Redis to
> > use mTHP. Performance degradation in Redis only occurs when the system
> > is under high memory pressure.
> > - Additionally, when a large number of small-memory processes use mTHP,
> > memory waste is prone to occur, and performance degradation may also
> > happen during fast memory allocation/release.
> >
> > Previously, "Cgroup-based THP control"[1] was proposed, but it had the
> > following issues.
> >
> > - It breaks the cgroup hierarchy property.
> > - It adds new THP knobs, making the sysadmin's job more complex.
> >
> > Previously, "mm, bpf: BPF-MM, BPF-THP"[2] was proposed, but it had the
> > following issues.
> >
> > - It didn't address the issue in per-process mode.
> > - For global mode, prctl(PR_SET_THP_DISABLE) already achieves the
> > same objective, so there is no need to add two mechanisms for the
> > same purpose.
> > - By attaching st_ops to mm_struct, the same issues that cgroup-bpf once
>
> Hello,
>
> The primary hurdles preventing BPF-THP from being upstreamed are:
>
> - Uncertainty regarding the stable API
>
> It remains unclear whether the following API is sufficient for
> BPF-THP requirements:

Thank you for pointing this out. I will add it to the issue list in
the next version.

I also encourage everyone to share relevant real workload scenarios.
I will analyze them and extend mthp_ext to address them, so that we
can determine which ABI satisfies the BPF-THP requirements.

> unsigned long
> bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type,
>                         unsigned long orders)
> {
>         return orders;
> }
>
> - Ongoing integration of cgroups and struct-ops
>
> Work is still in progress to integrate cgroup support with
> struct_ops (see also
> https://lore.kernel.org/linux-mm/87cy439x8a.fsf@xxxxxxxxx/).
> We should wait for this infrastructure to land before introducing
> new cgroup-based struct_ops.

This work can proceed in parallel with the integration of cgroup
support into struct_ops. I will focus on addressing real workload
scenarios to further clarify and stabilize the BPF-THP ABI.

> > faced are likely to arise again, e.g. lifetime of cgroup vs bpf, dying
> > cgroups, wq deadlock, etc. It is recommended to use cgroup-bpf for
> > implementation.
> > - The test cases are too simplistic, lacking eBPF cases similar to real
> > workloads such as sched_ext.
> >
> > If I missed something, please let me know. Thanks!
>
> BTW, it would be better to include the original authors in the CC
> list, especially since their work is cited in your commit message. ;)

OK, I will CC the relevant authors in the next version.

--
Cheers,
Vernon