Re: [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent

From: Yafang Shao

Date: Thu May 07 2026 - 09:19:35 EST


On Thu, May 7, 2026 at 8:51 PM Vernon Yang <vernon2gm@xxxxxxxxx> wrote:
>
> On Thu, May 7, 2026 at 11:35 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Mon, May 4, 2026 at 12:52 AM Vernon Yang <vernon2gm@xxxxxxxxx> wrote:
> > >
> > > From: Vernon Yang <yanglincheng@xxxxxxxxxx>
> > >
> > > Hi all,
> > >
> > > Background
> > > ==========
> > >
> > > As is well known, a system can run multiple different workloads
> > > simultaneously. However, THP is not beneficial in every scenario: it
> > > is only suitable for memory-intensive applications that are not
> > > sensitive to tail latency. For example, Redis, which is sensitive to
> > > tail latency, is not a good fit for THP. In practice, however, Redis
> > > issues often lead to THP being disabled system-wide, preventing other
> > > workloads from benefiting from it.
> > >
> > > There are also embedded scenarios (e.g. Android) that directly use
> > > 2MB THP, where the granularity is too coarse. Therefore, mTHP, which
> > > supports multiple THP sizes, was introduced in v6.8. In practice,
> > > however, we still globally fix a single mTHP size and cannot
> > > automatically select different mTHP sizes for different scenarios.
> > >
> > > After testing, it was found that:
> > >
> > > - When the system has plenty of free memory, it is fine for Redis to
> > > use mTHP; performance degradation in Redis occurs only when the
> > > system is under high memory pressure.
> > > - Additionally, when a large number of small-memory processes use
> > > mTHP, memory is easily wasted, and performance may also degrade
> > > during rapid memory allocation/release.
> > >
> > > Previously, "Cgroup-based THP control"[1] was proposed, but it had the
> > > following issues.
> > >
> > > - It breaks the cgroup hierarchy property.
> > > - It adds new THP knobs, making the sysadmin's job more complex.
> > >
> > > Previously, "mm, bpf: BPF-MM, BPF-THP"[2] was proposed, but it had the
> > > following issues.
> > >
> > > - It didn't address the issue in per-process mode.
> > > - For global mode, prctl(PR_SET_THP_DISABLE) already achieves the
> > > same objective, so there is no need for two mechanisms serving the
> > > same purpose.
> > > - Attaching st_ops to mm_struct, the same issues that cgroup-bpf once
> >
> > Hello,
> >
> > The primary hurdles preventing BPF-THP from being upstreamed are:
> >
> > - Uncertainty regarding the stable API
> >
> > It remains unclear whether the following API is sufficient for
> > BPF-THP requirements:
>
> Thank you for pointing this out. I will add it to this issue list in
> the next version.
>
> At the same time, I also encourage everyone to actively provide
> relevant real workload scenarios. I will conduct related analyses and
> develop the mthp_ext to address them, so we can determine which ABI
> can satisfy the BPF-THP requirements.
>
> > unsigned long
> > bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type,
> >                         unsigned long orders)
> > {
> >         return orders;
> > }
> >
> > - Ongoing integration of cgroups and struct-ops
> >
> > Work is still in progress to integrate cgroup support with
> > struct_ops (see also
> > https://lore.kernel.org/linux-mm/87cy439x8a.fsf@xxxxxxxxx/).
> > We should wait for this infrastructure to land before introducing
> > new cgroup-based struct_ops.
>
> This work can proceed in parallel with the integration of cgroup
> support into struct_ops. I will focus on addressing real workload
> scenarios to further clarify and stabilize the BPF-THP ABI.
>
> > > faced are likely to arise again, e.g. cgroup vs. bpf lifetimes,
> > > dying cgroups, wq deadlocks, etc. It is recommended to use
> > > cgroup-bpf for the implementation.
> > > - The test cases are too simplistic, lacking eBPF cases that
> > > resemble real workloads, such as sched_ext provides.
> > >
> > > If I missed something, please let me know. Thanks!
> >
> > BTW, it would be better to include the original authors in the CC
> > list, especially since their work is cited in your commit message. ;)
>
> OK, I will CC the relevant authors in the next version.

Hello Vernon,

I believe it would be best to hold off until David provides guidance
on the future direction. While I am not currently active on BPF-THP,
we are still looking for the right opportunity to upstream it. The
primary difference is that your implementation is cgroup-based;
however, we also plan to switch to that approach once Roman’s work
lands.

I don't mean to imply that BPF-THP is solely my project, but I suspect
you will eventually arrive at a similar implementation to what I’ve
developed. So far, I haven’t found a more efficient API than the
following:

unsigned long
bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type,
                        unsigned long orders)
{
        return orders;
}

--
Regards
Yafang