Re: [PATCH 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent

From: Yafang Shao

Date: Wed May 06 2026 - 23:36:28 EST


On Mon, May 4, 2026 at 12:52 AM Vernon Yang <vernon2gm@xxxxxxxxx> wrote:
>
> From: Vernon Yang <yanglincheng@xxxxxxxxxx>
>
> Hi all,
>
> Background
> ==========
>
> A single system often runs multiple workloads simultaneously, but THP
> is not beneficial to all of them: it is best suited to memory-intensive
> applications that are not sensitive to tail latency. Redis, for
> example, is sensitive to tail latency and therefore a poor fit for THP.
> In practice, because of such Redis issues, THP is often disabled for
> the entire system, preventing other workloads from benefiting from it.
>
> There are also embedded scenarios (e.g. Android) that use 2MB THP
> directly, where the granularity is too large. Therefore mTHP, which
> supports multiple THP sizes, was introduced in v6.8. In practice,
> however, a single mTHP size is still fixed globally, and there is no
> way to automatically select different mTHP sizes for different
> scenarios.
>
> Testing showed that:
>
> - When the system has plenty of free memory, it is fine for Redis to
> use mTHP. Performance degradation in Redis occurs only when the system
> is under high memory pressure.
> - When a large number of small-memory processes use mTHP, memory is
> easily wasted, and performance may also degrade during rapid memory
> allocation/release.
>
> Previously, "Cgroup-based THP control"[1] was proposed, but it had the
> following issues.
>
> - It breaks the cgroup hierarchy property.
> - It adds new THP knobs, making the sysadmin's job more complex.
>
> Previously, "mm, bpf: BPF-MM, BPF-THP"[2] was proposed, but it had the
> following issues.
>
> - It didn't address the per-process mode.
> - For the global mode, prctl(PR_SET_THP_DISABLE) already achieves the
> same objective; there is no need to add two mechanisms for the same
> purpose.
> - With st_ops attached to mm_struct, the same issues that cgroup-bpf once

Hello,

The primary hurdles preventing BPF-THP from being upstreamed are:

- Uncertainty regarding the stable API

It remains unclear whether the following API is sufficient for
BPF-THP requirements:

unsigned long
bpf_hook_thp_get_orders(struct vm_area_struct *vma, enum tva_type type,
                        unsigned long orders)
{
        return orders;
}

- Ongoing integration of cgroups and struct_ops

Work is still in progress to integrate cgroup support with
struct_ops (see also
https://lore.kernel.org/linux-mm/87cy439x8a.fsf@xxxxxxxxx/).
We should wait for this infrastructure to land before introducing
new cgroup-based struct_ops.

> faced are likely to arise again, e.g. the lifetime of cgroup vs bpf,
> dying cgroups, wq deadlocks, etc. It is recommended to use cgroup-bpf
> for the implementation.
> - The test cases are too simplistic, lacking eBPF cases similar to real
> workloads such as sched_ext.
>
> If I missed something, please let me know. Thanks!

BTW, it would be better to include the original authors in the CC
list, especially since their work is cited in your commit message. ;)

--
Regards

Yafang