Re: [PATCH v3] cgroup/dmem: allow max to be set below current usage

From: Maarten Lankhorst

Date: Thu Apr 16 2026 - 09:01:52 EST


Getm

Den 2026-03-26 kl. 16:18, skrev Thadeu Lima de Souza Cascardo:
> page_counter_set_max may return -EBUSY in case the current usage is above
> the new max. When writing to dmem.max, this error is ignored and the new
> max is not set.
>
> Instead of using page_counter_set_max when writing to dmem.max, atomically
> update its value irrespective of the current usage.
>
> Since there is no current mechanism to evict a given dmemcg pool, this will
> at least prevent the current usage from growing any further.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@xxxxxxxxxx>
> ---
> When writing to dmem.max, it was noticed that some writes did not take
> effect, even though the write was successful.
>
> It turns out that page_counter_set_max checks if the new max value is above
> the current usage and returns -EBUSY in case it is, which was being ignored
> by dmemcg_limit_write.
>
> It was also noticed that when setting limits for multiple regions in a
> single write, while setting one region's limit may fail, others might have
> succeeded before. Tejun Heo brought up that this breaks atomicity.
>
> Maarten Lankhorst and Michal Koutný have brought up that instead of
> failing, setting max below the current usage should behave like memcg and
> start eviction until usage goes below the new max.
>
> However, since there is no current mechanism to evict a given region, they
> suggest that setting the new max will at least prevent further allocations.
>
> v1 kept the multiple region support when writing to limit files, while
> returning -EBUSY as soon as setting a region's max was above the usage.
>
> v2 only allows setting a single region limit per write, while allowing any
> new max to be set.
>
> This version (v3) still allows multiple regions to be set, and explains why
> page_counter_set_max is not used anymore.
>
> I am sending this version dropping the multiple region restriction for now,
> as we continue to discuss whether it should be supported or not.
> ---
> Changes in v3:
> - Dropped first patch as it was already applied.
> - Added comment explaining why page_counter_set_max is not used.
> - Dropped patch restricting multiple regions to be set for now.
> - Link to v2: https://lore.kernel.org/r/20260319-dmem_max_ebusy-v2-0-b5ce97205269@xxxxxxxxxx
>
> Changes in v2:
> - Remove support for setting multiple regions' limits.
> - Allow any new max limit to be set.
> - Link to v1: https://lore.kernel.org/r/20260318-dmem_max_ebusy-v1-1-b7e461157b29@xxxxxxxxxx
> ---
> kernel/cgroup/dmem.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 1ab1fb47f2711ecc60dd13e611a8a4920b48f3e9..c00aa06759967a8f8977aaf036dd7966ddb55718 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -159,7 +159,15 @@ set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val)
> static void
> set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
> {
> - page_counter_set_max(&pool->cnt, val);
> + /*
> + * page_counter_set_max will return -EBUSY in case the current
> + * usage is above the new max.
> + *
> + * Since, there is no current eviction mechanism yet, setting max
> + * irrespective of the current usage will prevent further
> + * allocations.
> + */
> + xchg(&pool->cnt.max, val);
> }
>
> static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)

I haven't seen a reply yet, so do you want me to commit this now to drm-misc-next?

Kind regards,
~Maarten Lankhorst