Re: [PATCH v2 2/2] cgroup/dmem: add dmem.memcg control file for double-charging to memcg

From: Maarten Lankhorst

Date: Fri Jun 05 2026 - 12:05:39 EST


Hey,

On 6/5/26 17:42, Eric Chanudet wrote:
> On Fri, Jun 05, 2026 at 01:27:09PM +0200, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 5/26/26 18:59, Eric Chanudet wrote:
>>> On Fri, May 22, 2026 at 05:26:16PM +0200, Michal Koutný wrote:
>>>> Hello Eric.
>>>>
>>>> On Tue, May 19, 2026 at 11:59:02AM -0400, Eric Chanudet <echanude@xxxxxxxxxx> wrote:
>>>>> Add a root-only cgroupfs file "dmem.memcg" that lets an administrator
>>>>> configure whether allocations in a dmem region should also be charged to
>>>>> the memory controller.
>>>>
>>>> This kinda makes sense as it is not unlike io.cost.* device
>>>> configurators.
>>>>
>>>> Just for my better understanding -- will there be a space for userspace
>>>> to switch this? (No charged dmem allocations happen before responsible
>>>> userspace runs, so that the attribute remains unlocked.)
>>>>
>>>> (I'm rather indifferent about the actual double charging/non-charging
>>>> matter.)
>>>
>>> Yes, this is intended to be configured before the user space stack that
>>> would start allocating things is started. Once it has started (and tried
>>> to charge something), the configuration is locked
>>>
>>>>
>>>>>
>>>>> To handle inheritance, dmem adds a depends_on the memory controller,
>>>>> unless MEMCG isn't configured in.
>>>>>
>>>>> Double-charging is disabled by default. Once a charge is attempted, the
>>>>> setting is locked to prevent inconsistent accounting by a small 4-state
>>>>> machine (off, on, locked off, locked on).
>>>>>
>>>>> The memcg to charge is derived from the pool's cgroup, since the pool
>>>>> holds a reference to the dmem cgroup state that keeps the cgroup alive
>>>>> until it gets uncharged.
>>>>>
>>>>> Signed-off-by: Eric Chanudet <echanude@xxxxxxxxxx>
>>>>> ---
>>>>> Documentation/admin-guide/cgroup-v2.rst | 23 +++++
>>>>> kernel/cgroup/dmem.c | 158 +++++++++++++++++++++++++++++++-
>>>>> 2 files changed, 178 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>>>> index 6efd0095ed995b1550317662bc1b56c7a7f3db23..1d2fa55ddf0faa17baa916a8914d3033e8e42359 100644
>>>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>>>> @@ -2828,6 +2828,29 @@ DMEM Interface Files
>>>>> drm/0000:03:00.0/vram0 12550144
>>>>> drm/0000:03:00.0/stolen 8650752
>>>>>
>>>>> + dmem.memcg
>>>>> + A readwrite nested-keyed file that exists only on the root
>>>>> + cgroup.
>>>>
>>>> Strictly speaking this is not nested-keyed but flat keyed [1],
>>>
>>> Indeed,
>>>
>>>> which leads me to realization that this is the first instance of a boolean.
>>>> All in call, such a composition comes to my mind (latter is RO):
>>>>
>>>> drm/0000:03:00.0/vram0 enable=0|1 locked=0|1
>>>>
>>>
>>> So per[1] 1 key, 2 sub-keys (enable RW, locked RO), that looks better
>>> and match the documentation, thanks!
>>>
>>>>
>>>>
>>>>> +static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char *buf,
>>>>> + size_t nbytes, loff_t off)
>>>>> +{
>>>>> + while (buf) {
>>>>> + struct dmem_cgroup_region *region;
>>>>> + char *options, *name;
>>>>> + bool flag;
>>>>> +
>>>>> + options = buf;
>>>>> + buf = strchr(buf, '\n');
>>>>> + if (buf)
>>>>> + *buf++ = '\0';
>>>>
>>>> I recall there was a discussion about accepting only a single device per
>>>> write(2) (at the same time I see this idiom is still present in other
>>>> dmem.* files, so this is nothing to change in _this_ patch).
>>>
>>> I would second that. When setting say dmem.max for 2 regions, with a
>>> typo on the second, the first one is set, but write still get EINVAL.
>>>
>>> Also, I just notice dmemcg_limit_write() returns EINVAL if the region is
>>> not found (this patch returns ENODEV).
>>>
>>>>
>>>> Thanks,
>>>> Michal
>>>>
>>>> [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#format
>>>
>>>
>>>
>>
>> Perhaps a bit late, but before we start adding this UAPI we should enforce a
>> single region per write?
>
> I can send that separately, although that is a UAPI change. Is there any
> user that would be affected?
>
> This series is hung on charging memcg using memory objects from the
> context of dmem, when at that level of abstraction it doesn't have
> access to the underlying pieces that were allocated.

It's a uapi change, but I see more and more interest in the development and usage of dmemcg.
I believe it's better to fix this before users (perhaps accidentally) start to rely on this behavior.

Kind regards,
~Maarten Lankhorst