Re: [PATCH 5/5] mm: memcg: separate slab stat accounting from objcg charge cache

From: Vlastimil Babka (SUSE)

Date: Tue Mar 03 2026 - 05:46:28 EST

On 3/3/26 09:54, Hao Li wrote:
> On Mon, Mar 02, 2026 at 02:50:18PM -0500, Johannes Weiner wrote:
>> Cgroup slab metrics are cached per-cpu the same way as the sub-page
>> charge cache. However, the intertwined code to manage those dependent
>> caches right now is quite difficult to follow.
>>
>> Specifically, cached slab stat updates occur in consume() if there was
>> enough charge cache to satisfy the new object. If that fails, whole
>> pages are reserved, and slab stats are updated when the remainder of
>> those pages, after subtracting the size of the new slab object, are
>> put into the charge cache. This already juggles a delicate mix of the
>> object size, the page charge size, and the remainder to put into the
>> byte cache. Doing slab accounting in this path as well is fragile, and
>> has recently caused a bug where the input parameters between the two
>> caches were mixed up.
>>
>> Refactor the consume() and refill() paths into unlocked and locked
>> variants that only do charge caching. Then let the slab path manage
>> its own lock section and open-code charging and accounting.
>>
>> This makes the slab stat cache subordinate to the charge cache:
>> __refill_obj_stock() is called first to prepare it;
>> __account_obj_stock() follows to hitch a ride.
>>
>> This results in a minor behavioral change: previously, a mismatching
>> percpu stock would always be drained for the purpose of setting up
>> slab account caching, even if there was no byte remainder to put into
>> the charge cache. Now, the stock is left alone, and slab accounting
>> takes the uncached path if there is a mismatch. This is exceedingly
>> rare, and it was probably never worth draining the whole stock just to
>> cache the slab stat update.
>>
>> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>> ---
>> mm/memcontrol.c | 100 +++++++++++++++++++++++++++++-------------------
>> 1 file changed, 61 insertions(+), 39 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 4f12b75743d4..9c6f9849b717 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3218,16 +3218,18 @@ static struct obj_stock_pcp *trylock_stock(void)
>>
>
> [...]
>
>> @@ -3376,17 +3383,14 @@ static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
>> return flush;
>> }
>>
>> -static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
>> - bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
>> - enum node_stat_item idx)
>> +static void __refill_obj_stock(struct obj_cgroup *objcg,
>> + struct obj_stock_pcp *stock,
>> + unsigned int nr_bytes,
>> + bool allow_uncharge)
>> {
>> - struct obj_stock_pcp *stock;
>> unsigned int nr_pages = 0;
>>
>> - stock = trylock_stock();
>> if (!stock) {
>> - if (pgdat)
>> - __account_obj_stock(objcg, NULL, nr_acct, pgdat, idx);
>> nr_pages = nr_bytes >> PAGE_SHIFT;
>> nr_bytes = nr_bytes & (PAGE_SIZE - 1);
>> atomic_add(nr_bytes, &objcg->nr_charged_bytes);
>> @@ -3404,20 +3408,25 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
>> }
>> stock->nr_bytes += nr_bytes;
>>
>> - if (pgdat)
>> - __account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
>> -
>> if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
>> nr_pages = stock->nr_bytes >> PAGE_SHIFT;
>> stock->nr_bytes &= (PAGE_SIZE - 1);
>> }
>>
>> - unlock_stock(stock);
>> out:
>> if (nr_pages)
>> obj_cgroup_uncharge_pages(objcg, nr_pages);
>> }
>>
>> +static void refill_obj_stock(struct obj_cgroup *objcg,
>> + unsigned int nr_bytes,
>> + bool allow_uncharge)
>> +{
>> + struct obj_stock_pcp *stock = trylock_stock();
>> + __refill_obj_stock(objcg, stock, nr_bytes, allow_uncharge);
>> + unlock_stock(stock);
>
> Hi Johannes,
>
> I noticed that after this patch, obj_cgroup_uncharge_pages() is now inside
> the obj_stock.lock critical section. Since obj_cgroup_uncharge_pages() calls
> refill_stock(), which seems non-trivial, this might increase the lock hold time.
> In particular, could that lead to more failed trylocks for IRQ handlers on
> non-RT kernel (or for tasks that preempt others on RT kernel)?

Yes, it also seems a bit self-defeating? (at least in theory)

refill_obj_stock()
trylock_stock()
__refill_obj_stock()
obj_cgroup_uncharge_pages()
refill_stock()
local_trylock() -> nested, will fail