Re: [PATCH v3 07/12] mm, swap: support flexible batch freeing of slots in different memcgs
From: Chris Li
Date: Fri May 08 2026 - 00:01:47 EST
On Tue, Apr 21, 2026 at 7:16 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@xxxxxxxxxx> wrote:
>
> From: Kairui Song <kasong@xxxxxxxxxxx>
>
> Instead of requiring the caller to ensure all slots are in the same
> memcg, make the function handle different memcgs at once.
>
> This is both a micro optimization and required for removing the memcg
> lookup in the page table layer, so it can be unified at the swap layer.
>
> We are not removing the memcg lookup in the page table in this commit.
> It has to be done after the memcg lookup is deferred to the swap layer.
>
> Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
Overall, it looks good. Some nitpicks follow.
Acked-by: Chris Li <chrisl@xxxxxxxxxx>
> ---
> mm/swapfile.c | 33 +++++++++++++++++++++++++++++----
> 1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index e1ad77a69e54..8d3d22c463f3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1872,21 +1872,46 @@ void __swap_cluster_free_entries(struct swap_info_struct *si,
> unsigned int ci_start, unsigned int nr_pages)
> {
> unsigned long old_tb;
> + unsigned int type = si->type;
> + unsigned short id = 0, id_cur;
Nitpick: I'm tempted to rename a few variables to improve my
understanding. Feel free to keep it as it is.
id -> batch_id
> unsigned int ci_off = ci_start, ci_end = ci_start + nr_pages;
> - unsigned long offset = cluster_offset(si, ci) + ci_start;
> + unsigned long offset = cluster_offset(si, ci);
Nitpick: offset -> ci_offset. This is the offset of the ci, which stays
fixed throughout the loop.
> + unsigned int ci_batch = ci_off;
Nitpick: ci_batch -> batch_off; this one goes with batch_id.
> + swp_entry_t entry;
>
> VM_WARN_ON(ci->count < nr_pages);
>
> ci->count -= nr_pages;
> do {
> old_tb = __swap_table_get(ci, ci_off);
> - /* Release the last ref, or after swap cache is dropped */
> + /*
> + * Freeing is done after release of the last swap count
> + * ref, or after swap cache is dropped
> + */
> VM_WARN_ON(!swp_tb_is_shadow(old_tb) || __swp_tb_get_count(old_tb) > 1);
> __swap_table_set(ci, ci_off, null_to_swp_tb());
> +
> + /*
> + * Uncharge swap slots by memcg in batches. Consecutive
> + * slots with the same cgroup id are uncharged together.
> + */
> + entry = swp_entry(type, offset + ci_off);
Nitpick: This line confused me a bit. Two offsets are mentioned here:
"offset + ci_off". One would assume that ci_off is the offset of the
ci, and that offset is the incremental one. It is the other way
around.
> + id_cur = lookup_swap_cgroup_id(entry);
> + if (id != id_cur) {
> + if (id)
> + mem_cgroup_uncharge_swap(swp_entry(type, offset + ci_batch),
> + ci_off - ci_batch);
With the above rename, this becomes:
"... swp_entry(type, ci_offset + batch_off),"; the two offsets combine
into the swap entry at the start of the batch.
"ci_off - batch_off" is the running length from the beginning of the batch.
> + id = id_cur;
> + ci_batch = ci_off;
> + }
> } while (++ci_off < ci_end);
>
> - mem_cgroup_uncharge_swap(swp_entry(si->type, offset), nr_pages);
> - swap_range_free(si, offset, nr_pages);
> + if (id) {
This becomes `if (batch_id)`, meaning that if a batch is still pending
when the loop ends, we flush it.
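To make the renames concrete, here is a rough user-space sketch of the
batching loop using the suggested names. It is only an illustration, not
the kernel code: ids[] is a hypothetical stand-in for
lookup_swap_cgroup_id(), record_uncharge() is a hypothetical stand-in for
mem_cgroup_uncharge_swap(), and the body of the final flush is my
assumption, since the quoted hunk stops at `if (id) {`.

```c
#include <assert.h>

#define MAX_CALLS 16

/* Hypothetical stand-in for mem_cgroup_uncharge_swap(): logs each call. */
struct uncharge_call {
	unsigned long start;	/* first slot offset of the batch */
	unsigned int nr;	/* number of slots in the batch */
};

static struct uncharge_call calls[MAX_CALLS];
static int nr_calls;

static void record_uncharge(unsigned long start, unsigned int nr)
{
	assert(nr_calls < MAX_CALLS);
	calls[nr_calls].start = start;
	calls[nr_calls].nr = nr;
	nr_calls++;
}

/*
 * Walk slots [ci_start, ci_start + nr_pages) and uncharge every run of
 * consecutive slots sharing the same non-zero id with a single call.
 * ids[] replaces lookup_swap_cgroup_id() in this sketch (assumption).
 */
static void free_entries_batched(const unsigned short *ids,
				 unsigned long ci_offset,
				 unsigned int ci_start,
				 unsigned int nr_pages)
{
	unsigned int ci_off = ci_start, ci_end = ci_start + nr_pages;
	unsigned int batch_off = ci_off;	/* where the current batch starts */
	unsigned short batch_id = 0, id_cur;	/* 0 means no pending batch */

	do {
		id_cur = ids[ci_off];
		if (batch_id != id_cur) {
			/* The id changed: flush the batch that just ended. */
			if (batch_id)
				record_uncharge(ci_offset + batch_off,
						ci_off - batch_off);
			batch_id = id_cur;
			batch_off = ci_off;
		}
	} while (++ci_off < ci_end);

	/* Assumed final flush: uncharge the batch still pending, if any. */
	if (batch_id)
		record_uncharge(ci_offset + batch_off, ci_off - batch_off);
}
```

For ids {5, 5, 0, 7, 7, 7, 5, 0} with ci_offset 512, this issues three
uncharge calls: (512, 2), (515, 3) and (518, 1). Slots with id 0 are
never uncharged, and the trailing 0 leaves nothing to flush.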
Chris