Re: [PATCH] mm: Free per cpu pages async to shorten program exit time

From: David Hildenbrand
Date: Fri Oct 08 2021 - 05:24:22 EST


On 08.10.21 11:22, Claudio Imbrenda wrote:
On Fri, 8 Oct 2021 11:15:25 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 08.10.21 10:52, Claudio Imbrenda wrote:
On Fri, 8 Oct 2021 10:17:50 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:
On 08.10.21 08:39, ultrachin@xxxxxxx wrote:
From: chen xiaoguang <xiaoggchen@xxxxxxxxxxx>

The exit time is long when a program has allocated a lot of memory, and
the most time-consuming part is freeing that memory, which takes 99.9%
of the total exit time. By freeing asynchronously we can save 25% of
the exit time.

Signed-off-by: chen xiaoguang <xiaoggchen@xxxxxxxxxxx>
Signed-off-by: zeng jingxiang <linuszeng@xxxxxxxxxxx>
Signed-off-by: lu yihui <yihuilu@xxxxxxxxxxx>

I recently discussed with Claudio whether it would be possible to tear down
the process MM in a deferred way, because for some use cases (secure/encrypted
virtualization, very large mmaps) tearing down the page tables is
already by far the more expensive operation.

There is mmdrop_async(), and I wondered if one could reuse that concept
when tearing down a process -- I didn't look into feasibility, however,
so it's just a very rough idea.

I have done some experiments by unconditionally replacing mmdrop with
mmdrop_async in exit.c: nothing broke, and the exit time of large
processes was almost instant (with the actual cleanup being performed in
the background)
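
For reference, the deferral machinery already exists in kernel/fork.c:
mmdrop_async() just pushes the final __mmdrop() off to a workqueue
(mmput_async() does the same for __mmput()). From memory it is roughly
this (details may differ by kernel version):

static void mmdrop_async_fn(struct work_struct *work)
{
	struct mm_struct *mm;

	mm = container_of(work, struct mm_struct, async_put_work);
	__mmdrop(mm);
}

static void mmdrop_async(struct mm_struct *mm)
{
	/* last reference dropped: defer the actual teardown to a worker */
	if (unlikely(atomic_dec_and_test(&mm->mm_count))) {
		INIT_WORK(&mm->async_put_work, mmdrop_async_fn);
		schedule_work(&mm->async_put_work);
	}
}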

my approach is probably simpler/cleaner:

diff --git a/include/asm-generic/mmu_context.h b/include/asm-generic/mmu_context.h
index 91727065bacb..900931a6a105 100644
--- a/include/asm-generic/mmu_context.h
+++ b/include/asm-generic/mmu_context.h
@@ -73,4 +73,8 @@ static inline void deactivate_mm(struct task_struct *tsk,
 }
 #endif
 
+#ifndef arch_exit_mm_mmput
+#define arch_exit_mm_mmput mmput
+#endif
+
 #endif /* __ASM_GENERIC_MMU_CONTEXT_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 9a89e7f36acb..604cb9c740fa 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -498,7 +498,7 @@ static void exit_mm(void)
 	task_unlock(current);
 	mmap_read_unlock(mm);
 	mm_update_next_owner(mm);
-	mmput(mm);
+	arch_exit_mm_mmput(mm);
 	if (test_thread_flag(TIF_MEMDIE))
 		exit_oom_victim();
 }

These are the minimal changes to common code; each architecture can then
define its own arch_exit_mm_mmput as it sees fit (for example, to free
asynchronously only for certain classes of mm, such as very large ones or VMs).

Another option is to simply always replace mmput with mmput_async, which I
expect will raise more eyebrows.
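
As a rough sketch (the helper body, the threshold, and the placement are
made up, not a tested implementation), an architecture could then provide
something like this in its asm/mmu_context.h before pulling in the
asm-generic header:

/*
 * Hypothetical arch override: defer the expensive mm teardown to a
 * workqueue when a lot of page tables would have to be torn down,
 * otherwise free inline as today.
 */
static inline void arch_exit_mm_mmput(struct mm_struct *mm)
{
	if (mm_pgtables_bytes(mm) > SZ_64M)	/* made-up threshold */
		mmput_async(mm);
	else
		mmput(mm);
}
#define arch_exit_mm_mmput arch_exit_mm_mmput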

Thanks Claudio.

I guess we'd use some heuristic to keep the eyebrows down. Having
something like

if (should_mput_async_on_exit(mm))
	mmput_async(mm);
else
	mmput(mm);

whereby the heuristic can optionally consult the arch/config-knobs/...
doesn't sound too wrong to me if it works.
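
should_mput_async_on_exit() would then be nothing more than a small helper,
e.g. (body purely illustrative, the config knob does not exist today):

static inline bool should_mput_async_on_exit(struct mm_struct *mm)
{
	if (!IS_ENABLED(CONFIG_MMPUT_ASYNC_ON_EXIT))	/* hypothetical knob */
		return false;
	/* only defer when the teardown is likely to be expensive */
	return mm_pgtables_bytes(mm) > SZ_64M;		/* made-up threshold */
}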


yes, that is one of the possible solutions I had thought of.

although the small patch I posted above is probably even less intrusive
and should hopefully raise even fewer eyebrows, while also leaving the
door open for arch-specific code to do more than just mmput_async, if
needed.

More flexibility might raise more eyebrows. :)

--
Thanks,

David / dhildenb