Re: [RFC v2 PATCH 2/2] mm: mmap: zap pages with read mmap_sem for large mapping

From: Nadav Amit
Date: Tue Jun 19 2018 - 18:17:27 EST


at 4:34 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:

> When running some mmap/munmap scalability tests with large memory (i.e.
>> 300GB), the below hung task issue may happen occasionally.
>
> INFO: task ps:14018 blocked for more than 120 seconds.
> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> ps D 0 14018 1 0x00000004
>
(snip)

>
> Zapping pages is the most time consuming part, according to the
> suggestion from Michal Hock [1], zapping pages can be done with holding
> read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
> mmap_sem to manipulate vmas.

Does munmap() == MADV_DONTNEED + munmap() ?

For example, what happens with userfaultfd in this case? Can you get an
extra #PF, which would be visible to userspace, before the munmap is
finished?

In addition, would it be ok for the user to potentially get a zeroed page in
the time window after the MADV_DONTNEED finished removing a PTE and before
the munmap() is done?

Regards,
Nadav