On Tue 19-06-18 17:31:27, Nadav Amit wrote:
> at 4:08 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
> > On 6/19/18 3:17 PM, Nadav Amit wrote:
> > > at 4:34 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
> > > > When running some mmap/munmap scalability tests with large memory (i.e.
> > > > 300GB), the below hung task issue may happen occasionally.
> > > >
> > > > INFO: task ps:14018 blocked for more than 120 seconds.
> > > > Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> > > > message.
> > > > ps D 0 14018 1 0x00000004
> > >
> > > (snip)
> > >
> > > > Zapping pages is the most time consuming part, according to the
> > > > suggestion from Michal Hocko [1], zapping pages can be done with holding
> > > > read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
> > > > mmap_sem to manipulate vmas.
> > >
> > > Does munmap() == MADV_DONTNEED + munmap() ?
> >
> > Not exactly the same. So, I basically copied the page zapping used by
> > munmap instead of calling MADV_DONTNEED.
> >
> > > For example, what happens with userfaultfd in this case? Can you get an
> > > extra #PF, which would be visible to userspace, before the munmap is
> > > finished?
> >
> > userfaultfd is handled by regular munmap path. So, no change to
> > userfaultfd part.
>
> Right. I see it now.
>
> > > In addition, would it be ok for the user to potentially get a zeroed page in
> > > the time window after the MADV_DONTNEED finished removing a PTE and before
> > > the munmap() is done?
> >
> > This should be undefined behavior according to Michal. This has been
> > discussed in https://lwn.net/Articles/753269/.
>
> Thanks for the reference.
>
> Reading the man page I see: "All pages containing a part of the indicated
> range are unmapped, and subsequent references to these pages will generate
> SIGSEGV."

Yes, this is true but I guess what Yang Shi meant was that a userspace
access racing with munmap is not well defined. You never know whether
you get your data, #PF or SEGV because it depends on timing. The user
visible change might be that you lose content and get a zero page instead
if you hit the race window while we are unmapping, which was not possible
before. But wouldn't such an access pattern be buggy anyway? You need
some form of external synchronization AFAICS.
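
To make the two outcomes concrete, here is a minimal userspace sketch. It
is only an illustration and not part of the patch: it assumes Linux and a
private anonymous mapping, it is single-threaded (so it shows the two end
states, not the race itself), and the function names are made up for this
example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf segv_env;

/* SIGSEGV handler: jump back to the point that attempted the access. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(segv_env, 1);
}

/* Map one private anonymous page and store a non-zero byte in it. */
static volatile unsigned char *map_dirty_page(size_t *len)
{
    *len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    volatile unsigned char *page = p;
    page[0] = 42;
    return page;
}

/* Return the byte read back after MADV_DONTNEED dropped the page. */
int read_after_dontneed(void)
{
    size_t len;
    volatile unsigned char *page = map_dirty_page(&len);
    int rc = madvise((void *)page, len, MADV_DONTNEED);
    assert(rc == 0);
    int value = page[0];        /* refaults; content is gone */
    munmap((void *)page, len);
    return value;
}

/* Return 1 if touching the page after munmap() raised SIGSEGV, else 0. */
int faults_after_munmap(void)
{
    size_t len;
    volatile unsigned char *page = map_dirty_page(&len);
    struct sigaction sa = { .sa_handler = on_segv };
    struct sigaction old;
    int faulted;

    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);
    munmap((void *)page, len);

    if (sigsetjmp(segv_env, 1) == 0) {
        (void)page[0];          /* access to the unmapped range */
        faulted = 0;
    } else {
        faulted = 1;            /* we came back through on_segv() */
    }

    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}
```

On Linux I would expect read_after_dontneed() to return 0 (the zero page)
and faults_after_munmap() to return 1. An access racing with the unmap
could land in either state, which is why the caller needs its own
synchronization.
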
But maybe some userspace depends on "getting right data or get SEGV"
semantics. If we have to preserve that then we can come up with a VM_DEAD
flag set before we tear it down and force the SEGV on the #PF path.
Something similar we already do for MMF_UNSTABLE.
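
A toy model of that idea, just to show the intended ordering. Nothing here
is kernel code: the struct, the flag value, and the function names are all
hypothetical, and a single boolean stands in for the page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; a real VM_DEAD, if added, may look different. */
#define TOY_VM_DEAD 0x1UL

enum toy_access_result {
    TOY_ACCESS_DATA,        /* page still mapped, caller sees its data */
    TOY_ACCESS_ZERO_PAGE,   /* fault handled by mapping a fresh zero page */
    TOY_ACCESS_SIGSEGV      /* fault path refuses and signals the task */
};

struct toy_vma {
    unsigned long flags;
    bool pte_present;       /* models a single page of the mapping */
};

/* Step 1: mark the area as dying before any page is torn down. */
void toy_mark_dead(struct toy_vma *vma)
{
    assert(vma != NULL);
    vma->flags |= TOY_VM_DEAD;
}

/* Step 2: zap the page (the long part, done without exclusive locking). */
void toy_zap(struct toy_vma *vma)
{
    assert(vma != NULL);
    vma->pte_present = false;
}

/* What a concurrent userspace access would observe at this moment. */
enum toy_access_result toy_access(const struct toy_vma *vma)
{
    assert(vma != NULL);
    if (vma->pte_present)
        return TOY_ACCESS_DATA;         /* no fault at all */
    if (vma->flags & TOY_VM_DEAD)
        return TOY_ACCESS_SIGSEGV;      /* forced on the #PF path */
    return TOY_ACCESS_ZERO_PAGE;        /* ordinary anonymous refault */
}
```

With the flag set before the zap, an access sees either its data or SEGV
and never the zero page. Without the flag, the window between the zap and
the final vma removal can hand out a zero page.
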