Re: [RFC PATCH 0/8] mm/madvise: support process_madvise(MADV_DONTNEED)

From: David Hildenbrand
Date: Mon Sep 27 2021 - 06:58:59 EST

On 27.09.21 12:41, Nadav Amit wrote:

On Sep 27, 2021, at 2:24 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:

On 26.09.21 18:12, Nadav Amit wrote:
From: Nadav Amit <namit@xxxxxxxxxx>
The goal of these patches is to add support for
process_madvise(MADV_DONTNEED). Yet, in the process some (arguably)
useful cleanups, a bug fix and performance enhancements are performed.
The patches try to consolidate the logic across different behaviors, and
to a certain extent overlap/conflict with an outstanding patch that does
something similar [1]. This consolidation however is mostly orthogonal
to the aforementioned one and done in order to clarify what is done in
respect to locks and TLB for each behavior and to batch these operations
more efficiently on process_madvise().
process_madvise(MADV_DONTNEED) is useful for two reasons: (a) it allows
userfaultfd monitors to unmap memory from monitored processes; and (b)
it is more efficient than madvise() since it is vectored and batches TLB
flushes more aggressively.

MADV_DONTNEED on MAP_PRIVATE memory is a target-visible operation; this is very different to all the other process_madvise() calls we allow, which are merely hints, but the target cannot be broken . I don't think this is acceptable.

This is a fair point, which I expected, but did not address properly.

I guess an additional capability, such as CAP_SYS_PTRACE needs to be
required in this case. Would that ease your mind?

I think it would be slightly better, but I'm still missing a clear use case that justifies messing with the page tables of other processes in that way, especially with MAP_PRIVATE mappings. Can you maybe elaborate a bit on a) and b)?

Especially, why would a) make sense or be required? When would it be a good idea to zap random pages of a target process, especially with MAP_PRIVATE? How would the target use case make sure that the target process doesn't suddenly lose data? I would have assume that you can really only do something sane with uffd() if 1) the process decided to give up on some pages (madvise(DONTNEED)) b) the process hasn't touched these pages yet.

Can you also comment a bit more on b)? Who cares about that? And would we suddenly expect users of madvise() to switch to process_madvise() because it's more effective? It sounds a bit weird to me TBH, but most probably I am missing details :)


David / dhildenb