On Thu, Feb 16, 2023 at 06:00:51PM +0100, David Hildenbrand wrote:
There are various reasons why I think a UFFD_FEATURE_WP_UNPOPULATED, using
PTE markers, would be more beneficial:
1) It would be applicable to anon hugetlb
Anon hugetlb should already work with non ptes with the markers?
... really? I thought we'd do the whole pte marker handling only when
dealing with hugetlb/shmem. Interesting, thanks. (we could skip population
in QEMU in that case as well -- we always do it for now)
Hmm, you're talking about "anon hugetlb", so it's still hugetlb, right? :)
2) It would be applicable even when the zeropage is disallowed
(mm_forbids_zeropage())
Do you mean s390 can disable zeropage with mm_uses_skeys()? So far uffd-wp
doesn't support s390, so I'm not sure whether we're over-worrying about this
effect.
Or are there any other projects / ideas that could potentially extend
forbidding zero pages to more contexts?
I think it was shown that zeropages can be used to build covert channels
(similar to memory deduplication, because it effectively is memory
deduplication). It's mentioned as a note in [1] under VII. A. ("Only
Deduplicate Zero Pages.")
[1] https://www.ndss-symposium.org/wp-content/uploads/2022-81-paper.pdf
Thanks for the link. I'm slightly confused how dedup of zero pages is a
concern here, though.
IIUC the security risk is when the dedup-ed pages contain valid information,
so the attacker can measure the latency of requests when the attempted
malicious page contains exactly the same content as the data page, by trying
to detect whether a CoW happens.
Here it's the zero page: even if there's a CoW timing difference, the data
being exposed can only be all zeros? Then what's the risk?
Another note for s390: when it comes we can consider moving to pte markers
conditionally when !zeropage. But we can leave that for later.
3) It would be possible to optimize even without the huge zeropage, by
using a PMD marker.
This patch doesn't need the huge zeropage to exist.
Yes, and for that reason I think it may perform worse than what we already
have in some cases. Instead of populating a single PMD you'll have to fill a
full PTE table.
Yes. If you think that'll be worth it, I can conditionally do pmd zero thp in
a new version. Maybe it will be a good intermediate step between
introducing pte markers to pmd/pud/etc, so at least we don't need other
changes to coordinate pte markers to higher levels.
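To put rough numbers on the PMD-versus-PTE-table point, here is a small
arithmetic sketch. It assumes x86-64 geometry (4 KiB base pages, 512 entries
per table, so one PMD entry spans 2 MiB); the helper names are made up for
illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 512 entries per table. */
#define PAGE_SIZE_BYTES 4096ULL
#define PTRS_PER_TABLE  512ULL
#define PMD_SIZE_BYTES  (PAGE_SIZE_BYTES * PTRS_PER_TABLE) /* 2 MiB */

/* PTEs written when every 4 KiB page gets its own zeropage entry. */
uint64_t pte_entries_needed(uint64_t range_bytes)
{
    assert(range_bytes % PAGE_SIZE_BYTES == 0); /* whole pages only */
    return range_bytes / PAGE_SIZE_BYTES;
}

/* PTE tables (one 4 KiB page each) needed to hold those entries. */
uint64_t pte_tables_needed(uint64_t range_bytes)
{
    return (range_bytes + PMD_SIZE_BYTES - 1) / PMD_SIZE_BYTES;
}

/* Entries written if each 2 MiB chunk is covered by a single PMD entry. */
uint64_t pmd_entries_needed(uint64_t range_bytes)
{
    return (range_bytes + PMD_SIZE_BYTES - 1) / PMD_SIZE_BYTES;
}
```

Under these assumptions, a 1 GiB unpopulated range costs 262144 PTE writes
and 512 PTE tables (2 MiB of page-table memory) with per-PTE population,
versus 512 PMD entries if each 2 MiB chunk is handled at the PMD level.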
Especially when uffd-wp'ing large ranges that are possibly all unpopulated
(thinking about the existing VM background snapshot use case either with
untouched memory or with things like free page reporting), we might be
neither reading nor writing that memory any time soon.
Right, I think that's a trade-off. But I still think a large portion of
totally unpopulated memory should be the rare case rather than the majority,
or am I wrong? Not to mention that it requires a more involved changeset to
the kernel.
So what I proposed here is the (AFAIU) simplest solution towards providing
such a feature in a complete form. I think we have a chance to implement it
in other ways like pte markers, but that's something we can work on later,
and so far I'm not sure how much benefit we can get out of it.
What you propose here can already be achieved by user space fairly easily
(in fact, QEMU implementation could be further sped up using
MADV_POPULATE_READ). Usually, we only do that when there are very good
reasons to (performance).
Yes POPULATE_READ will be faster. This patch should make it even faster,
because it merges the two walks.
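For reference, the two-walk userspace approach being compared here could
look roughly like the sketch below. This is not QEMU's code; the function
names are hypothetical, error handling is minimal, and it assumes a kernel
with MADV_POPULATE_READ (5.14+) and uffd-wp for anonymous memory. Depending
on vm.unprivileged_userfaultfd, the userfaultfd() call may fail with EPERM
for unprivileged callers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22 /* older libc headers may lack it */
#endif

/* Walk 1: fault the range in read-only, so empty PTEs get the zeropage. */
int populate_read(void *addr, size_t len)
{
    assert(addr != NULL && len != 0);
    return madvise(addr, len, MADV_POPULATE_READ) ? -errno : 0;
}

/*
 * Walk 2: register the range in WP mode and write-protect it.
 * Returns the userfaultfd descriptor on success, -errno on failure.
 */
int wp_range(void *addr, size_t len)
{
    struct uffdio_api api = { .api = UFFD_API };
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    struct uffdio_writeprotect wp = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    int uffd, err;

    assert(addr != NULL && len != 0);
    uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return -errno;

    if (ioctl(uffd, UFFDIO_API, &api) ||
        ioctl(uffd, UFFDIO_REGISTER, &reg) ||
        ioctl(uffd, UFFDIO_WRITEPROTECT, &wp)) {
        err = errno;
        close(uffd);
        return -err;
    }
    return uffd;
}
```

A caller would run populate_read() and then wp_range() on the same range;
the patch under discussion would fold the first walk into the second.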
Using PTE markers would provide a real advantage IMHO for some users (IMHO
background snapshots), where we might want to avoid populating
zeropages/page tables as best as we can completely if the VM memory is
mostly untouched.
Naturally, I wonder if UFFD_FEATURE_WP_ZEROPAGE is really worth it. Is there
another good reason to combine the populate zeropage+wp that I am missing
(e.g., atomicity by doing both in one operation)?
It also makes the new WP_ASYNC and pagemap interface clean: we don't want
to have the user pre-fault it every time as a common tactic. It's hard to
use, and the user doesn't need to know the internals of why it is needed,
either.
The other thing is it provides a way to make anon and !anon behave the same
on empty ptes; it's a pity that it was not already like that.
We can always optimize this behavior in the future with either
PMD/PUD/.. pte markers as you said, but IMHO that just needs further
justification on the complexity, and also on whether that's beneficial to
the majority to become the default behavior.
The worst case (if anyone would like that behavior) is that we can have
another feature bit to select that behavior, but that'll be something on
top.