On Thu, Feb 16, 2023 at 07:23:17PM +0100, David Hildenbrand wrote:
On 16.02.23 18:55, Peter Xu wrote:
On Thu, Feb 16, 2023 at 06:00:51PM +0100, David Hildenbrand wrote:
There are various reasons why I think a UFFD_FEATURE_WP_UNPOPULATED, using
PTE markers, would be more beneficial:
1) It would be applicable to anon hugetlb
Anon hugetlb should already work on none ptes via the markers?
... really? I thought we'd do the whole pte marker handling only when
dealing with hugetlb/shmem. Interesting, thanks. (we could skip population
in QEMU in that case as well -- we always do it for now)
Hmm, you're talking about "anon hugetlb", so it's still hugetlb, right? :)
I mean especially MAP_PRIVATE|MAP_HUGETLB|MAP_ANONYMOUS, so "in theory"
without any fd and thus pagecache. ... but anon hugetlb keeps confusing me
with pagecache handling.
IIUC when mmap(fd==-1) it's the same as MAP_PRIVATE|MAP_HUGETLB.
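For reference, here is a minimal sketch of the kind of mapping meant by "anon hugetlb": a private, anonymous hugetlb mapping with no backing fd (fd == -1). It assumes Linux with a default huge page size of 2 MiB (HUGE_SZ and map_anon_hugetlb() are hypothetical names for illustration) and huge pages reserved via vm.nr_hugepages. Without a reservation, mmap() fails and the helper returns NULL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: the default huge page size is 2 MiB (typical on x86-64). */
#define HUGE_SZ (2UL * 1024 * 1024)

/*
 * Map `len` bytes of private anonymous hugetlb memory. There is no fd
 * (fd == -1), so no file the user can see backs this mapping.
 * Returns NULL if no huge pages are available (e.g. vm.nr_hugepages == 0).
 */
static void *map_anon_hugetlb(size_t len)
{
    /* Round up to a multiple of the (assumed) huge page size. */
    len = (len + HUGE_SZ - 1) & ~(HUGE_SZ - 1);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Note that hugetlb has no shared zeropage, so even a read fault on such a mapping allocates a huge page.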
The focus of that paper is on CoW latency, yes (and on deduplication
instantiating shared zeropages -- but building a covert channel using CoW
latency might be rather tricky I think, because the pages will get
deduplicated independently of any sender action ...).
However, in theory, one could build a covert channel between two VMs simply
by using cache flushes and reading from the shared zeropage. Measuring
access time can reveal if the sender read the page (L3 filled) or not (L3
not filled).
So the attacker will know when someone reads a zeropage, but I still don't
get how that can lead to a data leak..
That said, I don't think we are going to disable the shared zeropage for
some workloads because of that. I assume that in most cases it will simply
be way too noisy to transmit any kind of data, and we have more critical
covert channels to sort out if we want to.
Just wanted to raise it because you asked :)
Another note for s390: when uffd-wp support comes there, we can consider
moving to pte markers conditionally when the zeropage is unavailable
(!zeropage). But we can leave that for later.
Sure, we could always have another feature flag.
I don't think that needs another feature flag. If someone ports uffd-wp to
s390, we can implement pte markers for WP_ZEROPAGE and then either use them
only when the zeropage doesn't exist, or switch to pte markers completely
without changing the interface, depending on whether we think replacing
zeropages with pte markers would be a major issue for existing apps. I
don't worry too much about that part.
IMHO, using PTE markers would provide a real advantage for some users (e.g.,
background snapshots), where we want to avoid populating zeropages/page
tables as much as possible if the VM memory is mostly untouched.
Naturally, I wonder if UFFD_FEATURE_WP_ZEROPAGE is really worth it. Is there
another good reason to combine zeropage population and wp that I am missing
(e.g., atomicity by doing both in one operation)?
It also keeps the new WP_ASYNC and pagemap interfaces clean: we don't want
users to have to pre-fault memory every time as a common tactic. That's hard
to use, and users shouldn't need to know the internals of why it is needed,
either.
I feel like we're building a lot of infrastructure on uffd-wp instead of
having an alternative softdirty mode (using a world switch?) that works as
expected and doesn't require that many uffd-wp extensions. ;)
We used to discuss this WP_ZEROPAGE before, and I thought we were all happy
to have that. Obviously you changed your mind. :)
I wasn't really eager about this before because the pre-read workaround
already works well (I assume it's slightly slower, but that's fine until
someone starts to worry). But if we want to extend soft-dirty, it's not good
at all to require every new user to prefault memory and figure out why
it's needed.
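The pre-read workaround boils down to touching every page with a read before write-protecting the range, so that never-touched anonymous pages get a zeropage mapped and are then covered by the wr-protect. Here is a minimal sketch, assuming private anonymous memory and the page size reported by sysconf(). The helper name prefault_read() is hypothetical, and the follow-up UFFDIO_WRITEPROTECT ioctl on the range is deliberately not shown.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Read one byte from every page in [addr, addr + len). For private
 * anonymous memory, a read fault on a never-touched page maps a zeropage
 * (the shared one, or the huge zeropage with THP) instead of allocating
 * a private page. Returns the number of pages touched.
 */
static size_t prefault_read(const void *addr, size_t len)
{
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const volatile unsigned char *p = addr;
    size_t n = 0;

    for (size_t off = 0; off < len; off += page, n++)
        (void)p[off];   /* volatile read: the compiler cannot drop it */

    return n;
}
```

The cost is one read fault per page, which is the "slightly slower" part. It is also exactly the step a new soft-dirty-style user would have to know about and get right.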
That said, I have the feeling that you and Muhammad have a plan to make it
work using uffd-wp, and I won't interfere. It would be nicer to use the
softdirty infrastructure IMHO, though.
Thanks. If you have any good idea on reusing soft-dirty, please shoot.
I'll be perfectly happy with it as long as it resolves the issue for
Muhammad. Trust me, I wished the soft-dirty approach had worked out, but
unfortunately it didn't. At least so far, uffd-wp has two major issues as
far as I can see:
(1) Memory type limitations (e.g. general fs-backed memory doesn't work)
(2) Tracing uffd applications is, afaict, impossible
So if there's a better way to do this with soft-dirty or anything else (and
I assume it won't be limited by either of the above), now is the time to
say so..
The other thing is that it provides a way to make anon and !anon behave the
same on empty ptes; it's a pity that it wasn't like that already.
In an ideal world, we'd simply be using PTE markers unconditionally I think
and avoid this zeropage feature :/
Is there any particular reason to have UFFD_FEATURE_WP_ZEROPAGE and not
simply always do that unconditionally? (Sure, we have to indicate to user
space that it now works as expected.) Are we really expecting to break user
space by protecting what it asked us to protect?
I suspect so.
From a high level, the major functional changes will be:
(1) The user will start to receive more WP messages, with zero pages being
reported,
(2) Wr-protecting very sparse memory can be much slower
I would expect there are cases where the app just works as usual.
However, in other cases the user may really not care about zero pages at
all, and I have a feeling that's actually the majority.
Live snapshot is actually special because, IIUC, the old semantics should
work perfectly as long as the guest OS doesn't sanity-check that freed pages
are all zeros.. IOW that's a corner case, and if we can control it we may
not even need WP_ZEROPAGE for QEMU, iiuc. Many other apps may leverage the
old behavior (ignoring memory holes) to run faster.
Normally, when I'm not confident about a functional change, I'd rather use a
flag. Luckily uffd is very friendly to that, so the user has better control
over what to expect. Some future app may explicitly want to always ignore
zero pages on extremely sparse memory, and without the flag it couldn't
choose.
We can always optimize this behavior in the future with PMD/PUD/.. pte
markers as you said, but IMHO that needs further justification of the added
complexity, and of whether it benefits the majority enough to become the
default behavior.
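To put a rough number on why wr-protecting very sparse memory via zeropages can be slow, and what PMD-level markers would save, here is an illustrative calculation. It assumes x86-64 with 4-level paging, 4 KiB base pages, and 512 entries per page-table page. Populating zeropages over a range costs one PTE per 4 KiB plus one PTE-table page per 2 MiB, whereas a marker at PMD level would need only one PMD entry per 2 MiB. The names pgtable_cost and struct pgtable_cost are hypothetical.

```c
#include <stdint.h>

/* Assumptions: x86-64, 4-level paging, 4 KiB base pages. */
#define BASE_PAGE_SHIFT 12u   /* 4 KiB */
#define PTRS_PER_TABLE  512u  /* entries per page-table page */
#define PMD_SPAN ((uint64_t)PTRS_PER_TABLE << BASE_PAGE_SHIFT) /* 2 MiB */

struct pgtable_cost {
    uint64_t ptes;        /* PTEs filled if every page gets a zeropage   */
    uint64_t pte_tables;  /* PTE-table pages that must be allocated      */
    uint64_t pmd_entries; /* PMD entries if one marker per 2 MiB sufficed */
};

/* Cost of covering a PMD-aligned range of `len` bytes (rounded up). */
static struct pgtable_cost pgtable_cost(uint64_t len)
{
    const uint64_t page = (uint64_t)1 << BASE_PAGE_SHIFT;
    struct pgtable_cost c;

    c.ptes        = (len + page - 1) >> BASE_PAGE_SHIFT;
    c.pte_tables  = (len + PMD_SPAN - 1) / PMD_SPAN;
    c.pmd_entries = c.pte_tables; /* each PMD entry covers one PTE table */
    return c;
}
```

Under these assumptions, a mostly untouched 1 GiB region works out to 262144 PTEs and 512 PTE-table pages (2 MiB of page tables), versus 512 PMD-level entries with no PTE tables allocated at all.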
As I said, any new feature usually requires good justification. Maybe there
really is a measurable performance gain (fewer syscalls, fewer pgtable walks).
Muhammad may have a word to say here; let's see whether he has any comment.
Besides that, as I replied above, I'll collect some data for my next post
regardless, along with an attempt to optimize with huge zeropages on top.