I remember that there were BUG reports where we'd actually split and run
into that problem. Just don't have them at hand. I think they happened
during early boot when the OS re-configured some PCI thingies.
If you could point me to where this is happening, it would be nice. So
far I could not find or see any split/merge operation.
That would probably require more KVM code overall, but each operation
would be more tightly bounded and thus simpler to define. And I think
more precise APIs would provide other benefits, e.g. growing a region
wouldn't require first deleting the current region, and so could avoid
zapping SPTEs and destroying metadata. Merge, split, and truncate would
have similar benefits, though they might be more difficult to realize
in practice.
So essentially grow would not require INVALIDATE. Makes sense, but would
it also work with shrink? I guess so, as the memslot is still present
(but shrunk), right?
Paolo, would you be ok with this smaller API? Probably just starting
with grow and shrink first.
I am not against any of the two approaches:
- my approach has the disadvantage that the list could be arbitrarily
long, and it is difficult to roll back the intermediate changes if
something breaks during the request processing (but this could be
simplified by making KVM exit or crash).
- Sean's approach could potentially put more burden on userspace, as we
need to figure out which operation is which. Also, from my
understanding, split and merge won't be really straightforward to
implement, especially in userspace.
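
To make the "figure out which operation is which" burden a bit more
concrete, here is a rough userspace-side sketch. All names (slot_op,
slot_range, classify_slot_change) are made up for illustration and are
not existing KVM or QEMU API. It only handles the easy case of a resize
at an unchanged base address; anything else is reported as "other",
which is where split and merge would have to be worked out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not KVM or QEMU API. */
enum slot_op {
    SLOT_OP_NONE,   /* nothing changed */
    SLOT_OP_GROW,   /* same base, larger size */
    SLOT_OP_SHRINK, /* same base, smaller size */
    SLOT_OP_OTHER,  /* base moved: not a plain resize */
};

struct slot_range {
    uint64_t gpa;  /* guest physical base address */
    uint64_t size; /* size in bytes */
};

/* Classify the change from old_r to new_r as seen by userspace. */
enum slot_op classify_slot_change(const struct slot_range *old_r,
                                  const struct slot_range *new_r)
{
    assert(old_r && new_r);

    /* A different base is not a resize in this simplified model. */
    if (old_r->gpa != new_r->gpa)
        return SLOT_OP_OTHER;
    if (old_r->size == new_r->size)
        return SLOT_OP_NONE;
    return new_r->size > old_r->size ? SLOT_OP_GROW : SLOT_OP_SHRINK;
}
```

Grow and shrink fall out trivially here; the hard part is the
SLOT_OP_OTHER bucket, which this sketch deliberately does not try to
break down further.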
David, any concern from a userspace perspective on this "CISC" approach?
In contrast to resizes in QEMU that only affect a single memory
region/slot, splitting/merging is harder to factor out and communicate
to a notifier. As an alternative, we could handle it in the commit stage
in the notifier itself, similar to what my prototype does, and figure
out what needs to be done in there and how to call the proper KVM
interface (and which interface to call).
With virtio-mem (in the future) we might see merges of 2 slots into a
single one, by closing a gap in-between them. In "theory" we could
combine some updates into a single transaction. But it won't be 100s ...
I think I'd prefer an API that doesn't require excessive ad-hoc
extensions later once something in QEMU changes.
I think in essence, all we care about is performing atomic changes that
*have to be atomic*, because something we add during a transaction
overlaps with something we remove during a transaction. Not necessarily
all updates in a transaction!
My prototype essentially detects that scenario, but doesn't call new KVM
interfaces to deal with these.
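
As a rough illustration of the "has to be atomic" condition (not taken
from the prototype; mem_range, ranges_overlap and txn_needs_atomic are
hypothetical names), a transaction would only need an atomic update if
some range it adds overlaps some range it removes. The sketch assumes
non-empty, half-open ranges [start, start + size) that do not wrap
around the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type, for illustration only. */
struct mem_range {
    uint64_t start; /* guest physical start address */
    uint64_t size;  /* size in bytes, assumed non-zero */
};

/* True if the half-open ranges [start, start + size) intersect. */
bool ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
    return a->start < b->start + b->size &&
           b->start < a->start + a->size;
}

/*
 * True if any added range overlaps any removed range, i.e. the case
 * where the removal and the addition cannot be applied separately
 * without a window in which the guest sees a hole.
 */
bool txn_needs_atomic(const struct mem_range *removed, size_t nr_removed,
                      const struct mem_range *added, size_t nr_added)
{
    for (size_t i = 0; i < nr_removed; i++) {
        for (size_t j = 0; j < nr_added; j++) {
            if (ranges_overlap(&removed[i], &added[j]))
                return true;
        }
    }
    return false;
}
```

The quadratic loop is fine under the assumption that such updates
involve only a handful of regions; only the overlapping pairs would
then have to go through an atomic interface.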
With "prototype" I assume you mean the patch linked above
(region_resize), not the userspace-only proposal you sent initially,
right?
I assume once we take that into consideration, we can mostly assume that
any such atomic updates (only of the regions that really have to be part
of an atomic update) won't involve more than a handful of memory
regions. We could add a sane KVM API limit.
And if we run into that API limit in QEMU, we can print a warning and do
it non-atomically.
If we implement single operations (split/merge/grow/shrink), we don't
even need that limit. Except for merge, maybe.
Ok, if it's OK for you all, I can try to use David's patch and implement
some simple grow/shrink. Then we need to figure out where and when
exactly QEMU performs split and merge operations, and maybe implement
something similar to what you did in your proposal?