Re: [PATCH 08/11] KVM: implement multiple address spaces

From: Paolo Bonzini
Date: Tue May 19 2015 - 12:19:44 EST




On 19/05/2015 15:32, Radim Krčmář wrote:
> I'd prefer to decouple address spaces and slots. KVM_GET_DIRTY_LOG
> could stay the same if we said that a slot can be in multiple address
> spaces.

Actually, my original plan was a third one. I wanted to have a fixed
number of address spaces, where CPUs would use only the first ones (0
for non-x86, 0+1 for x86). Then I would let userspace pick a "dirty log
address space" which, in the case of QEMU, would simply use ram_addr_t
as the address.

However, that doesn't work because when marking a page dirty you need
either a (slots, gfn) or a (memslot, relative address in memslot) pair.
Given the gfn that the VCPU has faulted on, it would be very expensive
to find the corresponding gfn in the "dirty log address space". By
contrast, it is very easy if the VCPU can simply query its current memslots.

In your proposal, there is the problem of where to do the overlap check.
You cannot do it in the slots, because that seriously messes up
userspace: QEMU, for example, wants to use as few slots as possible and
merges consecutive slots, but what is mergeable in one address space
need not be mergeable in another. And if you do it in the address
spaces, you have not solved the problem of multiple dirty bitmaps
pointing to the same userspace address.

BTW, a few more calls have to be converted to kvm_vcpu_ equivalents.
I've now gone through all occurrences of "gfn_to_" and found that we
have more invocations of gfn_to_page_many_atomic, gfn_to_pfn_atomic,
gfn_to_pfn, and gfn_to_page to convert. Also, mark_page_dirty must be
changed to kvm_vcpu_mark_page_dirty.
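
For reference, the kvm_vcpu_ variants are thin wrappers that pick the
memslots for the VCPU's current address space instead of always using
address space 0; roughly like this (simplified sketch, assuming a
kvm_vcpu_memslots() helper along the lines of the series):

struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu,
						gfn_t gfn)
{
	/* memslots of the VCPU's current address space, not AS 0 */
	return __gfn_to_memslot(kvm_vcpu_memslots(vcpu), gfn);
}

pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
}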

> (Well, we could do that even now, by searching for slots that
> differ only in address space id and pointing them to same dirty bitmap.
> This even makes sense, which is a sign of lacking design :)

I thought about this, but ultimately it sounded like premature
optimization (and also might not even work due to the aforementioned
problem with merging of adjacent slots).

A possible optimization is to set a flag when no bits are set in the
dirty bitmap, and skip the iteration. This is the common case for the
SMM address space, for example. But it can be done later, and it is
independent work.
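
Something along these lines, purely hypothetical (dirty_bitmap_nonzero
is an invented per-slot flag, not part of this series):

/*
 * Hypothetical sketch only: dirty_bitmap_nonzero would be set on the
 * first mark_page_dirty for the slot and cleared when the log is
 * harvested.
 */
static bool kvm_slot_maybe_dirty(struct kvm_memory_slot *memslot)
{
	return READ_ONCE(memslot->dirty_bitmap_nonzero);
}

/* In the KVM_GET_DIRTY_LOG path the expensive scan would be skipped:
 *
 *	if (!kvm_slot_maybe_dirty(memslot))
 *		return 0;	// all-zero bitmap, the common case for SMM
 */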

Keeping slots separate for different address spaces also makes the most
sense in QEMU, because each address space has a separate MemoryListener
that only knows about one address space. There is one log_sync callback
per address space and no code has to know about all address spaces.
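
For context, this is roughly how it looks on the QEMU side: each
listener is registered against a single AddressSpace, so its log_sync
only ever sees sections of that address space (sketch; the SMRAM
wiring is an assumption, not existing QEMU code):

static MemoryListener kvm_memory_listener = {
    .region_add = kvm_region_add,
    .region_del = kvm_region_del,
    .log_sync   = kvm_log_sync,   /* issues KVM_GET_DIRTY_LOG per section */
};

/* one registration per address space; a second listener for SMRAM
 * would sync the SMM slots without this one knowing about them */
memory_listener_register(&kvm_memory_listener, &address_space_memory);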

> The main drawback is that forcing decoupling on existing IOCTLs would be
> really awkward ... we'd need to have new API for address spaces;
> there are two basic operations on an address space:
> add and remove slot (needs: slot id, address space id)
> which would give roughly the same functionality as this patch.
> (To maximize compatibility, there could be a default address space and a
> slot flag that doesn't automatical
> On top of this, we could allow hierarchical address spaces, so very similar
> address spaces (like SMM) would be easier to set up.

That would actually be really messy. :)

The regular and SMM address spaces are not hierarchical. As soon as you
put a PCI resource underneath SMRAM---which is exactly what happens for
legacy VRAM at 0xa0000---they can be completely different. Note that
QEMU can map legacy VRAM as a KVM memslot when using the VGA 320x200x256
color mode (this mapping is not correct from the VGA point of view, but
it cannot be changed in QEMU without breaking migration).

What I do dislike in the API is the 16-bit split of the slots field; but
the alternative of defining KVM_SET_USER_MEMORY_REGION2 and
KVM_GET_DIRTY_LOG2 ioctls is just as sad. If you prefer it that way, I
can certainly do that.
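
For completeness, this is how the 16-bit split would look from
userspace (a sketch of the proposal under discussion, not a final API;
as_id, slot_id, gpa, size, hva and vm_fd are filled in by the caller):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* address space id rides in the top 16 bits of the existing slot field */
struct kvm_userspace_memory_region region = {
	.slot            = (as_id << 16) | slot_id,
	.flags           = KVM_MEM_LOG_DIRTY_PAGES,
	.guest_phys_addr = gpa,
	.memory_size     = size,
	.userspace_addr  = (unsigned long)hva,
};

ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);

/* KVM_GET_DIRTY_LOG would use the same encoding in kvm_dirty_log.slot */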

Is it okay for you if I separate the cleanup patches, and then repost
the actual SMM series after the API discussions have settled? The
cleanups should be easily reviewable, and they make some sense on their
own. I'm currently autotesting them.

Paolo