Re: [PATCH 2/2] KVM: Scalable memslots implementation

From: Paolo Bonzini
Date: Wed Feb 03 2021 - 08:43:39 EST


On 02/02/21 23:42, Maciej S. Szmigiero wrote:
I'm not opposed to using more sophisticated storage for the gfn lookups, but only if there's a good reason for doing so. IMO, the
rbtree isn't simpler, just different.

And it also has worse cache utilization than an array, due to memory footprint (as you point out below) but also pointer chasing.

Memslot modifications are
unlikely to be a hot path (and if it is, x86's "zap everything"
implementation is a far bigger problem), and it's hard to beat the
memory footprint of a raw array. That doesn't leave much motivation for such a big change to some of KVM's scariest (for me)
code.


Improvements can be done step-by-step, kvm_mmu_invalidate_zap_pages_in_memslot() can be rewritten, too in
the future, if necessary. After all, complains are that this change
alone is too big.

I think that if you look not at the patch itself but at the
resulting code the new implementation looks rather straightforward,
there are comments at every step in kvm_set_memslot() to explain
exactly what is being done. Not only it already scales well, but it
is also flexible to accommodate further improvements or even new
operations.

The new implementation also uses standard kernel {rb,interval}-tree
and hash table implementation as its basic data structures, so it automatically benefits from any generic improvements to these.

All for the low price of just 174 net lines of code added.

I think the best thing to do here is to provide a patch series that splits the individual changes so that they can be reviewed and their separate merits evaluated.

Another thing that I dislike about KVM_SET_USER_MEMORY_REGION is that IMO userspace should provide all memslots at once, for an atomic switch of the whole memory array. (Or at least I would like to see the code; it might be a bit tricky because you'll need code to compute the difference between the old and new arrays and invoke kvm_arch_prepare/commit_memory_region). I'm not sure how that would interact with the active/inactive pair that you introduce here.

Paolo