Re: [RFC PATCH 0/6] kvm: Growable memory slot array

From: Gleb Natapov
Date: Tue Dec 04 2012 - 11:42:34 EST


On Tue, Dec 04, 2012 at 08:39:47AM -0700, Alex Williamson wrote:
> On Tue, 2012-12-04 at 17:30 +0200, Gleb Natapov wrote:
> > On Tue, Dec 04, 2012 at 08:21:55AM -0700, Alex Williamson wrote:
> > > On Tue, 2012-12-04 at 13:48 +0200, Gleb Natapov wrote:
> > > > On Mon, Dec 03, 2012 at 04:39:05PM -0700, Alex Williamson wrote:
> > > > > Memory slots are currently a fixed resource with a relatively small
> > > > > limit. When using PCI device assignment in a qemu guest it's fairly
> > > > > easy to exhaust the number of available slots. I posted patches
> > > > > exploring growing the number of memory slots a while ago, but it was
> > > > > prior to caching memory slot array misses and thefore had potentially
> > > > > poor performance. Now that we do that, Avi seemed receptive to
> > > > > increasing the memory slot array to arbitrary lengths. I think we
> > > > > still don't want to impose unnecessary kernel memory consumptions on
> > > > > guests not making use of this, so I present again a growable memory
> > > > > slot array.
> > > > >
> > > > > A couple notes/questions; in the previous version we had a
> > > > > kvm_arch_flush_shadow() call when we increased the number of slots.
> > > > > I'm not sure if this is still necessary. I had also made the x86
> > > > > specific slot_bitmap dynamically grow as well and switch between a
> > > > > direct bitmap and indirect pointer to a bitmap. That may have
> > > > > contributed to needing the flush. I haven't done that yet here
> > > > > because it seems like an unnecessary complication if we have a max
> > > > > on the order of 512 or 1024 entries. A bit per slot isn't a lot of
> > > > > overhead. If we want to go more, maybe we should make it switch.
> > > > > That leads to the final question, we need an upper bound since this
> > > > > does allow consumption of extra kernel memory, what should it be? A
> > > > This is the most important question :) If we want to have 1000s of
> > > > them or 100 is enough?
> > >
> > > We can certainly hit respectable numbers of assigned devices in the
> > > hundreds. Worst case is 8 slots per assigned device, typical case is 4
> > > or less. So 512 slots would more or less guarantee 64 devices (we do
> > > need some slots for actual memory), and more typically allow at least
> > > 128 devices. Philosophically, supporting a full PCI bus, 256 functions,
> > > 2048 slots, is an attractive target, but it's probably no practical.
> > >
> > > I think on x86 a slot is 72 bytes w/ alignment padding, so a maximum of
> > > 36k @512 slots.
> > >
> > > > Also what about changing kvm_memslots->memslots[]
> > > > array to be "struct kvm_memory_slot *memslots[KVM_MEM_SLOTS_NUM]"? It
> > > > will save us good amount of memory for unused slots.
> > >
> > > I'm not following where that results in memory savings. Can you
> > > clarify. Thanks,
> > >
> > We will waste sizeof(void*) for each unused slot instead of
> > sizeof(struct kvm_memory_slot).
>
> Ah, of course. That means for 512 slots we're wasting a full page just
> for the pointers, whereas we can fit 56 slots in the same space. Given
> that most users get by just fine w/ 32 slots, I don't think that's a win
> in the typical case. Maybe if we want to support sparse arrays, but a
You can look at it differently :). We can increase number of slots to 288
with the same memory footprint we have now. And 288 looks like a lot of
slots.

Memory is cheap and getting cheaper and complicated code stays
complicated. Of course we should not go crazy with wisting memory, but
not go crazy to save each byte either.

> tree would probably be better at that point. A drawback of the growable
> array is that userspace can subvert any savings by using slot N-1 first,
> but that's why we put a limit at a reasonable size. Thanks,
>
Why using slot as an index? May be changing id_to_index[] into hash
table and use that to map from slot id to array index?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/