Re: [PATCH 3/5] vhost: support upto 509 memory regions
From: Michael S. Tsirkin
Date: Wed Jun 17 2015 - 06:46:26 EST
On Wed, Jun 17, 2015 at 12:37:42PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 12:11:09 +0200
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>
> > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > > On Wed, 17 Jun 2015 09:39:06 +0200
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > >
> > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > >
> > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > > since commit
> > > > > > > > > 1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > >
> > > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > > registering hotplugged memory.
> > > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > > memory regions.
> > > > > > > > >
> > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > >
> > > > > > > > It was 64, not 65.
> > > > > > > >
> > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>
> > > > > > > >
> > > > > > > > Still thinking about this: can you reorder this to
> > > > > > > > be the last patch in the series please?
> > > > > > > sure
> > > > > > >
> > > > > > > >
> > > > > > > > Also - 509?
> > > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > > KVM's allotment of memory slots for userspace side.
> > > > > >
> > > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > > why we need to match this exactly.
> > > > > np, I can cap it at safe 300 slots but it's unlikely that it
> > > > > would take cut off 1 extra hop since it's capped by QEMU
> > > > > at 256+[initial fragmented memory]
> > > >
> > > > But what's the point? We allocate 32 bytes per slot.
> > > > 300*32 = 9600 which is more than 8K, so we are doing
> > > > an order-3 allocation anyway.
> > > > If we could cap it at 8K (256 slots) that would make sense
> > > > since we could avoid wasting vmalloc space.
> > > 256 is amount of hotpluggable slots and there is no way
> > > to predict how initial memory would be fragmented
> > > (i.e. amount of slots it would take), if we guess wrong
> > > we are back to square one with crashing userspace.
> > > So I'd stay consistent with KVM's limit 509 since
> > > it's only limit, i.e. not actual amount of allocated slots.
> > >
> > > > I'm still not very happy with the whole approach,
> > > > giving userspace ability allocate 4 whole pages
> > > > of kernel memory like this.
> > > I'm working in parallel so that userspace won't take so
> > > many slots but it won't prevent its current versions
> > > crashing due to kernel limitation.
> >
> > Right but at least it's not a regression. If we promise userspace to
> > support a ton of regions, we can't take it back later, and I'm concerned
> > about the memory usage.
> >
> > I think it's already safe to merge the binary lookup patches, and maybe
> > cache and vmalloc, so that the remaining patch will be small.
> it isn't regression with switching to binary search and increasing
> slots to 509 either performance wise it's more on improvment side.
> And I was thinking about memory usage as well, that's why I've dropped
> faster radix tree in favor of more compact array, can't do better
> on kernel side of fix.
>
> Yes we will give userspace to ability to use more slots/and lock up
> more memory if it's not able to consolidate memory regions but
> that leaves an option for user to run guest with vhost performance
> vs crashing it at runtime.
Crashing is entirely QEMU's own doing in not handling
the error gracefully.
>
> userspace/targets that could consolidate memory regions should
> do so and I'm working on that as well but that doesn't mean
> that users shouldn't have a choice.
It's a fairly unusual corner case, I'm not yet
convinced we need to quickly add support to it when just waiting a bit
longer will get us an equivalent (or even more efficient) fix in
userspace.
> So far it's kernel limitation and this patch fixes crashes
> that users see now, with the rest of patches enabling performance
> not to regress.
When I say regression I refer to an option to limit the array
size again after userspace started using the larger size.
> >
> > >
> > > > > > > > I think if we are changing this, it'd be nice to
> > > > > > > > create a way for userspace to discover the support
> > > > > > > > and the # of regions supported.
> > > > > > > That was my first idea before extending KVM's memslots
> > > > > > > to teach kernel to tell qemu this number so that QEMU
> > > > > > > at least would be able to check if new memory slot could
> > > > > > > be added but I was redirected to a more simple solution
> > > > > > > of just extending vs everdoing things.
> > > > > > > Currently QEMU supports upto ~250 memslots so 509
> > > > > > > is about twice high we need it so it should work for near
> > > > > > > future
> > > > > >
> > > > > > Yes but old kernels are still around. Would be nice if you
> > > > > > can detect them.
> > > > > >
> > > > > > > but eventually we might still teach kernel and QEMU
> > > > > > > to make things more robust.
> > > > > >
> > > > > > A new ioctl would be easy to add, I think it's a good
> > > > > > idea generally.
> > > > > I can try to do something like this on top of this series.
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > > drivers/vhost/vhost.c | 2 +-
> > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > index 99931a0..6a18c92 100644
> > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > @@ -30,7 +30,7 @@
> > > > > > > > > #include "vhost.h"
> > > > > > > > >
> > > > > > > > > enum {
> > > > > > > > > - VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > > > > > + VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > > > > > > VHOST_MEMORY_F_LOG = 0x1,
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 1.8.3.1
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/