Re: [PATCH 3/5] vhost: support upto 509 memory regions

From: Igor Mammedov
Date: Wed Jun 17 2015 - 07:48:21 EST


On Wed, 17 Jun 2015 12:46:09 +0200
"Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:

> On Wed, Jun 17, 2015 at 12:37:42PM +0200, Igor Mammedov wrote:
> > On Wed, 17 Jun 2015 12:11:09 +0200
> > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >
> > > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > > > On Wed, 17 Jun 2015 09:39:06 +0200
> > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > >
> > > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > > > since commit
> > > > > > > > > > 1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > > >
> > > > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > > > registering hotplugged memory.
> > > > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > > > memory regions.
> > > > > > > > > >
> > > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > > >
> > > > > > > > > It was 64, not 65.
> > > > > > > > >
> > > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>
> > > > > > > > >
> > > > > > > > > Still thinking about this: can you reorder this to
> > > > > > > > > be the last patch in the series please?
> > > > > > > > sure
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Also - 509?
> > > > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > > > KVM's allotment of memory slots for userspace side.
> > > > > > >
> > > > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > > > why we need to match this exactly.
> > > > > > np, I can cap it at safe 300 slots but it's unlikely that it
> > > > > > would take cut off 1 extra hop since it's capped by QEMU
> > > > > > at 256+[initial fragmented memory]
> > > > >
> > > > > But what's the point? We allocate 32 bytes per slot.
> > > > > 300*32 = 9600 which is more than 8K, so we are doing
> > > > > an order-3 allocation anyway.
> > > > > If we could cap it at 8K (256 slots) that would make sense
> > > > > since we could avoid wasting vmalloc space.
> > > > 256 is amount of hotpluggable slots and there is no way
> > > > to predict how initial memory would be fragmented
> > > > (i.e. amount of slots it would take), if we guess wrong
> > > > we are back to square one with crashing userspace.
> > > > So I'd stay consistent with KVM's limit 509 since
> > > > it's only limit, i.e. not actual amount of allocated slots.
> > > >
> > > > > I'm still not very happy with the whole approach,
> > > > > giving userspace ability allocate 4 whole pages
> > > > > of kernel memory like this.
> > > > I'm working in parallel so that userspace won't take so
> > > > many slots but it won't prevent its current versions
> > > > crashing due to kernel limitation.
> > >
> > > Right but at least it's not a regression. If we promise userspace to
> > > support a ton of regions, we can't take it back later, and I'm concerned
> > > about the memory usage.
> > >
> > > I think it's already safe to merge the binary lookup patches, and maybe
> > > cache and vmalloc, so that the remaining patch will be small.
> > it isn't regression with switching to binary search and increasing
> > slots to 509 either performance wise it's more on improvment side.
> > And I was thinking about memory usage as well, that's why I've dropped
> > faster radix tree in favor of more compact array, can't do better
> > on kernel side of fix.
> >
> > Yes we will give userspace to ability to use more slots/and lock up
> > more memory if it's not able to consolidate memory regions but
> > that leaves an option for user to run guest with vhost performance
> > vs crashing it at runtime.
>
> Crashing is entirely QEMU's own doing in not handling
> the error gracefully.
and that's hard to fix (handle error gracefully) the way it's implemented now.

> >
> > userspace/targets that could consolidate memory regions should
> > do so and I'm working on that as well but that doesn't mean
> > that users shouldn't have a choice.
>
> It's a fairly unusual corner case, I'm not yet
> convinced we need to quickly add support to it when just waiting a bit
> longer will get us an equivalent (or even more efficient) fix in
> userspace.
with memory hotplug support in QEMU has been released for quite
a long time already and there is users that use it so fix in
the future QEMU won't make it work with their distros.
So I wouldn't say that is "fairly unusual corner case".

>
> > So far it's kernel limitation and this patch fixes crashes
> > that users see now, with the rest of patches enabling performance
> > not to regress.
>
> When I say regression I refer to an option to limit the array
> size again after userspace started using the larger size.
Is there a need to do so?

Userspace that cares about memory footprint won't use many slots
keeping it low and user space that can't do without many slots
or doesn't care will have bigger memory footprint.

>
>
> > >
> > > >
> > > > > > > > > I think if we are changing this, it'd be nice to
> > > > > > > > > create a way for userspace to discover the support
> > > > > > > > > and the # of regions supported.
> > > > > > > > That was my first idea before extending KVM's memslots
> > > > > > > > to teach kernel to tell qemu this number so that QEMU
> > > > > > > > at least would be able to check if new memory slot could
> > > > > > > > be added but I was redirected to a more simple solution
> > > > > > > > of just extending vs everdoing things.
> > > > > > > > Currently QEMU supports upto ~250 memslots so 509
> > > > > > > > is about twice high we need it so it should work for near
> > > > > > > > future
> > > > > > >
> > > > > > > Yes but old kernels are still around. Would be nice if you
> > > > > > > can detect them.
> > > > > > >
> > > > > > > > but eventually we might still teach kernel and QEMU
> > > > > > > > to make things more robust.
> > > > > > >
> > > > > > > A new ioctl would be easy to add, I think it's a good
> > > > > > > idea generally.
> > > > > > I can try to do something like this on top of this series.
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > > drivers/vhost/vhost.c | 2 +-
> > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > > index 99931a0..6a18c92 100644
> > > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > > @@ -30,7 +30,7 @@
> > > > > > > > > > #include "vhost.h"
> > > > > > > > > >
> > > > > > > > > > enum {
> > > > > > > > > > - VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > > > > > > + VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > > > > > > > VHOST_MEMORY_F_LOG = 0x1,
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 1.8.3.1
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/