Re: BUG due to "xen-netback: protect resource cleaning on XenBus disconnect"

From: Boris Ostrovsky
Date: Thu Mar 02 2017 - 13:01:05 EST


On 03/02/2017 06:56 AM, Juergen Gross wrote:
> With commits f16f1df65 and 9a6cdf52b we get in our Xen testing:
>
> [ 174.512861] switch: port 2(vif3.0) entered disabled state
> [ 174.522735] BUG: sleeping function called from invalid context at
> /home/build/linux-linus/mm/vmalloc.c:1441
> [ 174.523451] in_atomic(): 1, irqs_disabled(): 0, pid: 28, name: xenwatch
> [ 174.524131] CPU: 1 PID: 28 Comm: xenwatch Tainted: G W
> 4.10.0upstream-11073-g4977ab6-dirty #1
> [ 174.524819] Hardware name: MSI MS-7680/H61M-P23 (MS-7680), BIOS V17.0
> 03/14/2011
> [ 174.525517] Call Trace:
> [ 174.526217] show_stack+0x23/0x60
> [ 174.526899] dump_stack+0x5b/0x88
> [ 174.527562] ___might_sleep+0xde/0x130
> [ 174.528208] __might_sleep+0x35/0xa0
> [ 174.528840] ? _raw_spin_unlock_irqrestore+0x13/0x20
> [ 174.529463] ? __wake_up+0x40/0x50
> [ 174.530089] remove_vm_area+0x20/0x90
> [ 174.530724] __vunmap+0x1d/0xc0
> [ 174.531346] ? delete_object_full+0x13/0x20
> [ 174.531973] vfree+0x40/0x80
> [ 174.532594] set_backend_state+0x18a/0xa90
> [ 174.533221] ? dwc_scan_descriptors+0x24d/0x430
> [ 174.533850] ? kfree+0x5b/0xc0
> [ 174.534476] ? xenbus_read+0x3d/0x50
> [ 174.535101] ? xenbus_read+0x3d/0x50
> [ 174.535718] ? xenbus_gather+0x31/0x90
> [ 174.536332] ? ___might_sleep+0xf6/0x130
> [ 174.536945] frontend_changed+0x6b/0xd0
> [ 174.537565] xenbus_otherend_changed+0x7d/0x80
> [ 174.538185] frontend_changed+0x12/0x20
> [ 174.538803] xenwatch_thread+0x74/0x110
> [ 174.539417] ? woken_wake_function+0x20/0x20
> [ 174.540049] kthread+0xe5/0x120
> [ 174.540663] ? xenbus_printf+0x50/0x50
> [ 174.541278] ? __kthread_init_worker+0x40/0x40
> [ 174.541898] ret_from_fork+0x21/0x2c
> [ 174.548635] switch: port 2(vif3.0) entered disabled state
>
> I believe calling vfree() when holding a spin_lock isn't a good idea.
>
> Boris, this is the dumpdata failure:
> FAILURE 4.10.0upstream-11073-g4977ab6-dirty(x86_64)
> 4.10.0upstream-11073-g4977ab6-dirty(i386)\: 2017-03-02 (tst007)


That's not the cause of the test failure though --- it's "just" a warning.

The problem here was that 64- and 32-bit build trees got out of sync
(which is my fault, I switched the former to staging but forgot to do
the same for the latter). We have in the log:

libxl: error: libxl_create.c:564:libxl__domain_make: domain creation
fail: Operation not supported
libxl: error: libxl_create.c:931:initiate_domain_create: cannot make
domain: -3

I now have both trees use staging.

-boris