Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open

From: Ross Lagerwall
Date: Thu Jan 11 2018 - 10:49:18 EST


On 01/11/2018 03:26 PM, David Miller wrote:
From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
Date: Thu, 11 Jan 2018 09:36:38 +0000

When a netfront device is set up it registers a netdev fairly early on,
before it has set up the queues and is actually usable. A userspace tool
like NetworkManager will immediately try to open it and access its state
as soon as it appears. The bug can be reproduced by hotplugging VIFs
until the VM runs out of grant refs. It registers the netdev but fails
to set up any queues (since there are no more grant refs). In the
meantime, NetworkManager opens the device and the kernel crashes trying
to access the queues (of which there are none).

Fix this in two ways:
* For initial setup, register the netdev much later, after the queues
are set up. This avoids the race entirely.
* During a suspend/resume cycle, the frontend reconnects to the backend
and the queues are recreated. It is possible (though highly unlikely) to
race with something opening the device and accessing the queues after
they have been destroyed but before they have been recreated. Extend the
region covered by the rtnl semaphore to protect against this race. There
is a possibility that we fail to recreate the queues so check for this
in the open function.
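
As a rough illustration of the second point, here is a minimal userspace
sketch, not the actual xen-netfront code. It assumes a pthread mutex
standing in for the rtnl semaphore, and every name in it (demo_netdev,
demo_open, demo_reconnect, demo_rtnl) is hypothetical. The queues are
destroyed and recreated inside one locked region, and the open path
refuses to proceed if no queues exist.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real xen-netfront structures. */
struct demo_queue {
    int id;
};

struct demo_netdev {
    struct demo_queue *queues;  /* NULL if recreation failed */
    unsigned int num_queues;
};

/* Models the rtnl semaphore with an ordinary mutex (assumption). */
static pthread_mutex_t demo_rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Open path: takes the lock itself in this sketch, then checks that
 * the queues exist before anything would touch them.
 */
int demo_open(struct demo_netdev *dev)
{
    int ret = 0;

    pthread_mutex_lock(&demo_rtnl);
    if (!dev->queues)
        ret = -ENODEV;  /* nothing to open: queue setup failed */
    pthread_mutex_unlock(&demo_rtnl);
    return ret;
}

/*
 * Reconnect path: destroy and recreate the queues under a single lock
 * hold, so demo_open() can never run between the two steps.
 * simulate_failure models running out of resources during recreation.
 */
int demo_reconnect(struct demo_netdev *dev, unsigned int n,
                   int simulate_failure)
{
    int ret = 0;

    pthread_mutex_lock(&demo_rtnl);

    free(dev->queues);
    dev->queues = NULL;
    dev->num_queues = 0;

    if (simulate_failure) {
        ret = -ENOMEM;
    } else {
        dev->queues = calloc(n, sizeof(*dev->queues));
        if (dev->queues)
            dev->num_queues = n;
        else
            ret = -ENOMEM;
    }

    pthread_mutex_unlock(&demo_rtnl);
    return ret;
}
```

In this sketch, a demo_open() call made after a failed demo_reconnect()
returns -ENODEV instead of dereferencing a missing queue array; the
error codes are chosen for illustration only.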

Signed-off-by: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>

Where is patch 1/2 and the 0/2 header posting which explains what this
patch series is doing, how it is doing it, and why it is doing it that
way?


I've now CC'd netdev on the other two.

Cheers,
--
Ross Lagerwall