Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
From: Boris Ostrovsky
Date: Thu Mar 14 2019 - 14:16:39 EST
On 3/14/19 12:33 PM, Oleksandr Andrushchenko wrote:
> On 3/14/19 17:40, Boris Ostrovsky wrote:
>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>
>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>> This leads to a number of consequences:
>>>>>>> - when the frontend withdraws granted references to the rings etc.,
>>>>>>>   it cannot be done cleanly as the backend still holds those (it
>>>>>>>   was not told to free the resources)
>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>   communication means with the backend were destroyed by the
>>>>>>>   frontend, thus making the frontend appear to the guest OS as
>>>>>>>   functional, but not really.
>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>> connectivity?
>>>>> Exactly. If you take a look at the .resume callback as it is now,
>>>>> what it does is destroy the rings etc. and never notify the backend
>>>>> of that, i.e. it stays in, say, connected state with communication
>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>> there is no way its state machine can help it recover.
>>>> My tree is about a month old so perhaps there is some sort of
>>>> regression but this certainly works for me. After resume netfront gets
>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>> Ah, the difference may be in the way we get the guest to enter
>>> the suspend state. I am making my guest suspend with:
>>> echo mem > /sys/power/state
>>> And then I use an interrupt to the guest (this is test code)
>>> to wake it up.
>>> Could you please share your exact use-case: how does the guest enter
>>> suspend and what do you do to resume it?
>>
>> xl save / xl restore
>>
>>> I can see no way the backend may want to enter XenbusStateInitWait in
>>> my use-case as it simply doesn't know we want it to.
>>
>> Yours looks like the ACPI path; I don't know how well it was tested TBH.
>
> Hm, so it does work for your use-case, but doesn't for mine.
>
> What would be the best way forward?
>
> 1. Implement .resume properly as, for example, block front does [1]
>
> 2. Remove .resume completely: this does work as long as backend
> doesn't change anything
For save/restore (migration) there is no guarantee that the new backend
has the same set of features.
>
> I am still a bit unsure if we really need to re-initialize rings,
> re-read front's config from Xenstore etc - what changes on backend
> side are expected when we resume the front driver?
Number of queues, for example. Or things in xennet_fix_features().
-boris
>
>>
>>
>> -boris
>
> Thank you,
>
> Oleksandr
>
>
> [1]
> https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072
>
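
The difference between the two resume paths discussed above can be sketched as a toy model. This is only an illustration under stated assumptions, not the netfront code: struct toy_frontend and the toy_* functions are hypothetical names. Only the enum values are real; they mirror enum xenbus_state from Xen's public io/xenbus.h. The model assumes, as described in the thread, that .resume tears down the rings without changing the frontend's Xen bus state, and that a backend transition to XenbusStateInitWait is what triggers the reconnect (xennet_connect() in the real driver).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror enum xenbus_state in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6,
};

/* Hypothetical toy model of a frontend; NOT the real struct netfront_info. */
struct toy_frontend {
    enum xenbus_state state;  /* state the frontend advertises */
    bool rings_valid;         /* whether the shared Tx/Rx rings exist */
};

/*
 * Models the .resume behaviour described in the thread: the rings are
 * destroyed, the advertised state is left untouched and the backend is
 * not notified.
 */
void toy_resume(struct toy_frontend *fe)
{
    assert(fe != NULL);
    fe->rings_valid = false;
    /* fe->state deliberately unchanged */
}

/*
 * Models the frontend's reaction to a backend state change. Assumption:
 * only a backend transition to InitWait makes the frontend rebuild its
 * rings and reconnect (the xennet_connect() step in the real driver).
 */
void toy_backend_changed(struct toy_frontend *fe, enum xenbus_state backend)
{
    assert(fe != NULL);
    if (backend == XenbusStateInitWait) {
        fe->rings_valid = true;
        fe->state = XenbusStateConnected;
    }
}

/* The frontend can pass traffic only if it is connected AND has rings. */
bool toy_usable(const struct toy_frontend *fe)
{
    assert(fe != NULL);
    return fe->state == XenbusStateConnected && fe->rings_valid;
}
```

In this model, xl save / xl restore corresponds to toy_resume() followed by toy_backend_changed(fe, XenbusStateInitWait), after which the frontend is usable again. The "echo mem > /sys/power/state" case corresponds to toy_resume() alone: the frontend still advertises Connected but has no rings, which matches the "functional, but not really" symptom from the patch description.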