Re: [PATCH] virtio_net: enable tx after resuming from suspend

From: ake
Date: Tue Oct 16 2018 - 06:15:35 EST




On 2018/10/16 17:53, Jason Wang wrote:
>
> On 2018/10/15 6:08 PM, ake wrote:
>>
>> On 2018/10/12 18:18, ake wrote:
>>>
>>> On 2018/10/12 17:23, Jason Wang wrote:
>>>>
>>>> On 2018/10/12 12:30, ake wrote:
>>>>> On 2018/10/11 22:06, Jason Wang wrote:
>>>>>> On 2018/10/11 18:22, ake wrote:
>>>>>>> On 2018/10/11 18:44, Jason Wang wrote:
>>>>>>>> On 2018/10/11 15:51, Ake Koomsin wrote:
>>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>>>> reset")
>>>>>>>>> disabled the virtio tx before going to suspend to avoid a use
>>>>>>>>> after
>>>>>>>>> free.
>>>>>>>>> However, after resuming, it causes the virtio_net device to
>>>>>>>>> lose its
>>>>>>>>> network connectivity.
>>>>>>>>>
>>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>>
>>>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine
>>>>>>>>> during
>>>>>>>>> reset")
>>>>>>>>> Signed-off-by: Ake Koomsin <ake@xxxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>>  drivers/net/virtio_net.c | 1 +
>>>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>>>> virtio_device *vdev)
>>>>>>>>>         }
>>>>>>>>>         netif_device_attach(vi->dev);
>>>>>>>>> +       netif_start_queue(vi->dev);
>>>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>>>> netif_device_attach() above?
>>>>>>> Thank you for your review.
>>>>>>>
>>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>>>> condition in netif_device_attach() is not satisfied?
>>>>>> Yes, maybe. One case I can see now is when the device is down, in
>>>>>> this
>>>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>>>
>>>>>>> Without
>>>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>>>> after waking up.
>>>>>> How do you trigger the issue? Just do suspend/resume?
>>>>> Yes, simply suspend and resume.
>>>>>
>>>>> Here is how I trigger the issue:
>>>>>
>>>>> 1) Start the Virtual Machine Manager GUI program.
>>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>>    >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>>    In addition, make sure that the video adapter is VGA. Otherwise,
>>>>>    waking up with the virtual power button does not work.
>>>>> 3) After installing the guest OS, log in, and test the network
>>>>>    connectivity by pinging the host machine.
>>>>> 4) Suspend. After this, the screen is blank.
>>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>>    appears again.
>>>>> 6) Log in again. The guest loses its network connection.
>>>>>
>>>>> In my test:
>>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>>> I can not reproduce this issue if virtio-net interface is up in guest
>>>> before the suspend. I'm using net-next.git and qemu master. But I do
>>>> reproduce when virtio-net interface is down in guest before suspend,
>>>> after resume, even if I make it up, the network is still lost.
>>>>
>>>> I think the interface is up in your case, but please confirm this.
>>> If you mean the interface state before I hit the suspend button,
>>> the answer is yes. The interface is up before I suspend the guest
>>> machine.
>>>
>>> Note that my current QEMU version is QEMU emulator version 2.5.0
>>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>>
>>> I will try with net-next.git and qemu master later and see if I can
>>> reproduce the issue.
>> Update. I tried with net-next and qemu master. Interestingly, the result
>> is different from yours. The network is lost even if the virtio_net
>> interface is up before suspending.
>>
>> Host: Ubuntu 16.04 with net-next kernel (default configuration)
>> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
>> Qemu: master
>> Qemu command:
>> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
>> -bios /usr/share/OVMF/OVMF_CODE.fd \
>> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
>> -netdev user,id=hostnet0 \
>> -device virtio-net-pci,netdev=hostnet0 \
>> -device VGA,id=video0,vgamem_mb=16 \
>> -global PIIX4_PM.disable_s3=1 \
>> -global PIIX4_PM.disable_s4=1 -monitor stdio
>
>
> Interesting, I just noticed you're using userspace networking. To isolate
> the issue, can you retry with e.g. tap or e1000 to make sure it's not a
> fault of slirp or virtio-net?

I will try.

> Thanks
>

There is another thing that I want to discuss. I notice that
netif_device_detach() already sets __QUEUE_STATE_DRV_XOFF if the
network interface is running. Isn't calling netif_tx_disable() after
netif_device_detach() redundant when the network interface is running?
If the goal is to serialize the tx routine, would netif_tx_lock() and
netif_tx_unlock() be more appropriate? Like this:

netif_tx_lock(vi->dev);
netif_device_detach(vi->dev);
netif_tx_unlock(vi->dev);

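Currently, netif_tx_disable() seems to break the symmetry between
netif_device_detach() and netif_device_attach(): it sets
__QUEUE_STATE_DRV_XOFF even when the interface is down, but
netif_device_attach() only wakes the queues when the interface is
running. I think that is why you can reproduce the problem when the
interface is down before suspending.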
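
To make the asymmetry concrete, here is a minimal user-space sketch of
the flag handling. It is not kernel code: struct model_dev and all
model_* functions are hypothetical stand-ins for the kernel helpers
named in the comments, with a single tx queue. It also assumes that
bringing the interface up after resume does not clear
__QUEUE_STATE_DRV_XOFF, which matches the "interface down before
suspend" observation above. It does not explain the loss seen when the
interface is up before suspending.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space model, NOT kernel code. One tx queue only. */
struct model_dev {
    bool present;   /* stands in for __LINK_STATE_PRESENT */
    bool running;   /* stands in for netif_running() */
    bool drv_xoff;  /* stands in for __QUEUE_STATE_DRV_XOFF */
};

/* Like netif_device_detach(): stop the queue only if the device was
 * present and the interface is running. */
static void model_device_detach(struct model_dev *dev)
{
    bool was_present = dev->present;

    dev->present = false;
    if (was_present && dev->running)
        dev->drv_xoff = true;
}

/* Like netif_tx_disable(): stop the queue unconditionally. */
static void model_tx_disable(struct model_dev *dev)
{
    dev->drv_xoff = true;
}

/* Like netif_device_attach(): wake the queue only if the device was
 * absent and the interface is running. */
static void model_device_attach(struct model_dev *dev)
{
    bool was_present = dev->present;

    dev->present = true;
    if (!was_present && dev->running)
        dev->drv_xoff = false;
}

/* Assumption: a later ifup marks the interface running but leaves the
 * queue flag untouched. */
static void model_open(struct model_dev *dev)
{
    dev->running = true;
}

/* Returns true if tx is still stopped after suspend, resume and ifup. */
static bool model_tx_stopped_after_resume(bool up_before_suspend,
                                          bool call_tx_disable)
{
    struct model_dev dev = {
        .present = true,
        .running = up_before_suspend,
        .drv_xoff = false,
    };

    /* Suspend path. */
    model_device_detach(&dev);
    if (call_tx_disable)
        model_tx_disable(&dev);

    /* Resume path. */
    model_device_attach(&dev);
    assert(dev.present);

    /* Bring the interface up if it was down before suspend. */
    if (!dev.running)
        model_open(&dev);

    return dev.drv_xoff;
}
```

In this model, model_tx_stopped_after_resume(false, true) returns true
(the queue stays stopped), while dropping the unconditional disable or
starting with the interface up returns false.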