Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services

From: Wen Yang
Date: Thu Aug 06 2020 - 05:09:10 EST




在 2020/8/5 上午6:58, David Miller 写道:
From: Wen Yang <wenyang@xxxxxxxxxxxxxxxxx>
Date: Sat, 1 Aug 2020 16:58:45 +0800

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 75431ca..6b9d44b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
return true;
+ if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+ return true;
+
return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
}

You're bypassing explicitly the logic here:

/*
* Limit the number of linkwatch events to one
* per second so that a runaway driver does not
* cause a storm of messages on the netlink
* socket. This limit does not apply to up events
* while the device qdisc is down.
*/
if (!urgent_only)
linkwatch_nextevent = jiffies + HZ;
/* Limit wrap-around effect on delay. */
else if (time_after(linkwatch_nextevent, jiffies + HZ))
linkwatch_nextevent = jiffies;

Something about this isn't right. We need to analyze what you are seeing,
what device you are using, and what systemd is doing to figure out what
the right place for the fix.

Thank you.


Thank you very much for your comments.
We are using virtio_net and the environment is a microvm similar to firecracker.

Let's briefly explain.
net_device->operstate is assigned through linkwatch_event, and the call stack is as follows:
process_one_work
-> linkwatch_event
-> __linkwatch_run_queue
-> linkwatch_do_dev
-> rfc2863_policy
-> default_operstate

During the machine startup process, net_device->operstate has the following two-step state changes:

STEP A: virtnet_probe detects the network card and triggers the execution of linkwatch_fire_event.
Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
And since net_device->state is 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER), net_device->operstate will be changed from IF_OPER_UNKNOWN to IF_OPER_DOWN:
eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)

virtnet_probe then executes netif_carrier_on to update net_device->state, it will be changed from ‘__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER’ to __LINK_STATE_PRESENT:
eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2 (__LINK_STATE_PRESENT)

STEP B: One second later (because linkwatch_nextevent = jiffies + HZ), linkwatch_work is executed again.
At this time, since net_device->state is __LINK_STATE_PRESENT, so the net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)


The above state change can be completed within 2 seconds.
Generally, the machine will load the initramfs first, and do some initialization in the initramfs, which takes some time; then switch_root to the system disk and continue the initialization, which will also take some time, and finally start the systemd-networkd service, bringing link, etc.,
In this way, the linkwatch_work work queue has enough time to run twice, and the state of net_device->operstate is already IF_OPER_UP,
So bringing link up quickly returns the following information:
Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: eth0: bringing link up
...
Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: eth0: flags change: +UP +LOWER_UP +RUNNING

But we are now using MicroVM, which requires extreme speed to start, bypassing the initramfs and directly booting the trimmed system on the disk.
systemd-networkd starts in less than 1 second after booting. the STEP B has not been run yet, so it will wait for several hundred milliseconds here, as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP +LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING


Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
https://www.kernel.org/doc/Documentation/networking/operstates.txt

A routing daemon or dhcp client just needs to care for IFF_RUNNING or
waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
considering the interface / querying a DHCP address.

Finally, the STEP B above only updates the value of operstate based on the known state (operstate/state) on the net_device, without any hardware interaction involved, so it is not very reasonable to wait for 1 second there.

By adding:
+ if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+ return true;
+
We hope to improve the linkwatch_urgent_event function a bit.

Hope to get more of your advice and guidance.

Best wishes,
Wen