lightweight netdevs Jan 2018 edition [Re: [PATCH net-next v2] net: core: Expose number of link up/down transitions]

From: David Ahern
Date: Mon Jan 22 2018 - 17:16:33 EST


On 1/22/18 1:46 PM, Florian Fainelli wrote:
>>
>> Like David Ahern I am strongly against the proliferation of sysfs files
>> attached to network devices and the per-netdevice costs associated with
>> that.
>>
>> However, dealing with that is a longer term issue that nobody has a clear
>> plan for. Therefore I cannot reject this change on that basis alone.
>>
>> The information is useful, so applied, thanks.
>
> Thanks! David A, do you have any plans to revive your LWD/LWT devices
> patches, AFAIR you were allowing a knob disabling the creation of sysfs
> attributes.
>

At the moment (and for the next few months) I am focusing on route
scalability:
http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf

I do think about the lightweight netdevice need from time to time. If
anyone has the time and wants to pick it up, the patches are on github:

https://github.com/dsahern/linux net/lwt-dev
https://github.com/dsahern/iproute2 lwt-dev

As a refresher on the concept, the intention is that users can *opt in*
to skipping the overhead of standard netdevs by adding a flag during the
link create.

Virtual devices such as vlan, macvlan, ipvlan, vrf, dummy are great
candidates for this flag, and potentially bonds and bridges if the
deployment use case is ok with what the lightweight moniker means, which
is:

1. no sysfs files for the device

2. no separate sysctl tree for the device (default settings are used)

3. delayed network protocol (IPv4, IPv6, MPLS) initializations

The last one dovetails with the need for L2-only devices and suggests
the need for a separate control flag. If you dig further into the
protocol initializations, one could easily justify flags for a finer
granularity on what is skipped (e.g., do protocol init, but skip netconf
(use default values) and skip snmp stats).
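
To make the finer-granularity idea concrete, here is a rough model in
Python, purely for illustration. It is not taken from the patches: the
flag names (SYSFS, SYSCTL, PROTO_INIT, NETCONF, SNMP) and the
setup_steps() helper are hypothetical, and it only shows which setup
steps would still run at link create for a given set of opt-out bits.

```python
from enum import Flag


class LwtSkip(Flag):
    """Hypothetical per-device opt-out bits (names are assumptions)."""
    SYSFS = 1        # no sysfs files for the device
    SYSCTL = 2       # no per-device sysctl tree, default settings are used
    PROTO_INIT = 4   # IPv4/IPv6/MPLS init is deferred at link create
    NETCONF = 8      # do protocol init, but use default netconf values
    SNMP = 16        # do protocol init, but skip snmp stats


# The single coarse "lightweight" flag, covering items 1-3 of the list.
LIGHTWEIGHT = LwtSkip.SYSFS | LwtSkip.SYSCTL | LwtSkip.PROTO_INIT


def setup_steps(skip):
    """Return the setup steps that still run at link create."""
    steps = []
    if not (skip & LwtSkip.SYSFS):
        steps.append("sysfs")
    if not (skip & LwtSkip.SYSCTL):
        steps.append("sysctl")
    if not (skip & LwtSkip.PROTO_INIT):
        steps.append("proto_init")
        # netconf and snmp are modelled as sub-steps of protocol init
        if not (skip & LwtSkip.NETCONF):
            steps.append("netconf")
        if not (skip & LwtSkip.SNMP):
            steps.append("snmp")
    return steps


# Standard netdev: nothing skipped.
print(setup_steps(LwtSkip(0)))
# Coarse lightweight flag: all modelled steps are skipped or deferred.
print(setup_steps(LIGHTWEIGHT))
# Finer granularity: protocol init runs, but without netconf and snmp.
print(setup_steps(LwtSkip.SYSFS | LwtSkip.SYSCTL
                  | LwtSkip.NETCONF | LwtSkip.SNMP))
```

The point of separate bits is that an L2-only device and a device that
merely wants cheaper protocol init do not have to share one switch.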

Slide 9 in
https://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
shows the memory allocations. 40+kB per netdev is a killer. Using the lwt
tag, that can be shrunk from ~44kB to ~4kB, a big gain. For example, 4k
VRFs (yes, I have been asked about that) would go from ~160MB to just
~16MB.
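
A quick check of that arithmetic, as a sketch. It assumes "4k VRFs"
means 4096 devices and treats the per-netdev figures as round numbers;
the total_mib() helper is only for illustration. The ~160MB figure
corresponds to the low end of "40+kB"; at ~44kB per device the standard
total is closer to 176MB.

```python
KIB = 1024
MIB = 1024 * KIB


def total_mib(num_devs, per_dev_kib):
    """Total memory in MiB for num_devs devices at per_dev_kib KiB each."""
    return num_devs * per_dev_kib * KIB / MIB


num_vrfs = 4 * 1024  # assumption: "4k" means 4096

print(total_mib(num_vrfs, 40))  # 160.0, standard netdev at ~40kB
print(total_mib(num_vrfs, 44))  # 176.0, standard netdev at ~44kB
print(total_mib(num_vrfs, 4))   # 16.0, lightweight netdev at ~4kB
```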

As I recall, the kernel patch is not complete; it only shows the intent
and what the flag offers (pain is worth the gain). During the last
discussion on this the idea of a net_dev_common was suggested. After
looking into it I believe it is the wrong direction -- an unnecessary
churn on the code base when the only intention is to omit / bypass
existing code paths. A net_dev_common would come in handy when trying to
reduce the size of 'struct net_device', which is driven by 'struct device'
and similar h/w entries.