This document is intended to help the porting of "legacy" network drivers to the new interface provided by the new softnet architecture. It does not delve into the details of softnet (that would be another article). Throughout the document "Toplevel" and "core" are interchange-ably used to refer to the network code above the driver. FLAG OBSOLETION --------------- Several device flags are obsoleted; their functionality is embedded in a different re-incarnation as will be explained. ** The obsoleted device flags are: dev->tbusy dev->start dev->interrupt The current replacements/new flags are stored in the bitmap dev->state Currently defined state bits/flags are: bit 0: LINK_STATE_XOFF bit 1: LINK_STATE_DOWN bit 3: LINK_STATE_START bit 4: LINK_STATE_RXSEM bit 5: LINK_STATE_TXSEM bit 6: LINK_STATE_SCHED LINK_STATE_XOFF is the replacement for dev->tbusy LINK_STATE_DOWN is the replacement for the !IFF_RUNNING check LINK_STATE_START is the replacement for dev->start LINK_STATE_RXSEM is the replacement for dev->interrupt LINK_STATE_TXSEM is a flag used only by fastrouting drivers This document will not go into the details of fastrouting other than to say it is a mechanism for direct NIC-to-NIC routing. If your driver was fasroute enabled in 2.2 (and i only know of two such drivers ;->) then this is equivalent to dev->tx_semaphore. Essentially, if your device does not know about fast routing then you need not to worry about this. LINK_STATE_SCHED is an internal flag to the Top Level (layer above the driver). The driver writer need not be concerned about LINK_STATE_SCHED but needs to understand the implications as will hopefully be conveyed by this document. Caution needs to be exercised as the replacement is not a direct one to one mapping, given the new mechanism being introduced by softnet to take advantage of SMP, there are intricacies involved. Hopefully after reading this two times, it should become clear;-> (OK, once for the experienced). NO MORE POLLING dev->tbusy -------------------------- ** Gone is the "polling" mechanism that involved checking by the core layer (above the driver) for the dev->tbusy setting; No more baby-sitting; the driver is now in charge of its own fate ;-> (even if one can still define what happens as polling). Some background explanation is necessary to understand the change in the core: For historical reasons it is worth noting that in 2.0 dev_queue_xmit() ignored dev->tbusy and a functionality similar to that of a watchdog was played by _new_ packets pushing to the driver for transmission. This way the driver could recover from transmission lockups etc. All the devices, including DOWNed ones were stored in a runqueue and scanned twice on each net_bh event. In 2.2 only the devices which have some packets in the queue or recently had something are added to the runqueue, which is still scanned on each net_bh event twice. If dev->tbusy is set, the packet is _not_ handed to the driver for transmission. As a backup, a watchdog timer monitors dev->tbusy and if it is stuck for too long a time, then a dev->hard_start_xmit() is forced. The 2.2 behavior is maintained in the current 2.3. With softnet, dev->hard_start_xmit() is no longer invoked by the watchdog timer. Instead a new interface dev->tx_timeout() is invoked. The replacement timer routine is registered into the dev structure by the driver. Most drivers already had a timer routine used to recover from transmission lockups. A simple re-use of such a routine with proper replacements of the old flag transitions would suffice in most cases. As a result of the requirement, there are two new entries in the net_dev structure: dev->tx_timeout which is the function pointer to the driver timeout routine and dev->watchdog_timeo which is the value that is used for the timeout. The attachment to the device structure for these entries is done around the same time that things like dev->hard_start_xmit are attached to the net_dev structure by the driver i.e normally in the dev_probe() routine The timer, however, is fired once the device is ifup'ed by the core (i.e it is not the responsibility of the driver author to add it to the timer list). As a quick guideline: code which used to do (in the mydev_hw_start_xmit() routine) something like this: --- if (test_and_set_bit(0, (void*)&dev->tbusy) != 0) { if (jiffies - dev->trans_start >= TX_TIMEOUT) dev_tx_timeout(dev); return 1; } ---- should now be ripped off and be replaced by (in the probe() routine) --- dev->tx_timeout = &my_tx_timeout_func; dev->watchdog_timeo = MY_DEV_TX_TIMEOUT; --- *** Also a new important change: It is up to the device to add itself to the runqueue i.e it is no longer the core's responsibilty. This is done via the netif_wake_queue() call by the driver. More on this below in the "GENERAL GUIDELINES" section. [Obviously this change in softnet not only reduces CPU utilization, but more importantly reduces the code complexity] GENERAL GUIDELINES ------------------ This is a general guideline on how you replace the flags. The timer should be added as indicated above. ************************************************************ 1) OPENING the device: ************************************************************ current: -------- This involves setting the dev->start flag and clearing the dev->tbusy and dev->interrupt New way: ------- invoke netif_start_queue(dev). At the moment, this clears the LINK_STATE_XOFF state bit. On return from the open() the core/Toplevel code sets the LINK_STATE_START bit. It is not advisable for you to clr/set_bit() on any of the above. Let the netif_start_queue() and the general Toplevel code take care of things; it is more portable this way, in case in the future some new functionality gets added to netif_start_queue() etc. **************************************************** 2) the dev->tbusy DURING DEVICE OPERATIONS i.e when the device is sending and receiving packets **************************************************** A) Clearing it: current: ======= code snippets from different drivers: -- clear_bit(0,(void *)&dev->tbusy) -- or in some drivers --- if (test_and_set_bit(0, (void*)&dev->tbusy) != 0) dev->tbusy = 0; --- etc new way: ======= invoke netif_wake_queue(dev) this takes the the device out of the LINK_STATE_XOFF state (meaning it is ready to receive packets from the upper layers) Note that it is extremely dangerous for you to clr_bit() directly on the LINK_STATE_XOFF. netif_wake_queue() does a lot more than just reset this bit. It schedules the device to be serviced Scheduling here also includes explicitly doing something equivalent to the mark_bh(NET_BH) which was done by some drivers to help kick some of the receive packet input processing as well as getting added to the device runqueue. B) setting it current: ======== -- test_and_set_bit(0,(void *)&dev->tbusy) --- etc new way: ======== call: netif_stop_queue(dev) this takes the the device into the LINK_STATE_XOFF state (which means it cant receive packets anymore from upper layers.) It is your responsibility to ensure that you go back to the non-XOFF state at some point later on; somehow somewhere you need to guarantee that the netif_wake_queue() is invoked. Your dev_tx_timeout() must either call netif_wake_queue() or restart its transmitter, so that netif_wake_queue() is called later from interupt code. General rules of thumb: 1. If the device sets XOFF (due to a call to netif_stop_queue()), it MUST call netif_wake_queue() at some point later. 2. Top level (core above the driver) WILL NOT call hard_start_xmit(), if XOFF is set. 3. Top level forgets about the device if XOFF is set and it is the device's responsibility to re-link itself to the runqueue with netif_wake_queue() so it can be serviced again. 4. The device MUST not set XOFF from a context not protected by a private xmit_lock, really. Such a lock could be stored in the device's private structure. If it does, it cannot guarantee that hard_start_xmit() is called with a clean XOFF. The interrupt handler could for example call netif_wake_queue() and a race condition could result. One way to reduce the probability of a race condition happening is not to stop the interface (calling to netif_stop_queue()) right at the entry to hard_start_xmit(), but rather stop it only when hard_start_xmit() finds that the device will not able to accept more packets (eg when its txmit ring is full). The general principles of this are as follows: -------OLD-------------------------------|--------NEW--------------- if (test_and_set_bit(0, &dev->tbusy)) { | netif_stop_queue(dev) return 1; | } else { | if (1) { real work | real work if (it_is_not_full()) | if (it_is_not_full()) dev->tbusy = 0; | netif_wake_queue() return 0; | return 0; } | } -----------------------------------------|-------------------------- Look at the softnetted-3c509.c for an example of such a trick. Such code should be used really as an intermidiate solution only because it does not guarantee you safety. The safe way to do it, without a doubt, is to add a spinlock in your device_private_structure and use it to lock critical sections such as those clearing XOFF. This is a trivial task for cards which already have this lock (eg eepro100, 8390 etc.), but is difficult for those that lack it (eg tulip). The softneted eepro100.c is a really good example of something that conforms well. [TODO: Update this with an example from the eepro] **************************************************** 3) The dev->interrupt DURING DEVICE OPERATIONS i.e when the device is sending and receiving packets ************************************************************ A) setting it: current: ======== dev->interrupt=1; new way ======= bit_set(LINK_STATE_RXSEM, &dev->state) This takes the device into the "input interrupt processing" state B) clearing it current: ======== dev->interrupt=0; clear_bit(LINK_STATE_RXSEM, &dev->state) This takes the device out of the "input interrupt processing" state Really, the dev->interrupt was obsoleted even in 2.2 but was still being used by some drivers. dev->interrupt is noop in 2.2 kernels and a SMP bug workaround in 2.0. ************************************************************ 4) mark_bh(NET_BH) ************************************************************ Gone. invoke instead: netif_wake_queue(dev); ************************************************************ 5) Miscellenia: ************************************************************ checking for set bits etc use the set_bit() calls eg: to check if we are in the LINK_STATE_START state do: if (test_bit(LINK_STATE_START, &dev->state)) ************************************************************ 6) TODO ************************************************************ - eepro example - skeleton.c example ************************************************************ 7) Credits ************************************************************ 19991227 - Jamal Hadi Salim - Alexey Kuznetsov