Diald dies on closing idle link (details)

Ralph Clark (rclark@virgosolutions.demon.co.uk)
Mon, 24 May 1999 13:31:27 +0100


More information on my SuSE 6.1/kernel 2.2.9/diald/pppd problem. To recap:

Ralph Clark wrote (on linux-diald@vger.rutgers.edu):

> The following errors appear in my /var/log/messages whenever diald tries to
> close an idle link.
>
> Nonzero exit status (7) on command '/sbin/route add 158.152.1.222 metric 1 dev sl0'
>
> Nonzero exit status (7) on command '/sbin/route add default metric 1 dev sl0'
>
> (158.152.1.222 is the static IP address I have entered for the remote end of the
> ppp link.)
>
> Each time this happens diald immediately dies. Sometimes it succeeds in stopping
> pppd and sometimes it doesn't. If it doesn't, the link stays up indefinitely
> (i.e. all night) and my phone bill swells painfully.
>
> I'm running the 2.2.9 kernel with SuSE 6.1 which includes diald-0.16 and
> ppp-2.3.5.
>
> Is there something wrong with my routing set up? Unfortunately the SuSE ppp
> support has all changed and I'm not too sure if their YaST setup tool is
> interpreting the routing info I put in the same way it used to.

Well, it also seems that diald can die immediately after bringing the link up:

Setting pointopoint route for ppp0
running '/sbin/route add 158.152.1.222 metric 0 dev ppp0'
SIGCHLD[5]: pid 2646 system, status 0
Establishing routes for ppp0
running '/sbin/route add default metric 0 dev ppp0'
SIGCHLD[6]: pid 2647 system, status 0

Then diald disappears.

OK. I captured the routing table under various circumstances. Here goes:

Before diald has started up (note pristine routing table):

jove:~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.1     0.0.0.0         255.255.255.255 UH        0 0          0 dummy
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo

While diald is running but link has not been brought up yet (note THREE sl0
routes - there should only be two created by diald: one default and one
point-to-point):

jove:~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
192.168.0.1     0.0.0.0         255.255.255.255 UH        0 0          0 dummy
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         0.0.0.0         0.0.0.0         U         0 0          0 sl0

After diald has brought up the link (note THREE ppp0 and THREE sl0 routes).
NB. ppp is configured to set itself up as the default route, because otherwise
diald doesn't bring up the link.

jove:~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 ppp0
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 ppp0
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
192.168.0.1     0.0.0.0         255.255.255.255 UH        0 0          0 dummy
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         0.0.0.0         0.0.0.0         U         0 0          0 ppp0
0.0.0.0         0.0.0.0         0.0.0.0         U         0 0          0 sl0

After diald has been manually shut down again (note that one of the
point-to-point sl0 routes doesn't go away when it should).

jove:~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
192.168.0.1     0.0.0.0         255.255.255.255 UH        0 0          0 dummy
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo

When diald dies unexpectedly, even if pppd is brought down, it fails to clear any
of the sl0 routes.

jove:~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
158.152.1.222   0.0.0.0         255.255.255.255 UH        0 0          0 sl0
192.168.0.1     0.0.0.0         255.255.255.255 UH        0 0          0 dummy
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         0.0.0.0         0.0.0.0         U         0 0          0 sl0
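
Duplicates like these are easy to miss by eye, so one option is to count
identical (destination, genmask, interface) rows in the `netstat -rn` output.
What follows is only a minimal Python sketch, not anything that diald or SuSE
provides: the function name `duplicate_routes` and the `SAMPLE` text (copied
from the listing above) are illustrative assumptions.

```python
from collections import Counter


def duplicate_routes(netstat_output):
    """Return {(destination, genmask, iface): count} for routes listed more than once."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Route rows in `netstat -rn` output have 8 columns and start with an
        # address; this skips the shell prompt, the title and the column header.
        if len(fields) != 8 or not fields[0][0].isdigit():
            continue
        destination, genmask, iface = fields[0], fields[2], fields[7]
        counts[(destination, genmask, iface)] += 1
    return {route: n for route, n in counts.items() if n > 1}


# Sample input: the routing table captured after diald died unexpectedly.
SAMPLE = """\
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
158.152.1.222 0.0.0.0 255.255.255.255 UH 0 0 0 sl0
158.152.1.222 0.0.0.0 255.255.255.255 UH 0 0 0 sl0
192.168.0.1 0.0.0.0 255.255.255.255 UH 0 0 0 dummy
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
0.0.0.0 0.0.0.0 0.0.0.0 U 0 0 0 sl0
"""

if __name__ == "__main__":
    for (dest, mask, iface), n in duplicate_routes(SAMPLE).items():
        print(f"{dest}/{mask} on {iface}: listed {n} times")
```

On the sample it reports only the doubled host route to 158.152.1.222 on sl0;
the single default route on sl0 is not flagged.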

I found the following posting on the linux-kernel list. It clearly points the
finger at some reckless changes in the 2.2 kernel:

> Routing Table (Feature)
> Author: Richard B. Johnson
> Date: 1999/02/16
> Forum: fa.linux.kernel
>
> When the startup scripts initialize the network, they generally
> do `route add -net NNN.NNN.NNN.0 netmask MMM.MMM.MMM.MMM dev DEV`
> This is for compatibility. Alexy's latest code automatically adds
> a route when `ifconfig` is executed. This has not caused any problems
> unless, when a ppp connection is established, you modify the routing
> table to correspond to the new configuration.
>
> Please don't tell me you don't have to. It depends upon the configuration.
> My configuration changes the actual network and routes between two
> networks so I have connectivity between all my home machines and the
> internet via my ppp link.
>
> In changing the network, from 204.178.40.0, netmask 255.255.248.0 to
> 204.178.47.0, netmask 255.255.255.0, it is necessary to delete the
> existing network, i.e., `route del -net 204.178.40.0 netmask 255.255.248.0`
> and add the new one. Unfortunately, the routing table contains two
> entries for the same network:
>
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 204.178.40.0    *               255.255.248.0   U     0      0        0 eth0
> 204.178.40.0    *               255.255.248.0   U     0      0        0 eth0
> 127.0.0.0       *               255.0.0.0       U     0      0        0 lo
> default         cisco.analogic. 0.0.0.0         UG    1      0        0 eth0a
>
> The `ifconfig` command only deletes one. The command has to be executed
> twice to delete the second one. The problem is that when my ppp link
> is shut down, I do not add two identical routes back (nor should I).
> The result is that the second time the ppp link is established, I will
> get an error if I try to delete two routes. So, a work-around of
> always deleting two identical routes doesn't work very well.
>
> Therefore, since it has now become the De-facto standard to automatically
> add routes when the interface is configured, I respectfully ask that
> when a route is deleted, the kernel software delete all entries of that
> same route. Since time is not critical, looking through the whole routing
> table, rather than just exiting when the first equivalent entry is
> found, will go a long way towards helping to maintain compatibility
> with existing SYSV systems.
>
>
> Cheers,
> Dick Johnson
> ***** FILE SYSTEM WAS MODIFIED *****
> Penguin : Linux version 2.2.1 on an i686 machine (400.59 BogoMips).
> Warning : It's hard to remain at the trailing edge of technology.
> Wisdom : It's not a Y2K problem. It's a Y2Day problem.
>
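
The behaviour the posting asks for, deleting every copy of a route rather than
just the first, could be approximated from user space by repeating `route del`
until it fails. The Python below is a hedged sketch of that idea and not a
tested fix for diald: the helper name `delete_all_copies`, the `run` hook and
the `max_attempts` cap are assumptions made for illustration, and it treats any
non-zero exit status from `route del` as "no copy left".

```python
import subprocess


def delete_all_copies(route_args, run=None, max_attempts=10):
    """Run `/sbin/route del <route_args>` until it fails.

    Returns the number of successful deletions. `run` is a hypothetical hook
    that takes an argv list and returns an exit status; it exists so the loop
    can be exercised without root or a real routing table.
    """
    if run is None:
        # Real runner: needs root. /sbin/route is the path shown in the diald log.
        def run(argv):
            return subprocess.run(argv, stderr=subprocess.DEVNULL).returncode

    removed = 0
    for _ in range(max_attempts):
        if run(["/sbin/route", "del"] + list(route_args)) != 0:
            break  # assumed to mean no matching route is left
        removed += 1
    return removed
```

As root, a call such as
`delete_all_copies(["-host", "158.152.1.222", "dev", "sl0"])` would try to clear
the leftover sl0 host routes shown earlier. The cap on attempts is there only so
that a misbehaving `route` cannot loop forever.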

However, this doesn't explain why I get THREE sl0 routes. And I don't know how
to use this information to fix the problem anyway. I don't know what I have to
change to get the routing to behave. My own /etc/route.conf is innocuous. It
contains only this line:

192.168.0.0 0.0.0.0 255.255.255.0 eth0

Can somebody figure this out for me? Per Yoda's dictum, I promise to pass on
whatever I learn from this.

--
rclark@virgosolutions.demon.co.uk        Ralph Clark, Virgo Solutions Ltd (UK)
   __   _
  / /  (_)__  __ ____  __    * Powerful * Flexible * Compatible * Reliable *
 / /__/ / _ \/ // /\ \/ /  *Well Supported * Thousands of New Users Every Day*
/____/_/_//_/\_,_/ /_/\_\    The Cost Effective Choice - Linux Means Business!
