pointer to dev->hard_start_xmit() gets trashed in 2.4.0-test9

From: Hen, Shmulik (shmulik.hen@intel.com)
Date: Wed Oct 25 2000 - 02:59:05 EST


Hello,

We are developing an advanced networking services loadable module and are
having problems porting it to work on 2.4.x kernels. The driver is supposed
to provide services such as fault tolerance, load balancing and link
aggregation over a team of network adapters. It works OK on 2.2.x kernels
but hangs on 2.4.x kernels.

In order to debug it, we stripped it down to become a mere "intermediate" or
"filter" driver that binds to a base driver and passes everything through in
both directions (Rx, Tx, IOCTL, stats, etc.). After going through the basics
of modifying the driver to compile on 2.4.x kernels and fighting some nasty
dead locks due to the new nature of the networking layer, we managed to get
it to run. The driver will receive and transmit a few hundreds of thousands
of packets (while having a periodic timer expire 10 times a second and
running continuous IOCTLs), and then it causes an oops about not being able
to handle a page fault.

The function looks something like:

int iansHardStartXmit(struct sk_buff *skb, struct net_device *dev) {
        int res;
        struct net_device *base;

        spin_lock(&lock); //no interrupts involved, so spin_lock
should do
        base = get_base_driver_by_name(name);

        if(base != NULL) {
                BUG_TRAP(ptr_g == base->hard_start_xmit); //make sure it's
always the same addr
                res = base->hard_start_xmit(skb, base);
        }

        spin_unlock(&lock);
        return res;
}

We used kdb in order to track down the problem and found out the following
stack trace:

 EBP EIP function(args)
0xc4cd1c54 0xd081e3e7 [e100]__kallsyms+0xb (0xc4b595a0,
0xc840f200)
                                        e100 __kallsyms 0xd081e3dc
0xd081e3dc 0xd0820dsc
                0xd08244ba [ians]iansHardStartXmit+0xa6 (0xc4b595a0,
0xc4d9bc00)
                                        ians .text 0xd0824060 0xd0824414
0xd082452c
                0xc01f9d1f qdisc_restart+0xcf (0xc4d9bc00)
                                        kernel .text 0xc0100000 0xc01f9c50
0xc01f9f14
        *
        *
        *

This goes on and shows that this is an ICMP echo reply packet going down
through the IP stack to the filter driver (apparently 0xc4b595a0 is the skb,
0xc4d9bc00 is the *dev of the filter driver and 0xc840f200 is the *dev of
the base driver). The filter driver is supposed to call the
dev->hard_start_xmit of the base driver, but strangely it lands somewhere in
the data segment of the base driver (__kallsyms is a part of the symbol
table of the module according to insmod -m).
Figuring the dev->hard_start_xmit pointer got trashed somehow, we added a
check to make sure the same pointer is always called, and indeed this is the
case. Looking at the assembly code with kdb, we could see that the call to
the base driver is done by a 'call *%eax' instruction.

How is it possible that the pointer to the function keeps it's value, but
the jump to that function falls somewhere else ?

We are using:
RedHat 6.2
gcc v2.91.66
modutils v2.3.11-1 (was upgraded because of kdb)
kernel linux-2.4.0-test8 (SMP, +kdb, compiled with frame pointers,
SPINLOCK_DEBUG=2)
kdb v1.4-2.4.0-test9-pre9
Compaq ap500 dual p-III Xeon

Could this be a version mismatch between the components above ?

        Thanks,
        Shmulik Hen

        Software Engineer
        Linux Advanced Networking Services
        Network Communications Group, Israel (NCGj)
        Intel Corporation Ltd.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:15 EST