Re: Oops after register_netdev() failure in 2.6.3-bk5

From: Pavel Roskin
Date: Mon Feb 23 2004 - 22:50:14 EST


On Tue, 24 Feb 2004 viro@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:

> On Mon, Feb 23, 2004 at 09:43:22PM -0500, Pavel Roskin wrote:
> > Hello!
> >
> > Linux 2.6.3-bk5 (and perhaps older versions too) accesses uninitialized
> > memory if register_netdev() fails in the dev->init call. I could
> > reproduce the problem in the dummy driver.
>
> It's not register_netdev(); it's broken cleanup code in the driver.

Agreed.

> Fix in case of dummy.c is trivial -
> diff -urN RC3-bk1/drivers/net/dummy.c RC3-bk1-current/drivers/net/dummy.c
> --- RC3-bk1/drivers/net/dummy.c Wed Feb 18 13:40:43 2004
> +++ RC3-bk1-current/drivers/net/dummy.c Mon Feb 23 21:56:46 2004
> @@ -124,7 +124,7 @@
> dummies = kmalloc(numdummies * sizeof(void *), GFP_KERNEL);
> if (!dummies)
> return -ENOMEM;
> - for (i = 0; i < numdummies && !err; i++)
> + for (i = 0; !err && i < numdummies && !err; i++)
> err = dummy_init_one(i);
> if (err) {

Add "i--;" here. The device that failed doesn't need another
free_netdev().

> while (--i >= 0)
>
> Now, which driver have you actually seen it in?

orinoco_plx. I've put the current snapshot here:
http://www.red-bean.com/~proski/tmp/orinoco-oops.tar.gz

orinoco_init() in orinoco.c has been modified to fail always.

I could try to reduce the driver to another dummy that can be run on any
system, but it will take time.

# modprobe orinoco_plx
orinoco.c 0.14alpha2HEAD (David Gibson <hermes@xxxxxxxxxxxxxxxxxxxxx>,
Pavel Roskin <proski@xxxxxxx>, et al)
orinoco_plx.c 0.14alpha2HEAD (Daniel Barlow <dan@xxxxxxxxxx>, David Gibson
<hermes@xxxxxxxxxxxxxxxxxxxxx>)
orinoco_plx: CIS: 01:03:00:00:FF:17:04:67:5A:08:FF:1D:05:01:67:5A:
orinoco_plx: Local Interrupt already enabled
Detected Orinoco/Prism2 PLX device at 0000:01:00.0 irq:12, io addr:0xc400
orinoco_plx: init_one(), FAIL!
orinoco_plx: probe of 0000:01:00.0 failed with error -16
Unable to handle kernel paging request at virtual address 6b6b6b77
printing eip:
c02d0d88
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c02d0d88>] Not tainted
EFLAGS: 00010202
EIP is at rtnetlink_fill_ifinfo+0x278/0x430
eax: 6b6b6b6b ebx: cf03e0c0 ecx: 00000640 edx: cd4e3920
esi: 00000000 edi: cd4e38a8 ebp: c12b9ed8 esp: c12b9eb4
ds: 007b es: 007b ss: 0068
Process events/0 (pid: 3, threadinfo=c12b8000 task=c12bcc80)
Stack: c12b9ec8 00000640 00000f40 cd4e3000 cf05fde4 6b6b6b6b cf05fde4 00000000
       00000010 c12b9efc c02d11b6 00000000 00000000 00000000 cf03e0c0 cf03e0c0
       c12b9f20 c12b9f20 c12b9f34 c02d1a67 00000001 00000003 cf121340 5de4c35d
Call Trace:
[<c02d11b6>] rtmsg_ifinfo+0x46/0xb0
[<c02d1a67>] linkwatch_run_queue+0x147/0x1f0
[<c02d1b52>] linkwatch_event+0x42/0x70
[<c01363b4>] worker_thread+0x1f4/0x3e0
[<c0119f67>] recalc_task_prio+0x97/0x1c0
[<c02d1b10>] linkwatch_event+0x0/0x70
[<c011b6e0>] default_wake_function+0x0/0x10
[<c011b6e0>] default_wake_function+0x0/0x10
[<c013b105>] kthread+0x95/0xa0
[<c01361c0>] worker_thread+0x0/0x3e0
[<c013b070>] kthread+0x0/0xa0
[<c0107019>] kernel_thread_helper+0x5/0xc

Code: 8b 50 0c b9 ff ff ff ff 31 c0 83 c2 08 89 d7 f2 ae f7 d1 49



Known facts:

orinoco_plx_init_one() exits before the oops.

rtnetlink_fill_ifinfo() crashes here:

        if (dev->qdisc_sleeping)
                RTA_PUT(skb, IFLA_QDISC,
                        strlen(dev->qdisc_sleeping->ops->id) + 1,
                        dev->qdisc_sleeping->ops->id);


dev->qdisc_sleeping is 0x6b6b6b6b, which indicates freed memory (I have
slab debugging enabled).

Reordering statements in orinoco_plx_init_one() after "fail:" may prevent
the oops, but only the first time. If the module is unloaded and loaded
again, it crashes.

Commenting out free_orinocodev() (a wrapper around free_netdev()) fixes the
oops, but I think it would leak the memory allocated for the device. That's
the likely workaround if we fail to fix the problem.

--
Regards,
Pavel Roskin