post 2.6.24-rc7 regression: unregister_netdevice: waiting for eth1 to become free

From: Mariusz Kozlowski
Date: Sun Jan 20 2008 - 19:17:23 EST


Hello,

I found that on current Linus tree unplugging my pcmcia wifi
card (Cabletron RoamAbout) causes part of the system to hang. I.e. mouse works
but keyboard is frozen until reboot. Actually only sysrq works but I can not
type in any terminal.

Syslog shows:

kernel: pccard: card ejected from slot 1
kernel: hermes @ 00010100: Card removed while issuing command 0x0002.
kernel: eth1: Error -19 disabling MAC port
kernel: unregister_netdevice: waiting for eth1 to become free. Usage count = 1
last message repeated 2 times

sysrq-d:

Showing all locks held in the system:
3 locks held by events/0/5:
#0: (events){--..}, at: [<c0126956>] run_workqueue+0xd2/0x1c9
#1: ((linkwatch_work).work){--..}, at: [<c012697c>] run_workqueue+0xf8/0x1c9
#2: (net_todo_run_mutex){--..}, at: [<c02d6c7a>] netdev_run_todo+0x15/0x23f
2 locks held by pccardd/1189:
#0: (&socket->skt_mutex){--..}, at: [<c0286918>] pccardd+0x129/0x227
#1: (net_todo_run_mutex){--..}, at: [<c02d6c7a>] netdev_run_todo+0x15/0x23f
1 lock held by agetty/5204:
#0: (&tty->atomic_read_lock){--..}, at: [<c022bb0f>] read_chan+0x3da/0x5bb
1 lock held by agetty/5206:
#0: (&tty->atomic_read_lock){--..}, at: [<c022bb0f>] read_chan+0x3da/0x5bb
1 lock held by agetty/5207:
#0: (&tty->atomic_read_lock){--..}, at: [<c022bb0f>] read_chan+0x3da/0x5bb
1 lock held by agetty/5208:
#0: (&tty->atomic_read_lock){--..}, at: [<c022bb0f>] read_chan+0x3da/0x5bb
1 lock held by agetty/5209:
#0: (&tty->atomic_read_lock){--..}, at: [<c022bb0f>] read_chan+0x3da/0x5bb
1 lock held by ifconfig/5581:
#0: (net_todo_run_mutex){--..}, at: [<c02d6c7a>] netdev_run_todo+0x15/0x23f

sysrq-w:

SysRq : Show Blocked State
task PC stack pid father
events/0 D dcd62780 0 5 2
dd834ee0 00000096 c033f2e0 dcd62780 00000000 c043f500 00000246 dd846000
dd834f28 c033547a 00000000 00000002 c02d6c7a dd834f0c c043f538 c043f524
c02d6c7a 00000000 dcfa5e54 c043f524 dd846000 c043f500 dd834f08 c043f744
Call Trace:
[<c033547a>] mutex_lock_nested+0xe6/0x306
[<c02d6c7a>] netdev_run_todo+0x15/0x23f
[<c02de559>] rtnl_unlock+0x12/0x14
[<c02df907>] linkwatch_event+0x25/0x27
[<c01269a4>] run_workqueue+0x120/0x1c9
[<c01272f6>] worker_thread+0x71/0xab
[<c0129dbd>] kthread+0x36/0x58
[<c0103507>] kernel_thread_helper+0x7/0x10
=======================
pccardd D dccaaa00 0 1189 2
dce33e00 00000096 c033f260 dccaaa00 00000282 dce33e10 ffff4f93 ffff4f53
dce33e30 c0334b38 c043f6c0 dce33e30 c04dc740 c04dc740 ffff4f93 c01202e9
dce16000 c04dc280 dcc09060 ffff4811 dce33e38 c0334bab dce33e40 c01207ae
Call Trace:
[<c0334b38>] schedule_timeout+0x43/0xa2
[<c0334bab>] schedule_timeout_uninterruptible+0x14/0x16
[<c01207ae>] msleep+0x12/0x18
[<c02d6d2e>] netdev_run_todo+0xc9/0x23f
[<c02de559>] rtnl_unlock+0x12/0x14
[<c02d54ad>] unregister_netdev+0x17/0x1b
[<ded1c111>] orinoco_cs_detach+0x1b/0x2d [orinoco_cs]
[<ded0873b>] pcmcia_device_remove+0x37/0xc6 [pcmcia]
[<c0258080>] __device_release_driver+0x6a/0x92
[<c0258493>] device_release_driver+0x2f/0x45
[<c025798f>] bus_remove_device+0x53/0x75
[<c0256194>] device_del+0x146/0x257
[<c02562b0>] device_unregister+0xb/0x15
[<ded08683>] pcmcia_card_remove+0x71/0x87 [pcmcia]
[<ded091ec>] ds_event+0x54/0x99 [pcmcia]
[<c0285f5a>] send_event+0x78/0xa4
[<c0286175>] socket_remove_drivers+0x12/0x14
[<c0286185>] socket_shutdown+0xe/0xc7
[<c0286264>] socket_remove+0x26/0x2c
[<c02869e7>] pccardd+0x1f8/0x227
[<c0129dbd>] kthread+0x36/0x58
[<c0103507>] kernel_thread_helper+0x7/0x10
=======================
ifconfig D ddb5ea00 0 5581 5580
dcfa5e2c 00000096 c033f2e0 ddb5ea00 00000000 c043f500 00000246 dce94000
dcfa5e74 c033547a 00000000 00000002 c02d6c7a dce9456c c043f538 c043f524
c02d6c7a 00000000 c043f524 dd834f08 dce94000 c043f500 dcfa5e54 00000000
Call Trace:
[<c033547a>] mutex_lock_nested+0xe6/0x306
[<c02d6c7a>] netdev_run_todo+0x15/0x23f
[<c02de559>] rtnl_unlock+0x12/0x14
[<c02d614c>] dev_ioctl+0x165/0x4f5
[<c02c9967>] sock_ioctl+0xe2/0x211
[<c016e3c2>] do_ioctl+0x22/0x71
[<c016e466>] vfs_ioctl+0x55/0x28a
[<c016e6ce>] sys_ioctl+0x33/0x51
[<c01028ca>] sysenter_past_esp+0x5f/0xa5
=======================

I run bisection three times but it doesn't give a clear answer which commit is the culprit.
First bisection: current Linus bad - v2.6.23 good. First bad commit:

ecd2ebdea350c40e73c00d400d74c8a09c072082 is first bad commit
commit ecd2ebdea350c40e73c00d400d74c8a09c072082
Author: Jarek Poplawski <jarkao2@xxxxxxxxx>
Date: Thu Jan 10 21:21:20 2008 -0800

[AX25] af_ax25: Possible circular locking.

Which is bogus because I don't even have ax25 compiled into the kernel.
Next two (to be sure) times I run bisection: current Linus bad - v2.6.24-rc7 good.
Two times I got the same result, first bad commit:

9cd40029423701c376391da59d2c6469672b4bed is first bad commit
commit 9cd40029423701c376391da59d2c6469672b4bed
Author: Pavel Emelyanov <xemul@xxxxxxxxxx>
Date: Thu Jan 10 03:48:38 2008 -0800

[NEIGH]: Fix race between neigh_parms_release and neightbl_fill_parms

But this also seems wrong. When I apply this patch the problem doesn't show.
Good thing is that both commits are very close to each other so I guess the culprit
is somewhere there near to them. All I can say for sure now is that v2.6.24-rc7 is good
and current Linus is bad.

Box is x86 32bit. Config is attached. If I can provide you with any more info just let me know.

My plan is to apply patches, one at a time, starting from 9cd4002942 and see what happens
unless someone knows already which one is bad.

Regards,

Mariusz

This is the last bisection log:

git-bisect start
# bad: [a7da60f41551abb3c520b03d42ec05dd7decfc7f] Remove bogus duplicate CONFIG_LGUEST_GUEST entry.
git-bisect bad a7da60f41551abb3c520b03d42ec05dd7decfc7f
# good: [3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235] Linux 2.6.24-rc7
git-bisect good 3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235
# good: [d72ec9e20e4de995aa957f171cf84b136689e4c0] Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
git-bisect good d72ec9e20e4de995aa957f171cf84b136689e4c0
# bad: [0938e7586440ac97cedc0f5528a8684ebfa4ce43] Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
git-bisect bad 0938e7586440ac97cedc0f5528a8684ebfa4ce43
# good: [d8cf5389bd9d1f0ac9fea51796c274ba64b83d80] libata: relocate sdev->manage_start_stop configuration
git-bisect good d8cf5389bd9d1f0ac9fea51796c274ba64b83d80
# bad: [1b310fca30ac9851f79337ca72b1cf6a0f58064a] [TOKENRING]: rif_timer not initialized properly
git-bisect bad 1b310fca30ac9851f79337ca72b1cf6a0f58064a
# bad: [5c5482266537fdb24d6b8462540d8f65a6007a97] [BLUETOOTH]: Always send explicit hci_ll wake-up acks.
git-bisect bad 5c5482266537fdb24d6b8462540d8f65a6007a97
# bad: [0bcceadceb0907094ba4e40bf9a7cd9b080f13fb] [IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
git-bisect bad 0bcceadceb0907094ba4e40bf9a7cd9b080f13fb
# bad: [9cd40029423701c376391da59d2c6469672b4bed] [NEIGH]: Fix race between neigh_parms_release and neightbl_fill_parms
git-bisect bad 9cd40029423701c376391da59d2c6469672b4bed
# good: [b0de8e402dc5d3ee04f4d0f669ae492a3e569933] [NIU]: Support for Marvell PHY
git-bisect good b0de8e402dc5d3ee04f4d0f669ae492a3e569933

Attachment: config.gz
Description: GNU Zip compressed data