Re: Oops in mac80211 with 2.6.26-rc3 triggered playing a video

From: Vegard Nossum
Date: Mon May 26 2008 - 03:49:53 EST


Hi,

On Mon, May 26, 2008 at 6:41 AM, Justin Madru <jdm64@xxxxxxxxx> wrote:
> Hi,
>
> I've been getting kernel crashes at random when a video file just starts to
> play (using VLC).
> As soon as the first frame shows, the system locks up hard (sometimes not
> even alt+sysrq+b works).
>
> Just recently, when it crashed it was able to print an oops to the syslog.
> The weird thing is that it says that it's a bug in mac80211? But I only have
> the crash the instant a video file starts to play. (I have an Intel 3945
> wireless, and Intel i945 graphics card)
>
> BUG: unable to handle kernel NULL pointer dereference at 00000090
> IP: [<f89e721f>] :mac80211:ieee80211_associate+0x24f/0x610
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in: i915 acpi_cpufreq cpufreq_powersave cpufreq_stats
> cpufreq_userspace cpufreq_conservative container sbs sbshc ext3 jbd mbcache
> arc4 ecb crypto_blkcipher rtc dcdbas cryptomgr crypto_algapi psmouse evdev
> snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm iwl3945 mac80211 snd_timer
> crc32 snd_page_alloc video backlight output ac button battery intel_agp
> reiserfs sr_mod cdrom sg ata_piix ehci_hcd uhci_hcd usbcore thermal
> processor fan
>
> Pid: 1899, comm: iwl3945 Not tainted (2.6.26-rc3-git #1)
> EIP: 0060:[<f89e721f>] EFLAGS: 00010246 CPU: 1
> EIP is at ieee80211_associate+0x24f/0x610 [mac80211]
> EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: f7b85e38
> ESI: f7b85e84 EDI: ecc7122e EBP: f7bbdd34 ESP: f7bbdcc0
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process iwl3945 (pid: 1899, ti=f7bbd000 task=f718d390 task.ti=f7bbd000)
> Stack: f7b85e84 00000000 f7bbdd14 00000202 f7b85e38 f7b85800 f7f65f00
> 00000018
> f7bbdcfa 00000000 00000421 00000003 00000006 00000052 f7bbdd0c ecc7122c
> f71593a4 00000000 f7bbde15 f7bbdd3c c0295679 303a3030 33623a66 3a31613a
> Call Trace:
> [_format_mac_addr+0x79/0x90] ? _format_mac_addr+0x79/0x90
> [sched_debug_show+0x9c6/0xcb0] ? sched_debug_show+0x9c6/0xcb0
> [<f89e7610>] ? ieee80211_auth_completed+0x30/0x40 [mac80211]
> [<f89e7a73>] ? ieee80211_rx_mgmt_auth+0x303/0x4b0 [mac80211]
> [hrtimer_start+0xc2/0x150] ? hrtimer_start+0xc2/0x150
> [hrtick_set+0x85/0x100] ? hrtick_set+0x85/0x100
> [jbd:schedule+0x364/0x8c0] ? schedule+0x364/0x870
> [<f89e7da7>] ? ieee80211_sta_rx_queued_mgmt+0x187/0xcb0 [mac80211]
> [ext3:preempt_schedule+0x33/0x100] ? preempt_schedule+0x33/0x50
> [mac80211:dev_queue_xmit+0xa6/0x1f20] ? dev_queue_xmit+0xa6/0x330
> [mac80211:_spin_unlock_bh+0x18/0xb0] ? _spin_unlock_bh+0x18/0x20
> [<f89e33b7>] ? ieee80211_rx_bss_get+0xa7/0xc0 [mac80211]
> [mac80211:skb_dequeue+0x4d/0x360] ? skb_dequeue+0x4d/0x70
> [<f89e960f>] ? ieee80211_sta_work+0x8f/0x760 [mac80211]
> [hrtick_set+0xa7/0x100] ? hrtick_set+0xa7/0x100
> [jbd:schedule+0x364/0x8c0] ? schedule+0x364/0x870
> [run_workqueue+0x80/0x120] ? run_workqueue+0x80/0x120
> [<f89e9580>] ? ieee80211_sta_work+0x0/0x760 [mac80211]
> [worker_thread+0x88/0xe0] ? worker_thread+0x88/0xe0
> [<c013ba80>] ? autoremove_wake_function+0x0/0x40
> [worker_thread+0x0/0xe0] ? worker_thread+0x0/0xe0
> [kthread+0x42/0x70] ? kthread+0x42/0x70
> [kthread+0x0/0x70] ? kthread+0x0/0x70
> [kernel_thread_helper+0x7/0x18] ? kernel_thread_helper+0x7/0x18
> =======================
> Code: c6 00 00 8b 55 9c 8b 4d c8 8b 42 70 88 41 01 8b 42 70 8b 7d c8 89 c1
> c1 e9 02 83 c7 02 f3 a5 89 c1 83 e1 03 74 02 f3 a4 8b 5d d0 <8b> 9b 90 00 00
> 00 85 db 89 5d d8 0f 84 6d 03 00 00 8b 7d cc 8b
> EIP: [<f89e721f>] ieee80211_associate+0x24f/0x610 [mac80211] SS:ESP
> 0068:f7bbdcc0
> ---[ end trace 7afccad6600bfa21 ]---

The code decodes to:

  1d:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
  1f:   89 c1                   mov    %eax,%ecx
  21:   83 e1 03                and    $0x3,%ecx
  24:   74 02                   je     0x28
  26:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
  28:   8b 5d d0                mov    -0x30(%ebp),%ebx
   0:   8b 9b 90 00 00 00       mov    0x90(%ebx),%ebx    <---- BAM!
   6:   85 db                   test   %ebx,%ebx
   8:   89 5d d8                mov    %ebx,-0x28(%ebp)
   b:   0f 84 6d 03 00 00       je     0x37e
  11:   8b 7d cc                mov    -0x34(%ebp),%edi
  14:   8b                      .byte 0x8b
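
The faulting instruction loads from 0x90(%ebx), and the register dump shows EBX = 00000000, so the address accessed is 0 + 0x90 = 0x90. That matches the "NULL pointer dereference at 00000090" line: a member at offset 0x90 was read through a NULL pointer. A minimal user-space sketch of that arithmetic follows; the struct and its member names are made up for illustration, and only the 0x90 offset is taken from the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x90 offset comes from the oops,
 * the struct and member names are illustrative assumptions. */
struct example_obj {
	unsigned char pad[0x90];
	uint32_t member_at_0x90;
};

/* Compile-time check that the illustrative member really sits at 0x90. */
static_assert(offsetof(struct example_obj, member_at_0x90) == 0x90,
	      "member must be at offset 0x90");

/* Address the CPU accesses for an operand of the form disp(%base). */
static uintptr_t effective_address(uintptr_t base, uintptr_t disp)
{
	return base + disp;
}

/* Address touched when the member is read through a NULL pointer:
 * base 0 plus the member's offset within the struct. */
static uintptr_t fault_address_for_null(void)
{
	return effective_address(0, offsetof(struct example_obj, member_at_0x90));
}
```

Calling fault_address_for_null() yields 0x90, the address reported in the oops.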

Recompiling net/mac80211/mlme.c shows that this happens at line 675:

ieee80211_compatible_rates net/mac80211/mlme.c:675
ieee80211_send_assoc net/mac80211/mlme.c:767
ieee80211_associate net/mac80211/mlme.c:955

So it is in fact ieee80211_compatible_rates() that crashes (it does not
show up in your Oops because it was inlined into ieee80211_associate()).

Looking at the latest changelog in linus/master, there is this change:

commit 0d580a774b3682b8b2b5c89ab9b813d149ef28e7
Author: Helmut Schaa <hschaa@xxxxxxx>
Date: Tue May 20 09:56:37 2008 +0200

mac80211: fix NULL pointer dereference in ieee80211_compatible_rates

Fix a possible NULL pointer dereference in ieee80211_compatible_rates
introduced in the patch "mac80211: fix association with some APs". If no bss
is available just use all supported rates in the association request.

Signed-off-by: Helmut Schaa <hschaa@xxxxxxx>
Signed-off-by: John W. Linville <linville@xxxxxxxxxxxxx>

Does applying or cherry-picking that commit fix your problem? (Patch
attached, not inlined.)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
commit 0d580a774b3682b8b2b5c89ab9b813d149ef28e7
Author: Helmut Schaa <hschaa@xxxxxxx>
Date: Tue May 20 09:56:37 2008 +0200

mac80211: fix NULL pointer dereference in ieee80211_compatible_rates

Fix a possible NULL pointer dereference in ieee80211_compatible_rates
introduced in the patch "mac80211: fix association with some APs". If no bss
is available just use all supported rates in the association request.

Signed-off-by: Helmut Schaa <hschaa@xxxxxxx>
Signed-off-by: John W. Linville <linville@xxxxxxxxxxxxx>

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index e470bf1..7cfd12e 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -730,7 +730,17 @@ static void ieee80211_send_assoc(struct net_device *dev,
 		if (bss->wmm_ie) {
 			wmm = 1;
 		}
+
+		/* get all rates supported by the device and the AP as
+		 * some APs don't like getting a superset of their rates
+		 * in the association request (e.g. D-Link DAP 1353 in
+		 * b-only mode) */
+		rates_len = ieee80211_compatible_rates(bss, sband, &rates);
+
 		ieee80211_rx_bss_put(dev, bss);
+	} else {
+		rates = ~0;
+		rates_len = sband->n_bitrates;
 	}

 	mgmt = (struct ieee80211_mgmt *) skb_put(skb, 24);
@@ -761,10 +771,7 @@ static void ieee80211_send_assoc(struct net_device *dev,
 	*pos++ = ifsta->ssid_len;
 	memcpy(pos, ifsta->ssid, ifsta->ssid_len);

-	/* all supported rates should be added here but some APs
-	 * (e.g. D-Link DAP 1353 in b-only mode) don't like that
-	 * Therefore only add rates the AP supports */
-	rates_len = ieee80211_compatible_rates(bss, sband, &rates);
+	/* add all rates which were marked to be used above */
 	supp_rates_len = rates_len;
 	if (supp_rates_len > 8)
 		supp_rates_len = 8;
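
The control flow of the patch can be sketched in user-space C roughly as follows: compute the compatible rates only while a bss entry is available, and otherwise fall back to all rates of the band. This is a simplified stand-in, not the kernel code. The example_band and example_bss types, the rate representation, and both function names are assumptions made for illustration; only the "rates = ~0, rates_len = n_bitrates" fallback is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (assumed layout). */
struct example_band {
	size_t n_bitrates;	/* number of rates the device supports (< 64 here) */
	const int *bitrates;	/* illustrative units, e.g. 100 kbit/s */
};

struct example_bss {
	size_t n_rates;		/* number of rates the AP advertises */
	const int *rates;
};

/* Set bit i in *rates for every band rate the AP also advertises and
 * return how many matched. Dereferences bss, so bss must not be NULL. */
static size_t example_compatible_rates(const struct example_bss *bss,
				       const struct example_band *band,
				       uint64_t *rates)
{
	size_t i, j, count = 0;

	*rates = 0;
	for (i = 0; i < band->n_bitrates; i++) {
		for (j = 0; j < bss->n_rates; j++) {
			if (band->bitrates[i] == bss->rates[j]) {
				*rates |= (uint64_t)1 << i;
				count++;
				break;
			}
		}
	}
	return count;
}

/* Mirrors the shape of the fix: with no bss, mark every rate as usable. */
static size_t example_select_rates(const struct example_bss *bss,
				   const struct example_band *band,
				   uint64_t *rates)
{
	assert(band != NULL && rates != NULL);

	if (bss)
		return example_compatible_rates(bss, band, rates);

	*rates = ~(uint64_t)0;
	return band->n_bitrates;
}
```

The point of the sketch is that the bss pointer is only dereferenced inside the branch where it is known to be non-NULL, which is what moving the ieee80211_compatible_rates() call into the "if (bss)" block achieves in the patch.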