RE: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + someaccesspoint == hardlock

From: NilsRadtkelkml
Date: Mon May 03 2010 - 15:18:20 EST



Hello,

Some of the previous messages about this BUG seem lost, so here again in a single message:

When connecting to the local university's wireless APs, rather regularly the kernel crashes.

So, connecting to one of those the system hardlocks. No panic LED blinking, no mouse cursor moving,
no magic sysreq key working. Poweroff is the only way to revive the machine. Until the upcoming
lock-up.

What might serve as a hint: using knemo and kwifimanager (with wpa_supplicant in the bg) it feels like it is
freezing more frequently. Not running kwifimanager seems to reduce the frequency w/o however eliminating it
completely. Sometimes iceweasel has been touched and then the crash occurred.
Maybe some TX (or subsequent RX) provokes the crash? Some idea that also came up was that accessing iwlagn
drivers isn't clean when it's happening from more that one place at the same time
(->wpa_supplicant + kwifimanager)? [edit: this isn't probably it, or not solely]

This behaviour is currently only known to happen in this exact place/geolocation. So one assumption is that it
is somehow correlated to a particular type of access point, firmware respectively. Nonetheless, even
if there's some fw sending kill packets, the kernel should not crash, obviously.

The access points in range are of type: (obviously, fw versions are unknown..)

- one 0:1d:8b PIRELLI BROADBAND SOLUTIONS
- couple of those 0:40:96 Cisco Systems, Inc.
- one 0:1a:70 Cisco-Linksys

This notebook has no serial port, so we haven't been able to get a glimpse on what is happening at that
precise moment. xconsole or dmesg haven't been helpful as the system just freezes and that's it.

Other circumstances leading to a crash (reproducibility):

- RFKILL switch, switching radio on, (obviously) is a precondition. But then, it mainly
happens shortly after switching radio on,
- it happens more frequently and faster after switching radio on when kde + firefox + xy is running

Please find attached an image of the bug. Finally I could catch a glimpse of what happens, I longued
for this quite some time.

Strangely, as stated in the previous message, this bug does only happen in conjunction at a specific
geographical position, so far it only happened there that is. How does this correlate with the designated
code at line 2076 in iwl-agn-rs.c?:

/* Sanity-check TPT calculations */
BUG_ON(window->average_tpt != ((window->success_ratio *
tbl->expected_tpt[index] + 64) / 128));


Furthermore, it seems like there was already an encounter with this bug before. Previously I had no means
of grabbing more information off it, though, so I wrote down some notes:

2010-03-20: iwl-agn-rs.c line 2076: bug
The file where I kept my note for this bug contains:

- crashed during wireless AP scan using wpa_supplicant (after RFKILL switching to on..)
- this happend right after a fresh reboot (after crashing for the same bug)

- afterwards, auth.log contained this (bad, bad where's my data gone?):
Mar 20 18:24:01 localhost CRON[16684]: pam_unix(cron:session): session closed for user root
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@i686; it; rv:1.9.2.1) Gecko/20091211 Firefox/3.5.6 (like Firefox/3.5.11; Debian-3.5.11-1)"

Noticed the line in the bug from 2010-03-20? Yep, the same as in the one of today.

2010-04-30: One crash, two oopses: Msg restored mainly from remainders left in some brain's corner, w/
heavy massaging applied, mainly Code has been written down properly due to outside constrains..
Unfortunately, the proper ordering of lines got lost, so, yes, this is a poor report..
Order is as it's believed it was shown on-screen.

Number 1: nearly lost as subsequent oops made it scroll off-screen, only bottom half visible:

in CPU idle +x3f/x90

EIP rs_tx_status +x8f/x2030 SS: ESP 0668:f749fc4c

Code: 7f 22 69 c7 80 01 00 00 8b 8c 24 fc 01 00 00 8b 84 01 a8 00 00 00 6b 04 b0 64 39 44 24 1c 0f 8e
d2 fe ff ff 31 db e9 cb fe ff ff <0f> 0b fe 8b 54 24 4c 8b 8c 24 fc 01 00 00 8b 82 04 00 00 00

in iwl_rx_queue restock xbd/x130 iwlcore

panic -> oops begin

kernel panic not syncing fatal exec in int

pid 0 comm: swapper tainted GD .33.2


Comments, ideas, fixes welcome... this BUG is highly annoying..

Happy to serve with more info if required (and available..).

Regards,

Nils

Attachment: 2010-04-29_kernel-oops_iwlagn_2076_cut4www.png
Description: PNG image