Re: ath9k crash 3.2-rc7

From: Mohammed Shafi
Date: Thu Jan 05 2012 - 10:30:18 EST


2012/1/5 MR <g7af0ec1e3ea1e7b1@xxxxxxxxxxx>:
>  > Hi John,
>
> I am the stupid original submitter who only sent this to linux-kernel
> initially.

:) no problem. I hope you can reproduce the issue consistently.
Can you please test with the attached patch and the additional debug patch?
Please let me know which of these you see:
- no panic, but there are warnings
- no warnings at all
- the issue still appears (in that case please send the trace, thanks)
Also let me know if you need any help.
Meanwhile, I will start overnight wifi traffic on 3.2-rc7.


>
>  > we will take a look at this.
>  >
>  > i can later come up with few debug patches to narrow down the panic.
>  > looks like a problem in ath_update_survey_stats(survey pointer). full
>  > stack trace will be helpful
>  > thanks.
>
> What I have posted is the full call trace. Right above this is the stack
> trace in hex:
>
> Process kworker/u:2 (pid: 6668, threadinfo ffff880027cd4000, task
> ffff880076a38000)
> Stack:
>  ffff880027cd5808 ffffffff81064830 ffff880027cd5808 ffff880147c51c80
>  ffff880027cd58b8 ffffffff8135a117 ffff880076a38620 0000000000011c80
>  0000000000011c80 ffff880076a38000 0000000000011c80 ffff880027cd5fd8
>
> Currently I have booted Linux 3.0 kernel to check whether the problem is
> already there. Unfortunately, with Linux 3.1 and 3.0 I often get the
> following in dmesg (this is at module load; sometimes the driver just stops
> working - then I get this on reloading the module):
>
> ath9k 0000:03:00.0: enabling device (0000 -> 0002)
> ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> ath9k 0000:03:00.0: setting latency timer to 64
> ath9k 0000:03:00.0: Failed to initialize device
> ath9k 0000:03:00.0: PCI INT A disabled
> ath9k: probe of 0000:03:00.0 failed with error -5
>
> As far as I understood, some similar problem was fixed after Linux 3.1.
>
>  > >> My card is (as lspci says):
>  > >>
>  > >> 03:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
>  > >>         Subsystem: Device 1a3b:1089
>
>  > >> wq_worker_sleeping+0x10/0xa0
>  > >> __schedule+0x427/0x7b0
>  > >> ? call_rcu_sched+0x10/0x20
>  > >> schedule+0x3a/0x50
>  > >> do_exit+0x57c/0x840
>  > >> ? kmsg_dump+0x45/0xe0
>  > >> oops_end+0xa5/0xf0
>  > >> no_context+0xf2/0x270
>  > >> __bad_area_no_semaphore+0xe/0x10
>  > >> do_page_fault+0x2ba/0x450
>  > >> ? up+0x2d/0x50
>  > >> ? console_unlock+0x1df/0x250
>  > >> ? select_task_rq_fair+0x5be/0x970
>  > >> page_fault+0x25/0x30
>  > >> ? ath_update_survey_stats+0xb7/0x1c0 [ath9k]
>  > >> ath9k_config+0x115/0x780 [ath9k]
>  > >> ? queue_work+0x1a/0x20
>  > >> ? queue_delayed_work+0x25/0x30
>  > >> ? ieee80211_queue_delayed_work+0x46/0x60 [mac80211]
>  > >> ? ath9k_flush+0x155/0x1d0 [ath9k]
>  > >> ieee80211_hw_config+0xe2/0x160 [mac80211]
>  > >> ieee80211_scan_work+0x243/0x5c0 [mac80211]
>  > >> ? ieee80211_scan_rx+0x1c0/0x1c0 [mac80211]
>  > >> process_one_work+0x111/0x390
>  > >> worker_thread+0x162/0x340
>  > >> manage_workers.clone.26+0x240/0x240
>  > >> kthread+0x96/0xa0
>  > >> kernel_thread_helper+0x4/0x10
>  > >> ? kthread_worker_fn+0x190/0x190
>  > >> ? gs_change+0x13/0x13
>
>



--
shafi
From 509a141e14f794de8fbfe847a3de548ace21d7ec Mon Sep 17 00:00:00 2001
From: Mohammed Shafi Shajakhan <mohammed@xxxxxxxxxxxxxxxx>
Date: Wed, 21 Dec 2011 20:11:35 +0530
Subject: [PATCH v2] mac80211: fix scan state machine

When we run high bandwidth UDP traffic and trigger a scan, the scan
state machine loops in SUSPEND->RESUME->DECISION->SUSPEND and
SET_CHANNEL is never reached, as 'tx_empty' is never true while
running UDP traffic. Fix this by setting the SET_CHANNEL state when
we get into the RESUME state.

Cc: Leela Kella <leela@xxxxxxxxxxxxxxxx>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@xxxxxxxxxxxxxxxx>
---
net/mac80211/scan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index 2c5041c..2908e56 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -625,7 +625,7 @@ static void ieee80211_scan_state_resume(struct ieee80211_local *local,
 	local->leave_oper_channel_time = jiffies;
 
 	/* advance to the next channel to be scanned */
-	local->next_scan_state = SCAN_DECISION;
+	local->next_scan_state = SCAN_SET_CHANNEL;
 }
 
 void ieee80211_scan_work(struct work_struct *work)
--
1.7.0.4
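
For illustration only, the loop described in the commit message above can
be modelled in user space roughly as follows. This is a simplified sketch
and not the mac80211 code: the state names follow the commit message, but
next_state(), count_set_channel() and the reduced four-state machine are
hypothetical, and the real scan code has more states and conditions.

```c
#include <stdbool.h>

/* States named in the commit message; the real enum has more members. */
enum scan_state {
	SCAN_DECISION,
	SCAN_SET_CHANNEL,
	SCAN_SUSPEND,
	SCAN_RESUME,
};

/*
 * Hypothetical, simplified transition function.
 * after_resume is the state that RESUME hands over to:
 * SCAN_DECISION models the old behaviour, SCAN_SET_CHANNEL the patched one.
 */
static enum scan_state next_state(enum scan_state s, bool tx_empty,
				  enum scan_state after_resume)
{
	switch (s) {
	case SCAN_DECISION:
		/* with pending traffic, go back to the operating channel */
		return tx_empty ? SCAN_SET_CHANNEL : SCAN_SUSPEND;
	case SCAN_SUSPEND:
		return SCAN_RESUME;
	case SCAN_RESUME:
		return after_resume;
	case SCAN_SET_CHANNEL:
	default:
		return SCAN_DECISION;
	}
}

/* Count how often SET_CHANNEL is reached while tx_empty stays false. */
static int count_set_channel(enum scan_state after_resume, int steps)
{
	enum scan_state s = SCAN_DECISION;
	int hits = 0;
	int i;

	for (i = 0; i < steps; i++) {
		s = next_state(s, false, after_resume);
		if (s == SCAN_SET_CHANNEL)
			hits++;
	}
	return hits;
}
```

In this model, count_set_channel(SCAN_DECISION, 12) never reaches
SET_CHANNEL, while count_set_channel(SCAN_SET_CHANNEL, 12) reaches it once
per SUSPEND/RESUME round, which is the effect the one-line change aims for.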

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 6e3d838..4f8c905 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -190,6 +190,19 @@ static int ath_update_survey_stats(struct ath_softc *sc)
 	if (!ah->curchan)
 		return -1;
 
+	/* just to make sure pos does not exceed ATH9K_NUM_CHANNELS - 1 */
+	if (pos > 37) {
+		WARN_ON(1);
+		printk("\npos is %d index out of bounds!!! in %s", pos, __func__);
+		return -1;
+	}
+
+	if (!survey) {
+		WARN_ON(1);
+		printk("\nNULL pointer for survey !!! in %s", __func__);
+		return -1;
+	}
+
 	if (ah->power_mode == ATH9K_PM_AWAKE)
 		ath_hw_cycle_counters_update(common);
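
As a rough user-space illustration of what the debug checks above guard
against, here is a minimal sketch of a bounds-checked lookup. It assumes
that pos is the offset of the current channel inside a fixed channel
table; struct radio, current_survey() and NUM_CHANNELS are hypothetical
names, and the value 38 is only inferred from the "pos > 37" check, so
treat all of it as an assumption rather than the driver's layout.

```c
#include <stddef.h>

/* Assumption: mirrors ATH9K_NUM_CHANNELS as implied by "pos > 37". */
#define NUM_CHANNELS 38

struct channel {
	int freq;
};

struct survey {
	unsigned long time_busy;
};

/* Hypothetical stand-in for the driver state used in the patch. */
struct radio {
	struct channel channels[NUM_CHANNELS];
	struct survey survey[NUM_CHANNELS];
	struct channel *curchan;
};

/*
 * Return the survey slot for the current channel, or NULL when no
 * channel is set or the computed index falls outside the table.
 */
static struct survey *current_survey(struct radio *r)
{
	ptrdiff_t pos;

	if (!r->curchan)
		return NULL;

	/* index of the current channel within the channel table */
	pos = r->curchan - &r->channels[0];
	if (pos < 0 || pos >= NUM_CHANNELS)
		return NULL;

	return &r->survey[pos];
}
```

Checking the index before taking the address of the survey entry keeps
the lookup inside the array; a NULL test on the resulting pointer alone
would not catch an out-of-range index.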