[PATCH] e1000e: prevent concurrent access to NVRAM

From: Thomas Gleixner
Date: Thu Oct 02 2008 - 19:43:42 EST

In reply to: Thomas Gleixner: "RE: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG"
Next in thread: Jesse Brandeburg: "Re: [PATCH] e1000e: prevent concurrent access to NVRAM"

On Thu, 2 Oct 2008, Olaf Kirch wrote:

> On Thursday 02 October 2008 16:28:42 Jiri Kosina wrote:
> >
> > 15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
> > 15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
> > 15:50:52 linux-pr0e kernel: Call Trace:
> > 15:50:52 linux-pr0e kernel: [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
> > 15:50:52 linux-pr0e kernel: [<ffffffff80493716>] dump_stack+0x69/0x6f
> > 15:50:52 linux-pr0e kernel: [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
> > 15:50:52 linux-pr0e kernel: [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
> > 15:50:52 linux-pr0e kernel: [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
> > 15:50:52 linux-pr0e kernel: [<ffffffff80251fe5>] kthread+0x47/0x73
> > 15:50:52 linux-pr0e kernel: [<ffffffff8020d7a9>] child_rip+0xa/0x11
>
> Looks like the e1000 watchdog racing with some dhclient activity (upping the interface).
>
> I just noticed that the driver actually uses register pages. So it looks like it's
> possible to have something like this without the mutex:
>
> process A selects page A
> process B selects page B
> process A writes to register at offset A'
>
> So we may end up writing to the wrong register. I think I heard Vojtech mention
> that the e1000e also has a register based interface to erase/rewrite the NVM
> programmatically. Do we know at which offsets these registers live?
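
The page-select interleaving Olaf describes can be reproduced in a small
userspace sketch. Everything here is hypothetical and not taken from the
driver: `struct paged_regs`, `select_page()`, `write_offset()` and
`write_reg_locked()` are illustrative names, and a pthread mutex stands in
for whatever lock serializes the real accesses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NUM_PAGES     2
#define REGS_PER_PAGE 4

/* Hypothetical paged register file: one page latch shared by all callers. */
struct paged_regs {
	pthread_mutex_t lock;
	unsigned int page;                        /* currently selected page */
	uint16_t regs[NUM_PAGES][REGS_PER_PAGE];
};

static void paged_regs_init(struct paged_regs *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->page = 0;
	memset(p->regs, 0, sizeof(p->regs));
}

/* Step 1 of an access: latch the page. */
static void select_page(struct paged_regs *p, unsigned int page)
{
	p->page = page;
}

/* Step 2: the write lands in whichever page is latched right now. */
static void write_offset(struct paged_regs *p, unsigned int off, uint16_t val)
{
	p->regs[p->page][off] = val;
}

/* Page select and register write performed as one unit under the lock. */
static void write_reg_locked(struct paged_regs *p, unsigned int page,
			     unsigned int off, uint16_t val)
{
	pthread_mutex_lock(&p->lock);
	select_page(p, page);
	write_offset(p, off, val);
	pthread_mutex_unlock(&p->lock);
}
```

Calling `select_page(&p, 0)`, then `select_page(&p, 1)`, then
`write_offset(&p, 2, val)` replays the sequence above: the value ends up in
page 1 although the first caller meant page 0. `write_reg_locked()` closes
that window because no other caller can change the latch between the two
steps.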

Linus,

can you please apply the patch below, which prevents concurrent
access to the NVRAM? It triggered the trace that Olaf reported above.

I worked out that patch on Sept 24th and it exposed a couple of
problems in the e1000e code right away. These have been addressed by
Jesse and are part of the patch series he posted over the last few days.

Still, all we have in mainline is some "hopefully bug preventing"
patch which does not address the root cause at all.

The confirmed bugs where the NVRAM acquire code was called
concurrently are still in your tree, and the prevention patch, along
with the resulting bugfixes, is stuck in some obscure Intel QA
process.

Please apply at least the bug prevention patch below.

Thanks,

tglx
---
Subject: e1000e prevent concurrent access and debug contention on NVM SWFLAG
Date: Wed, 24 Sep 2008 00:45:08 -0700
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: jesse.brandeburg@xxxxxxxxx
---

drivers/net/e1000e/ich8lan.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

Index: linux-2.6/drivers/net/e1000e/ich8lan.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000e/ich8lan.c
+++ linux-2.6/drivers/net/e1000e/ich8lan.c
@@ -382,6 +382,9 @@ static s32 e1000_get_variants_ich8lan(st
 	return 0;
 }

+static DEFINE_MUTEX(nvm_mutex);
+static pid_t nvm_owner = -1;
+
 /**
  *  e1000_acquire_swflag_ich8lan - Acquire software control flag
  *  @hw: pointer to the HW structure
@@ -395,6 +398,15 @@ static s32 e1000_acquire_swflag_ich8lan(
 	u32 extcnf_ctrl;
 	u32 timeout = PHY_CFG_TIMEOUT;

+	WARN_ON(preempt_count());
+
+	if (!mutex_trylock(&nvm_mutex)) {
+		WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
+		     nvm_owner);
+		mutex_lock(&nvm_mutex);
+	}
+	nvm_owner = current->pid;
+
 	while (timeout) {
 		extcnf_ctrl = er32(EXTCNF_CTRL);
 		extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
@@ -409,6 +421,8 @@ static s32 e1000_acquire_swflag_ich8lan(

 	if (!timeout) {
 		hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
+		nvm_owner = -1;
+		mutex_unlock(&nvm_mutex);
 		return -E1000_ERR_CONFIG;
 	}

@@ -430,6 +444,9 @@ static void e1000_release_swflag_ich8lan
 	extcnf_ctrl = er32(EXTCNF_CTRL);
 	extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
 	ew32(EXTCNF_CTRL, extcnf_ctrl);
+
+	nvm_owner = -1;
+	mutex_unlock(&nvm_mutex);
 }

 /**
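
For readers who want to try the locking pattern outside the kernel, here is
a rough userspace analogue of what the patch does: try the lock first,
report the current owner if that fails, then block as usual. It is only a
sketch under assumptions. pthread mutexes replace the kernel mutex, a
caller-supplied `self` id replaces `current->pid`, `fprintf()` replaces
WARN(), and the `contended` counter and the `nvm_acquire()` and
`nvm_release()` names are hypothetical additions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t nvm_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_long nvm_owner = -1;   /* id of the current holder, -1 if free */
static unsigned int contended;       /* protected by nvm_mutex */

/* Take the lock; if the fast path fails, say who holds it, then wait. */
static void nvm_acquire(long self)
{
	int was_contended = 0;

	if (pthread_mutex_trylock(&nvm_mutex) != 0) {
		/* Diagnostic only: the owner may change before we print it. */
		fprintf(stderr, "nvm mutex contention. Owned by %ld\n",
			atomic_load(&nvm_owner));
		was_contended = 1;
		pthread_mutex_lock(&nvm_mutex);
	}
	atomic_store(&nvm_owner, self);
	contended += was_contended;
}

/* Clear the owner before dropping the lock, as the patch does. */
static void nvm_release(void)
{
	atomic_store(&nvm_owner, -1);
	pthread_mutex_unlock(&nvm_mutex);
}
```

The trylock adds nothing to correctness; a plain lock call would serialize
the callers just as well. Its only job is to make contention visible, which
is what produced the trace quoted at the top of this mail.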
