On 12/30/2009 2:58 AM, Stephen Hemminger wrote:
> On Wed, 30 Dec 2009 02:23:20 -0500
> Michael Breuer <mbreuer@xxxxxxxxxx> wrote:
>
>> Ok - I called dump_txring from sky2_err_intr:
>>
>> --- a/drivers/net/sky2.c
>> +++ b/drivers/net/sky2.c
>> @@ -2725,8 +2791,10 @@ static void sky2_watchdog(unsigned long arg)
>>  /* Hardware/software error handling */
>>  static void sky2_err_intr(struct sky2_hw *hw, u32 status)
>>  {
>> -	if (net_ratelimit())
>> +	if (net_ratelimit()) {
>>  		dev_warn(&hw->pdev->dev, "error interrupt status=%#x\n", status);
>> +		dump_txring(hw, 0);
>> +	}
>>  
>>  	if (status & Y2_IS_HW_ERR)
>>  		sky2_hw_intr(hw);
>>
>> And got this:
>>
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=28...30 report=29 done=29
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=28...30 report=29 done=29
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x8
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x8
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=30...32 report=30 done=31
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=30...32 report=30 done=31
>
> I notice that you have NOUVEAU Nvidia drivers loaded? The one difference in HW
> between your board and mine is that I have an ATI video card.

Seems the problem is linked to auditd and X11 (but not nouveau).
Today I ran a bunch of scenarios. I first determined that the problem only manifests in runlevel 5. It occurred with or without KMS and with or without nouveau, whether or not I was logged in (locally or remotely), and regardless of display manager (xdm, gdm, kdm). I then checked what else differed between runlevel 3 and 5: the only thing was auditd. I disabled auditd and reran the test: no errors.
Now for the odd stuff:
The errors only manifest if the high throughput data transfer is initiated when the system is in runlevel 5 and auditd was started by init when transitioning from runlevel 3 to 5. For example, the following scenarios do not cause the errors to manifest:
runlevel 3; start auditd; runlevel 5; start transfer
runlevel 3; chkconfig auditd off; runlevel 5; start auditd; start transfer
runlevel 3; start transfer (note: errors do not occur if I transition to runlevel 5 after the high bandwidth transfer has started)
runlevel 3; startx; start transfer
The only way I can get the problem to manifest is to transition to runlevel 5 with chkconfig auditd on (level 5 only) and then initiate the Windows backup.
I'm guessing that there is some sort of race condition happening between X (xdm/gdm/kdm/greeter?) and auditd that is somehow corrupting something. I'd hazard a more or less obvious guess that whatever's being corrupted differs when there is already a high throughput transfer under way.
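As an aside, the two status words in the quoted log (0x40000008 and 0x8) are bitmasks, and it helps to see which bit positions are set in each. Below is a minimal user-space sketch for that, not driver code: the helper name status_bits_to_string is hypothetical, and it only reports bit positions. It does not attempt to name the bits, since the Y2_IS_* definitions from sky2.h are not reproduced in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of sky2.c): write the positions of the
 * set bits in `status` into `buf` as a space-separated list, lowest bit
 * first. Returns the number of set bits. Output is truncated if `buf`
 * is too small, but the returned count is still complete.
 */
static int status_bits_to_string(uint32_t status, char *buf, size_t len)
{
	int count = 0;
	size_t used = 0;

	if (len > 0)
		buf[0] = '\0';

	for (int bit = 0; bit < 32; bit++) {
		if (!(status & (UINT32_C(1) << bit)))
			continue;
		count++;
		if (used < len) {
			/* Separate entries with a space after the first one. */
			int n = snprintf(buf + used, len - used, "%s%d",
					 used ? " " : "", bit);
			if (n > 0)
				used += (size_t)n;
		}
	}
	return count;
}
```

For the values in the log, 0x40000008 has bits 3 and 30 set and 0x8 has only bit 3, so the second error interrupt repeats the low bit without bit 30. Mapping those positions to names needs the Y2_IS_* enum in sky2.h.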