RE: Oops: 17 SMP ARM (v3.16-rc2)

From: Mattis Lorentzon
Date: Wed Aug 06 2014 - 07:10:16 EST


Russell,

> What is on the other end of the link?

16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20
machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05).

There may be multiple problems. The backtrace has only been seen a few
times, on two different cards. Most of the time, the network for a random
card just stalls without any visible backtrace or error messages. The other
cards seem to be unaffected when this happens.

> What I would like to do is to stamp each packet in some way with an
> identifier marking its ring position, and then monitor the network to find out
> whether the packet at slot 85 was actually transmitted - that's made slightly
> harder because packets may be dropped at the receiver when operating in
> promisc mode. This would then allow us to work out some likely causes.

We would be glad to run this test on our setup. Do you have more detailed
information on how to set it up?
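
On the monitoring side, we imagine something like the Python sketch below
could flag gaps once the packets carry a stamp. It rests on assumptions of
ours, not on anything in the driver: that each transmitted packet carries its
ring slot as an integer, that the capture preserves transmit order, and that
the ring size is known. The names find_missing_slots and ring_size are
hypothetical. Drops at the promiscuous receiver would also show up as gaps,
so a reported slot is only a candidate for a packet that was never sent.

```python
def find_missing_slots(observed, ring_size):
    """Return ring slots skipped between consecutive observed stamps.

    observed:  slot numbers read from captured packets, in capture order
               (hypothetical stamp format: one integer per packet).
    ring_size: number of descriptors in the transmit ring (assumed known).
    """
    for slot in observed:
        if not 0 <= slot < ring_size:
            raise ValueError(f"slot {slot} outside ring of size {ring_size}")

    missing = []
    for prev, cur in zip(observed, observed[1:]):
        # Walk forward from the slot after 'prev', wrapping at the ring
        # boundary, and record every slot passed before reaching 'cur'.
        expected = (prev + 1) % ring_size
        while expected != cur:
            missing.append(expected)
            expected = (expected + 1) % ring_size
    return missing


if __name__ == "__main__":
    # Made-up capture in which slot 85 never appears on the wire.
    print(find_missing_slots([83, 84, 86, 87], ring_size=128))
```

Note that a repeated stamp would be reported as a full lap of missing slots,
so the output would need a sanity check against the capture itself.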

> Note that after the transmit watchdog, the interface should recover and start
> operating normally again - and that should not take "several minutes."

After a network stall, we usually have to power-cycle the ARM hardware to
get it back to a usable state. These stalls last at least several minutes,
perhaps indefinitely. The affected card does not seem to recover properly
and is no longer reachable via the network.

Best regards,
Mattis Lorentzon
