Re: Oops: 17 SMP ARM (v3.16-rc2)

From: Russell King - ARM Linux
Date: Wed Aug 06 2014 - 08:56:10 EST


On Wed, Aug 06, 2014 at 11:10:06AM +0000, Mattis Lorentzon wrote:
> Russell,
>
> > What is on the other end of the link?
>
> 16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20
> machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05).
>
> There may be multiple problems. The backtrace has only been seen a few
> times, on two different cards. Most of the time, the network for a random
> card just stalls without any visible backtrace or error messages. The other
> cards seem to be unaffected when this happens.

Can you ascertain whether these stalls are a result of some failure of the
receive side or the transmit side - you should be able to tell that if you
watch the packet counts via ifconfig on the stalled card. Also, it would
be useful to know whether the FEC interrupt was firing.

I hope you have some kind of serial console on these cards?

> > What I would like to do is to stamp each packet in some way with an
> > identifier marking its ring position, and then monitor the network to find out
> > whether the packet at slot 85 was actually transmitted - that's made slightly
> > harder because packets may be dropped at the receiver when operating in
> > promisc mode. This would then allow us to work out some likely causes.
>
> We would be glad to run this test on our setup, do you have more detailed
> information on how to set it up?

One of the problems is to find some way to stamp each packet with a 10-bit
number without having any side effects. I guess one possibility would be
to overwrite the source MAC address on transmit, which hopefully should not
cause any side effects.

> After a network stall, we usually have to powercycle the ARM hardware to
> get it back to a usable state. These stalls last at least several minutes,
> perhaps indefinitely. It does not seem to recover properly, and is no longer
> reachable via the network.

Hmm. Okay, I think the first thing we need to do is to work out why
the silent stalls are happening.

--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/