RE: Oops: 17 SMP ARM (v3.16-rc2)
From: Mattis Lorentzon
Date: Thu Aug 07 2014 - 07:11:21 EST
Russell,
> Can you ascertain whether these stalls are a result of some failure of the
> receive side or the transmit side - you should be able to tell that if you watch
> the packet counts via ifconfig on the stalled card. Also, it would be useful to
> know whether the FEC interrupt was firing.
# grep eth /proc/interrupts
151:          0          0          0          0       GIC 151  2188000.ethernet
166:    1205661          0          0          0  gpio-mxc   6  2188000.ethernet
The counter for interrupt 166 keeps increasing regularly during the stalls.
ifconfig shows that the RX and TX packet counters do not increase.
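
To watch both observations side by side during a stall, the interrupt count and the packet counters can be sampled together from the serial console. This is only a minimal sketch: the interface name eth0 is an assumption (it is not stated in this thread), the helper names irq_total and sample are hypothetical, and NET_SYSFS and IRQ_FILE are hypothetical override variables that exist only so the helpers can be pointed at test data.

```shell
#!/bin/sh
# Sum the per-CPU counts of one IRQ line in /proc/interrupts-style input.
# $1 = IRQ number (e.g. 166), $2 = file to read (defaults to /proc/interrupts).
# The count columns are taken to be the numeric fields that directly follow
# the "NNN:" label; summing stops at the first non-numeric field.
irq_total() {
    awk -v irq="$1:" '$1 == irq {
        n = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) n += $i
        print n
    }' "${2:-/proc/interrupts}"
}

# Print one sample line: epoch time, IRQ total, RX packets, TX packets.
# $1 = interface name (assumed, e.g. eth0), $2 = IRQ number.
sample() {
    stats="${NET_SYSFS:-/sys/class/net}/$1/statistics"
    printf '%s irq=%s rx=%s tx=%s\n' \
        "$(date +%s)" \
        "$(irq_total "$2" "$IRQ_FILE")" \
        "$(cat "$stats/rx_packets")" \
        "$(cat "$stats/tx_packets")"
}

# Example use on the card (not run here):
#   while :; do sample eth0 166; sleep 1; done
```

If irq= keeps climbing while rx= and tx= stay flat, that matches the behaviour described above.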
> I hope you have some kind of serial console on these cards?
Yes, indeed. Local stimuli seem to be able to unstall the network in a
somewhat random fashion. Running e.g. ifconfig or ping locally may make the
network responsive again, either immediately or after up to about half a
minute. However, it usually degenerates again to a complete stall within seconds.
Without local stimuli the network does not appear to recover at all. The card
does not even respond to pings (again, most often without any apparent
error messages).
Running both of the following commands in parallel from the FC server seems
to trigger the problem within minutes (please note that the arm card stops
responding to both ping and ssh):
# while :; do ssh arm-card echo Ok; done
# ping arm-card
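
To record how many iterations the ssh loop survives before the first failure, the loop can be wrapped in a small counter. This is a sketch only: count_until_fail is a hypothetical helper name, and it treats any non-zero exit status of the command as a failure.

```shell
#!/bin/sh
# Run a command repeatedly, at most $1 times.
# Prints the 1-based iteration at which the command first fails and returns 1,
# or reports that no failure occurred and returns 0.
count_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Discard the command's own output; only its exit status matters.
        if ! "$@" >/dev/null 2>&1; then
            echo "failed at iteration $i"
            return 1
        fi
    done
    echo "no failure in $i iterations"
    return 0
}

# Example use on the server (not run here):
#   count_until_fail 100000 ssh -o ConnectTimeout=10 arm-card echo Ok
```

ConnectTimeout only bounds connection setup, so a session that stalls after connecting could still hang; wrapping the ssh call in coreutils timeout (for example `timeout 30 ssh ...`) is one way to bound that as well, assuming timeout is installed on the server.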
We have noticed the same problem on both the i.MX6 and the Zynq cards
(using KSZ9021 and Cadence GEM drivers). However, the number of
iterations required to trigger the problem varies. Sometimes it stalls after
fewer than 100, but in other cases the stalls begin only after nearly 10000 iterations.
Once a card has stalled (and been unstalled by local stimuli), the network on
that particular card degenerates a lot more often. Apart from the kernel, IP
addresses and MAC addresses, the software configurations are identical between
the Zynq and the i.MX6. Perhaps the fault is unrelated to the Freescale driver?
> Hmm. Okay, I think the first thing we need to do is to work out why the
> silent stalls are happening.
Would you have any ideas on what to check next?
Best regards,
Mattis Lorentzon