1 second periodic incoming packet loss on alpha smp DP264/3C905B - 2.2.14

From: Theo E. Schlossnagle (theos@cnds.jhu.edu)
Date: Sat Jul 15 2000 - 16:43:08 EST


I am running 2.2.14 on alphaev6 SMP (DP264). I have a 3C905B Cyclone
card.

I have two small testing programs that come with the Spread source code
(www.spread.org) that simply send/receive UDP packets as fast as
possible and analyze losses.
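
For anyone without the Spread sources at hand, here is a minimal sketch of the kind of bookkeeping such a flood test needs. It is not the Spread test code: the 4-byte sequence-number header and the names make_packet, read_seq and loss_percent are assumptions made for illustration only.

```python
import struct

# Assumed layout: a 4-byte big-endian sequence number, then zero padding.
HEADER = struct.Struct("!I")

def make_packet(seq, size=1024):
    """Build a `size`-byte datagram whose first 4 bytes carry `seq`."""
    return HEADER.pack(seq) + bytes(size - HEADER.size)

def read_seq(packet):
    """Recover the sequence number from a datagram built by make_packet."""
    return HEADER.unpack_from(packet)[0]

def loss_percent(received, sent):
    """Percentage of the `sent` datagrams that never showed up in `received`."""
    seen = {read_seq(p) for p in received}
    return 100.0 * (sent - len(seen)) / sent
```

The sender would pass each make_packet(i) to sendto() in a tight loop, and the receiver would collect whatever recvfrom() returns and hand it to loss_percent.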

On a typical network you will see a few losses here and there, but I am
seeing extremely strange behaviour on my (two) smp alphas. I brought in an
smp x86 machine running 2.2.14 with a 3C905 Boomerang (I didn't have
another Cyclone) to try to get more experimental data.

Here is the situation: (Alpha 1 = A1, Alpha 2 = A2 and Intel 1 = B)

If I flood from A1 -> B or from A2 -> B (when Alphas are sending), I
reliably get ~0% packet loss and when loss occurs it is usually a single
packet at a time. GREAT!

If I flood from A1 -> A2, A2 -> A1, B -> A1, or B -> A2 (basically any
case where an Alpha is receiving packets), I get very BAD packet loss.

In the bad case (the only one I need help with):

I push 100,000 1024-byte UDP packets from any given machine to an Alpha
(either one) as fast as I can send them. The Alpha will receive the
packets for about 1 second and then drop about a hundred in a row.
Then it receives fine for about 1 second and drops about 100 in a row
again. This will happen for as long as I keep up the stream of packets.
If I send them sufficiently slowly (10-packet bursts with a 1 ms delay
between bursts) I can achieve no packet loss; however, this does not give
sufficient throughput/latency for my application.
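
To make the pattern concrete, here is a rough sketch of how the loss runs can be found and how the slow, bursty sender can be written. It assumes each datagram carries a sequence number that the receiver records; drop_runs and send_paced are hypothetical names, not part of Spread.

```python
import time

def drop_runs(received_seqs, total):
    """Return (first_missing_seq, run_length) for each run of consecutive losses."""
    seen = set(received_seqs)
    runs = []
    start = None
    for seq in range(total):
        if seq not in seen:
            if start is None:
                start = seq          # a new run of losses begins here
        elif start is not None:
            runs.append((start, seq - start))
            start = None
    if start is not None:            # losses ran to the end of the stream
        runs.append((start, total - start))
    return runs

def send_paced(sock, addr, packets, burst=10, delay=0.001):
    """Send datagrams in bursts of `burst`, sleeping `delay` seconds between bursts."""
    for i, pkt in enumerate(packets, start=1):
        sock.sendto(pkt, addr)
        if i % burst == 0:
            time.sleep(delay)
```

With the behaviour described above, drop_runs would report runs of roughly 100 packets spaced about one second of traffic apart, rather than the isolated single-packet losses seen when the Alphas are sending.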

My gut tells me this is Alpha architecture specific and has little or
nothing to do with the 3Com cards (as I have never seen this problem on
other architectures). Perhaps it is a PCI or interrupt timing issue? I
don't have any expertise in that area and these are production machines
(though hobbling at this point), so I can't goof around too much.

I have read the patches relating to Alpha past 2.2.14, but don't see
anything that would affect this behaviour.

Anyone have any ideas about this? I would be happy to supply more
information.

Any help at all would be appreciated.

--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E  2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA  3D 90 B9 9F BE 27 24 E7



