Re: networking todo, was Re: Linux-2.4.0-test9-pre2

From: Andi Kleen (ak@suse.de)
Date: Wed Sep 20 2000 - 06:14:38 EST


On Tue, Sep 19, 2000 at 08:56:57PM -0700, David S. Miller wrote:
> Just use __cacheline_aligned instead, like I did with the
> ip_local_data in the patch you just rejected. There is still the
> problem that generic SMP x86 kernels use a 32byte cacheline. Not a
> problem currently because the only x86 SMP is pII/pIII which has
> 32byte, but with Willamette/Athlon SMP it'll be a problem because
> these have 64byte and 128byte cache lines. With __cacheline_aligned
> it'll be easy though to pick up such changes.
>
> Ok, feel free to send me a patch which does only this. I may doctor
> it up a bit, be warned :-)
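
For reference, the mechanism behind __cacheline_aligned is roughly the
sketch below. In the kernel the real definitions live in <linux/cache.h>
and the per-arch headers; the 32 here is only the current generic x86
value, which is the whole point of the paragraph above:

/* Illustrative only -- not the real <linux/cache.h>. The alignment
 * constant comes from the architecture headers, so rebuilding for a
 * 64- or 128-byte cache line fixes every aligned structure at once. */
#define SMP_CACHE_BYTES 32	/* per-arch; 64/128 on Willamette/Athlon */
#define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))

struct example_mib {
	unsigned long in_pkts;
	unsigned long out_pkts;
} ____cacheline_aligned;	/* padded out to a full cache line */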

Here is the complete patch I'm using, pick what you want from it.

This time I documented my intentions properly; see the source code
comments. I removed all the power-of-two padding because the shift/add
sequences always seem to be short enough.
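
To make the shift/add point concrete (the 96 below is hypothetical): if a
mib structure pads out to 96 bytes rather than a 128-byte power of two,
gcc strength-reduces the index multiply into a couple of ALU ops:

/* Illustration: with 4-byte longs, sizeof(struct mib_like) == 96,
 * and 96 = 64 + 32, so gcc compiles &base[i] as
 * base + (i << 6) + (i << 5) -- two shifts and an add, no mul. */
struct mib_like {
	unsigned long ctr[24];
};

unsigned long *mib_slot(struct mib_like *base, int i)
{
	return base[i].ctr;
}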

I also documented a bigger potential win on machines with immediate
memory access and a sane interrupt model (and an even bigger one --
Dimitris Michailidis's PDA patches); both need more extensive
modifications.
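
The PDA patches themselves are not included here; the rough shape of the
idea (all names below are made up, this is not Dimitris's actual code) is
one per-CPU block instead of one aligned array per subsystem:

/* Sketch only. With all hot per-CPU state in a single cacheline-
 * aligned block, one base pointer (ideally a reserved register)
 * reaches everything, instead of paying one aligned array per MIB. */
struct cpu_pda {
	struct ip_mib	ip_stats[2];	/* softirq + user halves */
	struct tcp_mib	tcp_stats[2];
	/* ... other hot per-CPU data ... */
} ____cacheline_aligned;

extern struct cpu_pda pda_array[];	/* NR_CPUS entries */
#define my_pda()	(&pda_array[smp_processor_id()])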

> On the other hand, the inetpeer code is only really exercised on
> machines that talk to lots and lots of destinations (=real
> servers), and 2.4 testing on such machines still has to begin.
>
> Given that 2.4 testing has not really begun yet, I would guess that
> it is safer to remove it now @)
>
> True, but the following is what I'm really thinking about. Suppose a
> few weeks from now Alexey greets both of our mailboxes with a fabulous
> solution to the tw recycle masquerading problem, wouldn't it be quite
> a pain in the ass to put back in and retest all the inetpeer code?
>
> Let's sit on this until Alexey gives a forecast about the
> possibilities of a solution in the near future, ok?

Ok. I just don't want the code uselessly wasting cycles and cache lines in
2.4.0 if possible.

-Andi

Index: include/net/snmp.h
===================================================================
RCS file: /cvs/linux/include/net/snmp.h,v
retrieving revision 1.16
diff -u -u -r1.16 snmp.h
--- include/net/snmp.h 2000/08/09 11:59:03 1.16
+++ include/net/snmp.h 2000/09/20 11:19:25
@@ -14,17 +14,34 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  *
+ * $Id$
+ *
  */
  
 #ifndef _SNMP_H
 #define _SNMP_H
+
+#include <linux/cache.h>
  
 /*
  * We use all unsigned longs. Linux will soon be so reliable that even these
  * will rapidly get too small 8-). Seriously consider the IpInReceives count
  * on the 20Gb/s + networks people expect in a few years time!
  */
-
+
+/*
+ * The rule for padding:
+ * Best is a power of two, because then the right structure can be found by a
+ * simple shift. The structure should always be cache line aligned.
+ * gcc needs n=alignto(cachelinesize, popcnt(sizeof(bla_mib))) shift/add instructions
+ * to emulate the multiply when it is not a power of two. Currently n is always <= 3
+ * for all sizes, so simple cache line alignment is enough.
+ *
+ * The best solution would be a global CPU-local area; especially on 64- and
+ * 128-byte cacheline machines it makes a *lot* of sense -AK
+ */
+
+
 struct ip_mib
 {
          unsigned long IpInReceives;
@@ -44,8 +61,8 @@
          unsigned long IpFragOKs;
          unsigned long IpFragFails;
          unsigned long IpFragCreates;
- unsigned long __pad[32-19];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
  
 struct ipv6_mib
 {
@@ -71,8 +88,8 @@
          unsigned long Ip6FragCreates;
          unsigned long Ip6InMcastPkts;
          unsigned long Ip6OutMcastPkts;
- unsigned long __pad[32-22];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
  
 struct icmp_mib
 {
@@ -102,8 +119,8 @@
          unsigned long IcmpOutTimestampReps;
          unsigned long IcmpOutAddrMasks;
          unsigned long IcmpOutAddrMaskReps;
- unsigned long __pad[32-26];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
 
 struct icmpv6_mib
 {
@@ -140,8 +157,8 @@
         unsigned long Icmp6OutRedirects;
         unsigned long Icmp6OutGroupMembResponses;
         unsigned long Icmp6OutGroupMembReductions;
- unsigned long __pad[32-28];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
  
 struct tcp_mib
 {
@@ -159,8 +176,8 @@
          unsigned long TcpRetransSegs;
          unsigned long TcpInErrs;
          unsigned long TcpOutRsts;
- unsigned long __pad[16-14];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
  
 struct udp_mib
 {
@@ -168,8 +185,8 @@
          unsigned long UdpNoPorts;
          unsigned long UdpInErrors;
          unsigned long UdpOutDatagrams;
- unsigned long __pad[0];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
 
 struct linux_mib
 {
@@ -237,9 +254,15 @@
         unsigned long TCPAbortOnLinger;
         unsigned long TCPAbortFailed;
         unsigned long TCPMemoryPressures;
- unsigned long __pad[64-64];
-};
+ unsigned long __pad[0];
+} ____cacheline_aligned;
+
 
+/*
+ * FIXME: On x86 and some other CPUs the split into user and softirq parts is not needed because
+ * addl $1,memory is atomic against interrupts (but atomic_inc would be overkill because of the lock
+ * cycles). Wants new nonlocked_atomic_inc() primitives -AK
+ */
 #define SNMP_INC_STATS(mib, field) ((mib)[2*smp_processor_id()+!in_softirq()].field++)
 #define SNMP_INC_STATS_BH(mib, field) ((mib)[2*smp_processor_id()].field++)
 #define SNMP_INC_STATS_USER(mib, field) ((mib)[2*smp_processor_id()+1].field++)
-
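
To spell out the convention the macros rely on (an illustration, not part
of the patch): each CPU owns two adjacent slots in the mib array, the
even one bumped from softirq (BH) context and the odd one from process
context, so no locked instruction is ever needed. A reader such as /proc
just folds all the slots together, roughly:

/* Illustrative reader for the layout above, not code from the patch:
 * slot 2*cpu is the BH half, slot 2*cpu+1 the user half. */
unsigned long fold_ip_in_receives(struct ip_mib *mib, int ncpus)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < 2 * ncpus; i++)
		sum += mib[i].IpInReceives;
	return sum;
}

The FIXME in the last hunk is about dropping even that split on x86: a
single non-locked incl on a memory operand cannot be torn by an interrupt
on the same CPU, so a nonlocked_atomic_inc() built on it would let the
two halves collapse into one.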