Re: Updated Equalize patch

From: Patrick McHardy (kaber@trash.net)
Date: Fri Mar 22 2002 - 11:35:26 EST


Hi.

Unfortunately I overlooked one read_lock(..); here's the corrected patch.

> I'll try out your updated patch later, but can you tell me if it
> works without warnings or errors for you? What is the maximum throughput
> you get?

What kind of warnings/errors do you mean?
I haven't tested it extensively yet, but the first results look very
promising: equalizing traffic over two links (256kbit + 128kbit) resulted
in a total bandwidth of 363kbit with one connection, although it didn't
work right until I started a ping in the background. Also, my second
gateway was marked "dead", but I guess that is unrelated. I'll do some
more testing this weekend and let you know.

Bye, Patrick
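
As a rough illustration of the fib_semantics.c change in the patch below
(replacing jiffies with a running counter when choosing a nexthop), here is
a minimal user-space sketch. It is not the kernel's fib_select_multipath();
struct nexthop, select_nexthop() and the 2:1 weights are assumptions made up
for this example, and the mail does not say which weights were used on the
256kbit and 128kbit links. The point is only that a counter advancing once
per lookup spreads consecutive lookups across the nexthops, whereas jiffies
stays constant between timer ticks.

```c
#include <stddef.h>

/* Hypothetical, simplified nexthop record; not the kernel's struct fib_nh. */
struct nexthop {
    const char *dev;  /* device name, for illustration only */
    int weight;       /* relative share of lookups this link should get */
    int dead;         /* non-zero if the link is considered down */
};

/* Running counter, playing the role of mp_counter in the patch. */
static unsigned int mp_counter;

/*
 * Pick a nexthop index by weighted round-robin.
 * Returns -1 if no live nexthop with a positive weight exists.
 */
int select_nexthop(const struct nexthop *nh, size_t n)
{
    int total = 0;
    int w;
    size_t i;

    /* Sum the weights of all usable nexthops. */
    for (i = 0; i < n; i++)
        if (!nh[i].dead && nh[i].weight > 0)
            total += nh[i].weight;
    if (total <= 0)
        return -1;

    /* The counter advances on every call, unlike a tick-based clock. */
    w = (int)(mp_counter++ % (unsigned int)total);

    /* Walk the list until the slot w falls inside a nexthop's share. */
    for (i = 0; i < n; i++) {
        if (nh[i].dead || nh[i].weight <= 0)
            continue;
        if (w < nh[i].weight)
            return (int)i;
        w -= nh[i].weight;
    }
    return -1;
}
```

With weights 2 and 1, successive calls return 0, 0, 1, 0, 0, 1 and so on, so
two thirds of the lookups go to the first link.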

diff -urN linux-2.4.18-clean/Documentation/networking/load-balancing.txt linux-2.4.18/Documentation/networking/load-balancing.txt
--- linux-2.4.18-clean/Documentation/networking/load-balancing.txt Thu Jan 1 01:00:00 1970
+++ linux-2.4.18/Documentation/networking/load-balancing.txt Fri Mar 22 17:21:21 2002
@@ -0,0 +1,125 @@
+Load balancing using multipaths (patch version: 5)
+==================================================
+
+Contact Guus Sliepen <sliepen@phys.uu.nl> if you need help, want to know
+more, or have remarks or further ideas relating to this.
+
+Intro
+-----
+
+If you have multiple physical network links to another computer, and you want
+some kind of load balancing, you can now do so. Please note that this only
+applies to IPv4 traffic, not to IPX, IPv6 or any other protocol (yet).
+
+Needed
+------
+
+* LATEST iproute package from ftp://ftp.inr.ac.ru/ip-routing/
+* CONFIG_IP_ROUTE_MULTIPATH enabled in kernel configuration (it's in
+  Networking options, below the Advanced Router option you'll have to enable
+  too)
+* Of course you must also have patched your kernel and recompiled it for this
+  feature to be enabled.
+
+To do
+-----
+
+* Make sure the devices you want to combine are up and all accept the
+  packets you want to send (i.e. they must all have the same IP address/netmask
+  or something clever to get the same result)
+* Just to make sure, remove any routes via those devices (route del ...)
+* Now add all routes via one iproute command using the 'nexthop' statement:
+
+    ip route add <destaddress>/<netmask> equalize \
+        nexthop dev <first device> \
+        nexthop dev <second device> \
+        nexthop ...
+
+* Just to make sure, flush route cache:
+
+    echo 1 >/proc/sys/net/ipv4/route/flush
+
+Example
+-------
+
+This is an example showing how to make a 20 Mbit connection between two
+computers using 2 10 Mbit ethernet cards per computer. Computer 1 has IP
+192.168.1.1 and computer 2 has IP 192.168.1.2. We start from scratch:
+
+[computer1]~/>ifconfig eth0 192.168.1.1 netmask 255.255.255.0
+[computer1]~/>route del -net 192.168.1.0 netmask 255.255.255.0
+[computer1]~/>ifconfig eth1 192.168.1.1 netmask 255.255.255.0
+[computer1]~/>route del -net 192.168.1.0 netmask 255.255.255.0
+[computer1]~/>ip route add 192.168.1.0/24 equalize nexthop dev eth0 nexthop dev eth1
+[computer1]~/>echo 1 >/proc/sys/net/ipv4/route/flush
+
+[computer2]~/>ifconfig eth0 192.168.1.2 netmask 255.255.255.0
+[computer2]~/>route del -net 192.168.1.0 netmask 255.255.255.0
+[computer2]~/>ifconfig eth1 192.168.1.2 netmask 255.255.255.0
+[computer2]~/>route del -net 192.168.1.0 netmask 255.255.255.0
+[computer2]~/>ip route add 192.168.1.0/24 equalize nexthop dev eth0 nexthop dev eth1
+[computer2]~/>echo 1 >/proc/sys/net/ipv4/route/flush
+
+You can even add more computers, just replace the x in 192.168.1.x with the
+number of your computer, and make sure all eth0's are connected to each other
+and all eth1's. You can also use more devices, just ifconfig them all and
+remove the default routes that are generated, and add extra nexthops.
+
+Notes
+-----
+
+If you want to add a gateway entry in your routing table, and want it to be
+balanced too, you first have to make singlepath entries for every network
+interface you want to use, after that add the gateway with all the nexthops
+filled in, then delete the singlepath routes and then add the normal
+multipath route.
+
+Older patch versions used a /proc entry to control load-balancing. This does
+not work anymore. You should use the 'equalize' flag instead while adding new
+routes. You need a fresh version of iproute for that.
+
+Status
+------
+
+Packet type:        Balanced?  Note
+----------------------------------------------------------------------
+ARP                 no         But we don't want them to ;)
+ICMP                yes
+Connectionless UDP  yes
+Connected UDP       yes
+Broadcast UDP       no         Would be nice if it would,
+                               but this is rarely used for
+                               high bandwidth data transfers.
+TCP                 yes        At least all data packets are,
+                               maybe some control packets are
+                               not.
+
+(Known) Bugs
+------------
+
+Due to the nature of the patch, every packet that follows a multipath uses
+a little memory that is not instantly cleaned up, but after a short period.
+This means that if your load gets higher, memory usage is higher. Since
+there is a limit to the memory that can be allocated for the packets, there
+is also a load limit. I cannot give exact numbers, however this patch does
+work with a load of 20 Mbit/s without problems on a 486 dx2 66, but not
+with a load of 400 Mbit/s on a box with multiple 400 MHz Xeon processors.
+If the load gets too high, no memory is left for network IO, which stops
+for a while if that happens. The kernel should not crash if this happens.
+
+Technically
+-----------
+
+Load balancing needed a slight adjustment to the unpatched linux kernel,
+because of the route cache. Multipath is an option already found in the old
+2.1.x kernels. However, once a packet arrives, and it matches a multipath
+route, a (quasi random) device out of the list of nexthops is taken for its
+destination. That's okay, but after that the kernel puts everything into a
+hash table, and the next time a packet with the same source/dest/tos arrives,
+it finds it is in the hash table, and routes it via the same device as last
+time. The adjustment I made is as follows: If the kernel sees that the route
+to be taken has got the 'equalize' flag set, it not only selects the random
+device, but also tags the packet with the RTCF_EQUALIZE flag. If another
+packet of the same kind arrives, it is looked up in the hash table. It then
+checks if our flag is set, and if so, it deletes the entry in the cache and
+has to recalculate the destination again.
diff -urN linux-2.4.18-clean/include/linux/in_route.h linux-2.4.18/include/linux/in_route.h
--- linux-2.4.18-clean/include/linux/in_route.h Fri Jun 12 07:52:33 1998
+++ linux-2.4.18/include/linux/in_route.h Fri Mar 22 17:21:21 2002
@@ -18,6 +18,7 @@
 #define RTCF_MASQ 0x00400000
 #define RTCF_SNAT 0x00800000
 #define RTCF_DOREDIRECT 0x01000000
+#define RTCF_EQUALIZE 0x02000000
 #define RTCF_DIRECTSRC 0x04000000
 #define RTCF_DNAT 0x08000000
 #define RTCF_BROADCAST 0x10000000
diff -urN linux-2.4.18-clean/net/ipv4/fib_semantics.c linux-2.4.18/net/ipv4/fib_semantics.c
--- linux-2.4.18-clean/net/ipv4/fib_semantics.c Mon Feb 25 20:38:14 2002
+++ linux-2.4.18/net/ipv4/fib_semantics.c Fri Mar 22 17:21:21 2002
@@ -101,6 +101,10 @@
 };
 
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+unsigned int mp_counter=0;
+#endif
+
 /* Release a nexthop info record */
 
 void free_fib_info(struct fib_info *fi)
@@ -955,7 +959,7 @@
            it is pretty bad approximation.
          */
 
-        w = jiffies % fi->fib_power;
+        w = mp_counter++ % fi->fib_power;
 
         change_nexthops(fi) {
                 if (!(nh->nh_flags&RTNH_F_DEAD) && nh->nh_power) {
diff -urN linux-2.4.18-clean/net/ipv4/ip_output.c linux-2.4.18/net/ipv4/ip_output.c
--- linux-2.4.18-clean/net/ipv4/ip_output.c Wed Oct 17 23:16:39 2001
+++ linux-2.4.18/net/ipv4/ip_output.c Fri Mar 22 17:21:21 2002
@@ -354,7 +354,7 @@
 
         /* Make sure we can route this packet. */
         rt = (struct rtable *)__sk_dst_check(sk, 0);
-        if (rt == NULL) {
+        if (rt == NULL || rt->u.dst.obsolete || rt->rt_flags&RTCF_EQUALIZE) {
                 u32 daddr;
 
                 /* Use correct destination address if we have options. */
diff -urN linux-2.4.18-clean/net/ipv4/route.c linux-2.4.18/net/ipv4/route.c
--- linux-2.4.18-clean/net/ipv4/route.c Mon Feb 25 20:38:14 2002
+++ linux-2.4.18/net/ipv4/route.c Fri Mar 22 17:23:03 2002
@@ -1419,8 +1419,11 @@
                 goto martian_destination;
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-        if (res.fi->fib_nhs > 1 && key.oif == 0)
+        if (res.fi->fib_nhs > 1 && key.oif == 0) {
                 fib_select_multipath(&key, &res);
+                if (res.fi->fib_flags&RTM_F_EQUALIZE)
+                        flags |= RTCF_EQUALIZE;
+        }
 #endif
         out_dev = in_dev_get(FIB_RES_DEV(res));
         if (out_dev == NULL) {
@@ -1622,15 +1625,15 @@
 int ip_route_input(struct sk_buff *skb, u32 daddr, u32 saddr,
                    u8 tos, struct net_device *dev)
 {
-        struct rtable * rth;
+        struct rtable * rth, **rthp;
         unsigned hash;
         int iif = dev->ifindex;
 
         tos &= IPTOS_RT_MASK;
         hash = rt_hash_code(daddr, saddr ^ (iif << 5), tos);
 
-        read_lock(&rt_hash_table[hash].lock);
-        for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
+        write_lock(&rt_hash_table[hash].lock);
+        for (rthp=&rt_hash_table[hash].chain; (rth=*rthp); rthp=&rth->u.rt_next) {
                 if (rth->key.dst == daddr &&
                     rth->key.src == saddr &&
                     rth->key.iif == iif &&
@@ -1639,16 +1642,22 @@
                     rth->key.fwmark == skb->nfmark &&
 #endif
                     rth->key.tos == tos) {
+                        if (rth->rt_flags&RTCF_EQUALIZE) {
+                                *rthp = rth->u.rt_next;
+                                rth->u.rt_next = NULL;
+                                rt_free(rth);
+                                break;
+                        }
                         rth->u.dst.lastuse = jiffies;
                         dst_hold(&rth->u.dst);
                         rth->u.dst.__use++;
                         rt_cache_stat[smp_processor_id()].in_hit++;
-                        read_unlock(&rt_hash_table[hash].lock);
+                        write_unlock(&rt_hash_table[hash].lock);
                         skb->dst = (struct dst_entry*)rth;
                         return 0;
                 }
         }
-        read_unlock(&rt_hash_table[hash].lock);
+        write_unlock(&rt_hash_table[hash].lock);
 
         /* Multicast recognition logic is moved from route cache to here.
            The problem was that too many Ethernet cards have broken/missing
@@ -1852,8 +1861,11 @@
         }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-        if (res.fi->fib_nhs > 1 && key.oif == 0)
+        if (res.fi->fib_nhs > 1 && key.oif == 0) {
                 fib_select_multipath(&key, &res);
+                if (res.fi->fib_flags&RTM_F_EQUALIZE)
+                        flags |= RTCF_EQUALIZE;
+        }
         else
 #endif
         if (!res.prefixlen && res.type == RTN_UNICAST && !key.oif)
@@ -1984,12 +1996,12 @@
 int ip_route_output_key(struct rtable **rp, const struct rt_key *key)
 {
         unsigned hash;
-        struct rtable *rth;
+        struct rtable *rth, **rthp;
 
         hash = rt_hash_code(key->dst, key->src ^ (key->oif << 5), key->tos);
 
-        read_lock_bh(&rt_hash_table[hash].lock);
-        for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.rt_next) {
+        write_lock_bh(&rt_hash_table[hash].lock);
+        for (rthp=&rt_hash_table[hash].chain; (rth=*rthp); rthp=&rth->u.rt_next) {
                 if (rth->key.dst == key->dst &&
                     rth->key.src == key->src &&
                     rth->key.iif == 0 &&
@@ -1999,16 +2011,22 @@
 #endif
                     !((rth->key.tos ^ key->tos) &
                             (IPTOS_RT_MASK | RTO_ONLINK))) {
+                        if (rth->rt_flags&RTCF_EQUALIZE) {
+                                *rthp = rth->u.rt_next;
+                                rth->u.rt_next = NULL;
+                                rt_free(rth);
+                                break;
+                        }
                         rth->u.dst.lastuse = jiffies;
                         dst_hold(&rth->u.dst);
                         rth->u.dst.__use++;
                         rt_cache_stat[smp_processor_id()].out_hit++;
-                        read_unlock_bh(&rt_hash_table[hash].lock);
+                        write_unlock_bh(&rt_hash_table[hash].lock);
                         *rp = rth;
                         return 0;
                 }
         }
-        read_unlock_bh(&rt_hash_table[hash].lock);
+        write_unlock_bh(&rt_hash_table[hash].lock);
 
         return ip_route_output_slow(rp, key);
 }
diff -urN linux-2.4.18-clean/net/ipv4/udp.c linux-2.4.18/net/ipv4/udp.c
--- linux-2.4.18-clean/net/ipv4/udp.c Mon Feb 25 20:38:14 2002
+++ linux-2.4.18/net/ipv4/udp.c Fri Mar 22 17:21:21 2002
@@ -740,6 +740,14 @@
         sk->state = TCP_ESTABLISHED;
         sk->protinfo.af_inet.id = jiffies;
 
+        #ifdef CONFIG_IP_ROUTE_MULTIPATH
+        if(rt->rt_flags&RTCF_EQUALIZE) {
+                ip_rt_put(rt);
+                sk->dst_cache=NULL;
+        }
+        else
+        #endif
+
         sk_dst_set(sk, &rt->u.dst);
         return(0);
 }
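
The route.c hunks above walk the hash chain through a pointer to the
previous link (rthp) so that a matching entry carrying RTCF_EQUALIZE can be
unlinked in place, and they take the bucket lock for writing because the
lookup may now modify the chain. Below is a minimal user-space sketch of
that unlink pattern, with locking left out. struct entry, FLAG_EQUALIZE,
push() and lookup_or_evict() are hypothetical names for this example, not
kernel APIs.

```c
#include <stdlib.h>

#define FLAG_EQUALIZE 0x1u  /* hypothetical stand-in for RTCF_EQUALIZE */

/* Hypothetical cache entry on a singly linked hash chain. */
struct entry {
    struct entry *next;
    unsigned int key;
    unsigned int flags;
};

/* Insert a new entry at the head of the chain; returns NULL on allocation failure. */
struct entry *push(struct entry **chain, unsigned int key, unsigned int flags)
{
    struct entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return NULL;
    e->key = key;
    e->flags = flags;
    e->next = *chain;
    *chain = e;
    return e;
}

/*
 * Look up key in the chain. A plain hit is returned as-is. A hit that
 * carries FLAG_EQUALIZE is unlinked and freed, and NULL is returned so
 * the caller falls back to a fresh (slow path) route decision.
 */
struct entry *lookup_or_evict(struct entry **chain, unsigned int key)
{
    struct entry **pp;
    struct entry *e;

    /* pp always points at the link that leads to e, so unlinking is one store. */
    for (pp = chain; (e = *pp) != NULL; pp = &e->next) {
        if (e->key != key)
            continue;
        if (e->flags & FLAG_EQUALIZE) {
            *pp = e->next;  /* bypass e in the chain */
            free(e);
            return NULL;
        }
        return e;
    }
    return NULL;
}
```

Because the lookup can now write to the chain, a reader lock would no longer
be sufficient around it, which matches the read_lock to write_lock changes
in the patch.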
