Re: [KORG] Re: kernel.org lies about latest -mm kernel

From: Willy Tarreau
Date: Tue Dec 19 2006 - 01:47:57 EST

Next message: Nick Piggin: "Re: 2.6.19 file content corruption on ext3"
Previous message: Benjamin LaHaise: "Re: [PATCH] procfs: export context switch counts in /proc/*/stat"
In reply to: J.H.: "Re: [KORG] Re: kernel.org lies about latest -mm kernel"
Next in thread: J.H.: "Re: [KORG] Re: kernel.org lies about latest -mm kernel"

On Sun, Dec 17, 2006 at 04:42:56PM -0800, J.H. wrote:
> On Mon, 2006-12-18 at 00:37 +0200, Matti Aarnio wrote:
> > On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote:
> > > J.H. wrote:
> > ...
> > > >The root cause boils down to with git, gitweb and the normal mirroring
> > > >on the frontend machines our basic working set no longer stays resident
> > > >in memory, which is forcing more and more to actively go to disk causing
> > > >a much higher I/O load. You have the added problem that one of the
> > > >frontend machines is getting hit harder than the other due to several
> > > >factors: various DNS servers not round robining, people explicitly
> > > >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and
> > > >probably several other factors we aren't aware of. This has caused the
> > > >average load on that machine to hover around 150-200 and if for whatever
> > > >reason we have to take one of the machines down the load on the
> > > >remaining machine will skyrocket to 2000+.
> >
> > Relying on DNS and clients doing round-robin load-balancing is doomed.
> >
> > You really, REALLY, need external L4 load-balancer switches.
> > (And installation help from somebody who really knows how to do this
> > kind of services on a cluster.)
>
> While this is a really good idea when you have systems that are all in a
> single location, with a single uplink and what not - this isn't the case
> with kernel.org. Our machines are currently in three separate
> facilities in the US (spanning two different states), with us working on
> a fourth in Europe.

On multi-site setups you do have to rely on DNS, but the DNS should
announce the local load balancers rather than the servers themselves,
and each load balancer should know about the other sites.

While people often find it dirty, there's no problem forwarding a
request from one site to another via the internet as long as there
are big pipes. Generally, I play with weights to smooth the load
slightly and reduce the bandwidth usage on the pipe (e.g. 2/3 local,
1/3 remote).
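
To make the weighting idea concrete, here is a minimal sketch of a smooth
weighted round-robin picker. It is not the scheduler LVS actually uses, only
an illustration of how a 2/3 local, 1/3 remote split can be produced; the
function name and the "local"/"remote" labels are made up for this example.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, n):
    """Return n picks, each backend chosen in proportion to its weight."""
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    picks = []
    for _ in range(n):
        # Every backend earns credit equal to its weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit serves the request and pays
        # back the total weight, which spreads its turns out evenly.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical weights matching the 2/3 local, 1/3 remote example.
site_weights = {"local": 2, "remote": 1}
picks = smooth_weighted_round_robin(site_weights, 300)
share = Counter(picks)
```

Raising the local weight keeps more traffic off the inter-site pipe, at the
cost of a less even load between the sites.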

With LVS, you can even use the tunneling mode, in which the request
arrives at the LB on site A and is forwarded to site B over the net,
but the data returns directly from site B to the client.
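
The packet flow in tunneling mode can be modeled roughly as follows. This is
a toy model, not LVS code: the Packet class, both functions and all addresses
(taken from the documentation ranges) are assumptions for illustration. It
also assumes the server on site B accepts traffic for the virtual IP, which
is what lets it answer the client directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    inner: Optional["Packet"] = None  # set when this packet wraps another


def balancer_forward(request, balancer_ip, real_server_ip):
    """Wrap the untouched client request in an outer packet to the server."""
    return Packet(src=balancer_ip, dst=real_server_ip, inner=request)


def real_server_reply(tunneled):
    """Unwrap the request and answer the client with the VIP as source."""
    request = tunneled.inner
    if request is None:
        raise ValueError("expected an encapsulated packet")
    return Packet(src=request.dst, dst=request.src)


# Hypothetical addresses, for illustration only.
CLIENT = "198.51.100.7"
VIP = "192.0.2.10"            # address announced in DNS, owned by the LB
LB_SITE_A = "192.0.2.1"
SERVER_SITE_B = "203.0.113.20"

request = Packet(src=CLIENT, dst=VIP)
tunneled = balancer_forward(request, LB_SITE_A, SERVER_SITE_B)
reply = real_server_reply(tunneled)
```

Only the (small) requests cross the pipe between the sites; the (large)
responses never pass through the LB on site A.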

If the frontend machines are not taken off-line too often, it should
be no big deal for them to handle something such as LVS, and it would
help spread the load.

> > > >Since it's apparent not everyone is aware of what we are doing, I'll
> > > >mention briefly some of the bigger points.
> > ...
> > > >- We've cut back on the number of ftp and rsync users to the machines.
> > > >Basically we are cutting back where we can in an attempt to keep the
> > > >load from spiraling out of control, this helped a bit when we recently
> > > >had to take one of the machines down and instead of loads spiking into
> > > >the 2000+ range we peaked at about 500-600 I believe.
> >
> > How about having filesystems mounted with "noatime" ?
> > Or do you already do that ?
>
> We've been doing that for over a year.

Couldn't we temporarily *cut* the services one after the other on www1
to find which ones consume the most I/O, and see which ones can
coexist without bad interaction ?

Also, I see that keepalive is still enabled on Apache. I guess there
are thousands of processes and that Apache is eating gigs of RAM by
itself. I strongly suggest disabling keepalive there.
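
A back-of-the-envelope estimate shows why keepalive matters with a
process-per-connection server. The sketch below applies Little's law
(processes in use = arrival rate x time each connection holds a process).
Every number in it is hypothetical, not a measurement from kernel.org, and it
assumes the worst case where each client sends one request and then idles
until the keepalive timeout expires.

```python
def busy_processes(connections_per_sec, transfer_sec, keepalive_timeout_sec):
    """Little's law: average number of processes tied up by connections."""
    return connections_per_sec * (transfer_sec + keepalive_timeout_sec)


# Hypothetical figures, for illustration only.
CONNECTIONS_PER_SEC = 200.0
TRANSFER_SEC = 0.5
KEEPALIVE_TIMEOUT_SEC = 15.0
MB_PER_PROCESS = 10.0

with_keepalive = busy_processes(
    CONNECTIONS_PER_SEC, TRANSFER_SEC, KEEPALIVE_TIMEOUT_SEC)
without_keepalive = busy_processes(CONNECTIONS_PER_SEC, TRANSFER_SEC, 0.0)

ram_with_mb = with_keepalive * MB_PER_PROCESS
ram_without_mb = without_keepalive * MB_PER_PROCESS

print(f"keepalive on:  {with_keepalive:.0f} processes, {ram_with_mb:.0f} MB")
print(f"keepalive off: {without_keepalive:.0f} processes, {ram_without_mb:.0f} MB")
```

In practice the process count is capped by the server's configured limit, so
the real symptom would be queued connections rather than unbounded memory
growth, but the ratio between the two cases is the point.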

> - John

Just my 2 cents,
Willy

