Re: [OFFTOPIC] Re: Preparations for ZD's upcoming Apache/Linux benchmark - Why not Squid?

Dean Gaudet (dgaudet-list-linux-kernel@arctic.org)
Wed, 9 Jun 1999 13:18:57 -0700 (PDT)


On Wed, 9 Jun 1999, Ricardo Galli Granada wrote:

> Yes, the proposal should go to web server developers, not to linux-kernel.

It's funny, almost a year ago, to the day, I wrote up something I called
an "HTTP flow cache". It's similar in concept to cisco flow caching, or
to the route cache in 2.2.x, or to the dcache.

Essentially, you start with an empty cache. Cache misses fail to a
complete HTTP implementation (i.e. apache). The HTTP implementation
populates the cache with "pattern -> (header, body)" mappings. The
patterns are chosen so that a match won't violate HTTP/1.1 semantics.

A pattern is something like:

the request-url matches /this/url

and the "Host:" header is in the set {
"foobar.com"
}

and the "Accept:" header is in the set {
"image/jpeg",
"image/jpeg, image/gif",
"*/*"
}

then serve /this/file with these <fill-in-the-blank> headers

For example, a full content-negotiation algorithm is a pain in the
ass... but after you've seen the Accept* headers from a browser, and
served a response, you know how to negotiate requests from other browsers
with the same Accept* headers. So in the example above you would
continue to add elements to the set as you discovered more browsers.

The next trick is the simplification that eliminates the need to have a
copy of all those "accept sets" and such for every URL. This involves a
simple observation about how webservers are configured -- for apache
in particular... there's a hierarchy formed by the <Directory> et
al containers. And any two resources which shares the same set of
<Directory> matches can share the same patterns in the flow cache.
(Not 100% true, but you're probably not all http geeks, so it's pointless
to go into full detail :)

At any rate, these patterns can be used to handle pretty much all
standard HTTP usage. They essentially say "if a request matches one
of the patterns, then we can serve it quickly without going through the
full processing; otherwise we bail and do full processing".

This is a fine and dandy theoretical concept... but is it useful in
practice?

I don't know.

But what I do know is that you shouldn't be putting policy into the
kernel... such as mappings between extensions and content-types. So my
suggestion is that the kernel behave like the flow cache, in that it
bails to userland on any request it has never seen before.

Also, I didn't look to see if kHTTPd supports this already ... but the
Host: header is essentially part of the URL, and ignoring it is
problematic. HTTP sucks, repeat it again, HTTP sucks.

I stopped work in this direction because I came to the realisation that
nobody needs this. It's a benchmark gimick at best. As I said already,
if you're building a big website, you either scale it horizontally or
you'll be fired the first time your massive zillion cpu server fails and
your service goes down.

I should stop talking and just finish rearchitecting apache so you all can
complain about something else ;)

Dean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/