Re: Things that Longhorn seems to be doing right

From: Theodore Ts'o
Date: Thu Oct 30 2003 - 15:32:50 EST


On Thu, Oct 30, 2003 at 10:23:49PM +0300, Hans Reiser wrote:
> >Your assumption here is that the only thing that people search and
> >index on is semi-structured data.
> >
> No, my assumption is that structured data is a special case of
> semi-structured data, and should be modeled that way.

There are much more powerful ways of handling structured data (as
opposed to generalized text searches). What WinFS is specifically
addressing is searching and selecting based on structured data.

> >In addition, even for text-based files, in the future, files will very
> >likely not be straight ASCII, but some kind of rich text based format
> >with formatting, unicode, etc.
> >
> Formatting does not make text table structured.

No, but it means that doing searches on formatted text is very
difficult, and should be done in userspace, not kernel space.

> You are missing my argument. I am saying that the indexes and name
> space belong in the kernel, not that the auto-indexer belongs in the kernel.

Searching and name spaces are different things. Fundamentally I
disagree with your belief that they are the same thing (and yes I've
read your whitepaper on the namesys web page). You can do much, much
more powerful select statements than makes sense to do via the
directory abstraction. (Think about arbitrary select statements,
possibly with subselect statements. That's what Microsoft is
promising in WinFS. Do you really want to support an opendir system
call where its argument is an arbitrary SQL select statement? I
didn't think so.)

There is a very, very big difference between a search and a pathname,
which is guaranteed to refer to a single unique file, such as might be
used in a Makefile. A pathname is what most people consider a real
namespace.
When addressing people, a passport number, or a driver's license
number, or a social security number, are all examples of a namespace.
Each one of these is guaranteed to return either no result, or a
single specific person.

In contrast, consider searching for someone who is male, between 30
and 40, is named Tom, and lived in Libertyville, Illinois sometime
between 1960 and 1970, and is married to someone named Mary who was
born in California. This might return several people, and most people
would **NOT** consider the space of all queries about people to be a
"name space". Searches are not names. They do not uniquely identify
people or objects, which is a fundamental requirement of a name.
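
To make the distinction concrete, here is a rough sketch using Python's
sqlite3 module. The schema, the column names, and the sample rows are
all invented for illustration; nothing here is meant to describe how
WinFS or any real system stores its data. A lookup by SSN returns one
row or nothing, while the query above (written with subselects) returns
a list of candidates.

```python
import sqlite3

# Hypothetical schema and data, made up purely for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        ssn        TEXT PRIMARY KEY,  -- a name: unique per person
        first_name TEXT,
        sex        TEXT,
        age        INTEGER,
        birthplace TEXT,
        spouse_ssn TEXT
    );
    CREATE TABLE residence (
        ssn TEXT, city TEXT, state TEXT, from_year INTEGER, to_year INTEGER
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", [
    ("001", "Tom",  "M", 35, "Illinois",   "002"),
    ("002", "Mary", "F", 34, "California", "001"),
    ("003", "Tom",  "M", 38, "Illinois",   "004"),
    ("004", "Mary", "F", 36, "California", "003"),
    ("005", "Tom",  "M", 33, "Ohio",       "006"),
    ("006", "Mary", "F", 31, "Texas",      "005"),
])
conn.executemany("INSERT INTO residence VALUES (?, ?, ?, ?, ?)", [
    ("001", "Libertyville", "IL", 1962, 1968),
    ("003", "Libertyville", "IL", 1965, 1975),
    ("005", "Libertyville", "IL", 1961, 1969),
])

def lookup_by_ssn(ssn):
    """Name lookup: the primary key yields one row or None."""
    return conn.execute(
        "SELECT first_name FROM person WHERE ssn = ?", (ssn,)
    ).fetchone()

def search_toms():
    """Search: any number of rows may match the predicates."""
    query = """
        SELECT p.ssn FROM person p
        WHERE p.sex = 'M' AND p.first_name = 'Tom'
          AND p.age BETWEEN 30 AND 40
          AND p.ssn IN (SELECT r.ssn FROM residence r
                        WHERE r.city = 'Libertyville' AND r.state = 'IL'
                          AND r.from_year <= 1970 AND r.to_year >= 1960)
          AND p.spouse_ssn IN (SELECT s.ssn FROM person s
                               WHERE s.first_name = 'Mary'
                                 AND s.birthplace = 'California')
        ORDER BY p.ssn
    """
    return [row[0] for row in conn.execute(query)]

one_person = lookup_by_ssn("001")   # a single row
nobody = lookup_by_ssn("999")       # None: no such name
candidates = search_toms()          # a list, here with two entries
```

With this sample data the search matches two different people, which is
exactly why it cannot serve as a name for either of them.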

We can create a filesystem with a directory indexed by social security
number, and another directory with hard links that indexes people's
records by driver's ID. That makes sense. But putting in sufficient
indexes to answer the above query of looking for someone named Tom who
is married to someone named Mary (and this is an example where a query
optimizer would be needed) is simple, pure insanity.
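
For what it's worth, the two-directory layout that does make sense can
be sketched in a few lines of Python. The directory names (by-ssn,
by-license), the ID formats, and the record contents are all
hypothetical, and this assumes a filesystem that supports hard links.

```python
import os
import tempfile

def build_indexes(root, records):
    """Store each record once under by-ssn/ and hard-link it from by-license/.

    records is an iterable of (ssn, license_id, text) tuples; the layout
    and the names are invented for this example.
    """
    by_ssn = os.path.join(root, "by-ssn")
    by_license = os.path.join(root, "by-license")
    os.makedirs(by_ssn)
    os.makedirs(by_license)
    for ssn, license_id, text in records:
        primary = os.path.join(by_ssn, ssn)
        with open(primary, "w") as f:
            f.write(text)
        # Second name for the same inode; fails on filesystems
        # that do not support hard links.
        os.link(primary, os.path.join(by_license, license_id))
    return by_ssn, by_license

root = tempfile.mkdtemp()
by_ssn, by_license = build_indexes(root, [
    ("001", "D-100", "Tom\n"),
    ("002", "D-200", "Mary\n"),
])

# Both pathnames resolve to the same single file.
same_file = os.path.samefile(
    os.path.join(by_ssn, "001"), os.path.join(by_license, "D-100")
)
link_count = os.stat(os.path.join(by_ssn, "001")).st_nlink
```

Each index here is just one more directory of unique names. Every
additional attribute you want to look up by needs its own directory,
and combinations of attributes multiply from there.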

> uh, all the time, if there is a namespace that lets him. How often do
> you use google? How often do you memorize the primary key of an object
> in a relational database, and use only that versus how often do you do a
> richer query?

I use google dozens of times a day. I type commands to bash hundreds
of times a day. Does that mean that bash command line parsing should
be in the kernel? Of course not!

The bottom line is that for something that happens dozens or even
hundreds of times a day, that's an argument that it *shouldn't* be
done in the kernel. Compare and contrast that with handling incoming
network packets, which can happen millions of times per hour.

- Ted