Re: statfs() / statvfs() syscall ballsup...

From: Linus Torvalds
Date: Sun Oct 12 2003 - 11:16:42 EST



On 12 Oct 2003, Greg Stark wrote:
>
> There are other reasons databases want to control their own cache. The
> application knows more about the usage and the future usage of the data than
> the kernel does.

But this again is not an argument for not using the page cache - it's only
an argument for _telling_ the kernel about its use.

> However on busy servers whenever it's run it causes lots of pain because the
> kernel flushes all the cached data in favour of the data this job touches.

Yes. But this is actually pretty easy to avoid in-kernel, since all of the
LRU logic is pretty localized.

It could be done on a per-process thing ("this process should not pollute
the active list") or on a per-fd thing ("accesses through this particular
open are not to pollute the active list").

> And
> worse, there's no way to indicate that the i/o it's doing is lower priority,
> so i/o bound servers get hit dramatically.

IO priorities are pretty much worthless. It doesn't _matter_ if other
processes get preferred treatment - what is costly is the latency cost of
seeking. What you want is not priorities, but batching.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/