Re: statfs() / statvfs() syscall ballsup...

From: Ingo Oeser
Date: Thu Oct 16 2003 - 05:32:48 EST


Hi there,

First: I think the problem is solvable by mixing blocking and
non-blocking IO, or simply with AIO, which will be supported nicely by
2.6.0, is a POSIX standard, and is meant for doing your own IO scheduling.

On Wednesday 15 October 2003 17:03, Greg Stark wrote:
> Ingo Oeser <ioe-lkml@xxxxxxxxxx> writes:
> > On Monday 13 October 2003 10:45, Helge Hafting wrote:
> > > This is easier than trying to tell the kernel that the job is
> > > less important, that goes wrong whether the job runs too much
> > > or too little. Let that job sleep a little when its services
> > > aren't needed, or when you need the disk bandwidth elsewhere.
>
> Actually I think that's exactly backwards. The problem is that if the
> user-space tries to throttle the process it doesn't know how much or when.
> The kernel knows exactly when there are other higher priority writes, it
> can schedule just enough writes from vacuum to not interfere.

On dedicated servers this might be true. But on these you could also
solve it in user space by measuring disk bandwidth and issuing just
enough IO to roughly keep up with it.
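
A rough sketch of the user-space pacing idea, in Python for brevity. It
assumes the sustainable rate (target_bytes_per_s) has already been measured
by some other means; the names pace_delay and paced_read are made up for
this illustration and do not come from any database or library.

```python
import io
import time


def pace_delay(bytes_done, elapsed_s, target_bytes_per_s):
    """Seconds to sleep so the average rate so far does not exceed the target."""
    if target_bytes_per_s <= 0:
        raise ValueError("target_bytes_per_s must be positive")
    # Time the work *should* have taken at the target rate, minus time actually spent.
    return max(0.0, bytes_done / target_bytes_per_s - elapsed_s)


def paced_read(f, chunk_size, target_bytes_per_s,
               clock=time.monotonic, sleep=time.sleep):
    """Read f to EOF in chunks, sleeping between chunks to stay near the target rate.

    clock and sleep are injectable only so the pacing can be checked without
    real delays; both are assumptions of this sketch.
    """
    start = clock()
    total = 0
    while True:
        data = f.read(chunk_size)
        if not data:
            return total
        total += len(data)
        delay = pace_delay(total, clock() - start, target_bytes_per_s)
        if delay > 0:
            sleep(delay)


if __name__ == "__main__":
    # Hypothetical usage: pace a 1 MiB in-memory "file" at 4 MiB/s.
    n = paced_read(io.BytesIO(b"\0" * (1 << 20)), 64 * 1024, 4 * (1 << 20))
    print("read", n, "bytes")
```

This only caps the job's own average rate. It does not notice whether the
disk is idle or busy, which is exactly Greg's objection.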

> So if vacuum slept a bit, say every 64k of data vacuumed. It could end up
> sleeping when the disks are actually idle. Or it could be not sleeping
> enough and still be interfering with transactions.

The vacuum IO is submitted (via AIO or a simulation of it) in units of
size U, ALWAYS waiting for one unit to complete before submitting the next.
Between units, vacuum checks for outstanding transactions and stops
submitting when there is one.

Now a transaction is submitted, and its mere existence stops vacuum from
submitting further units. The transaction waits for its IO to complete
(e.g. with aio_suspend()) and then signals vacuum to continue.
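
A minimal sketch of this scheme, in Python for brevity and using a plain
blocking read as the "simulation" of AIO. IoGate, vacuum and the
transaction_begin/transaction_end hooks are hypothetical names for this
illustration; how a real database would signal outstanding transactions is
an assumption here.

```python
import io
import threading


class IoGate:
    """Counts outstanding transactions; vacuum pauses while any are pending."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def transaction_begin(self):
        with self._cond:
            self._pending += 1

    def transaction_end(self):
        # Called after the transaction's own IO has completed.
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()  # signal vacuum to continue

    def wait_until_idle(self):
        with self._cond:
            while self._pending > 0:
                self._cond.wait()


def vacuum(f, unit_size, gate, process=lambda data: None):
    """Read f one unit at a time, checking for transactions between units.

    Returns the number of units processed.
    """
    units = 0
    while True:
        gate.wait_until_idle()    # stop submitting while a transaction exists
        data = f.read(unit_size)  # blocking: returns once this unit is complete
        if not data:
            return units
        process(data)
        units += 1


if __name__ == "__main__":
    gate = IoGate()
    print("units:", vacuum(io.BytesIO(b"\0" * 1000), 256, gate))
```

At most one unit is ever in flight when a transaction arrives, so the
choice of U bounds how long the transaction can be delayed by vacuum.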

So the disk(s) should always be in good use.

I don't know much of the design internals of your database, but this
sounds promising and is portable.

> > The questions are: How IO-intensive is vacuum? How fast can throttling
> > free disk bandwidth (and memory)?
>
> It's purely i/o bound on large sequential reads. Ideally it should still
> have large enough sequential reads to not lose the streaming advantage, but
> not so large that it preempts the more random-access transactions.

Ok, so we can ignore the processing time and the above should just work.


Regards

Ingo Oeser

