Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb

From: Jan Kara
Date: Mon Sep 14 2009 - 07:17:32 EST


On Thu 10-09-09 17:49:10, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 16:23 +0200, Jan Kara wrote:
> > Well, what I imagined we could do is:
> > Have a per-bdi variable 'pages_written' - that would reflect the amount of
> > pages written to the bdi since boot (OK, we'd have to handle overflows but
> > that's doable).
> >
> > There will be a per-bdi variable 'pages_waited'. When a thread should sleep
> > in balance_dirty_pages() because we are over limits, it kicks writeback thread
> > and does:
> > to_wait = max(pages_waited, pages_written) + sync_dirty_pages() (or
> > whatever number we decide)
> > pages_waited = to_wait
> > sleep until pages_written reaches to_wait or we drop below dirty limits.
> >
> > That will make sure each thread will sleep until writeback threads have done
> > their duty for the writing thread.
> >
> > If we make sure sleeping threads are properly ordered on the wait queue,
> > we could always wakeup just the first one and thus avoid the herding
> > effect. When we drop below dirty limits, we would just wakeup the whole
> > waitqueue.
> >
> > Does this sound reasonable?
>
> That seems to go wrong when there's multiple tasks waiting on the same
> bdi, you'd count each page for 1/n its weight.
>
> Suppose pages_written = 1024, and 4 tasks block and compute their to
> wait as pages_written + 256 = 1280, then we'd release all 4 of them
> after 256 pages are written, instead of 4*256, which would be
> pages_written = 2048.
Well, there's some locking needed, of course. The intent is to stack
demands as they come. So with pages_written = 1024 and pages_waited = 1024,
we would do:
THREAD 1:

spin_lock
to_wait = 1024 + 256
pages_waited = 1280
spin_unlock

THREAD 2:

spin_lock
to_wait = 1280 + 256
pages_waited = 1536
spin_unlock
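
The stacking above could be sketched in userspace C roughly as follows. This
is only an illustration under assumptions, not kernel code: struct bdi_sketch
and reserve_wait_target() are hypothetical names, and a pthread mutex stands
in for the spinlock. The max() of pages_waited and pages_written is taken
from the earlier proposal quoted above.

```c
#include <pthread.h>

/* Hypothetical per-bdi state; field names follow the mail, layout is assumed. */
struct bdi_sketch {
	pthread_mutex_t lock;         /* stands in for the spinlock */
	unsigned long pages_written;  /* pages written to the bdi so far */
	unsigned long pages_waited;   /* highest target any waiter has claimed */
};

/*
 * Stack a new demand of 'demand' pages on top of the demands already
 * queued and return the pages_written value the caller should wait for.
 * Counter overflow is deliberately ignored in this sketch.
 */
static unsigned long reserve_wait_target(struct bdi_sketch *bdi,
					 unsigned long demand)
{
	unsigned long base, to_wait;

	pthread_mutex_lock(&bdi->lock);
	/* Start from whichever is further ahead: queued demands or progress. */
	base = bdi->pages_waited > bdi->pages_written ?
	       bdi->pages_waited : bdi->pages_written;
	to_wait = base + demand;
	bdi->pages_waited = to_wait;
	pthread_mutex_unlock(&bdi->lock);

	return to_wait;
}
```

Starting from pages_written = pages_waited = 1024, two successive calls with
a demand of 256 return 1280 and 1536, matching THREAD 1 and THREAD 2 above.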

So the weight of each page is preserved. The fact that the second thread
effectively waits until the first thread has its demand satisfied looks
strange at first sight, but we don't do any better currently and I think
it's fine: if they were two writer threads, the thread released first would
soon queue behind the thread still waiting, so in the long term the behavior
should be fair.
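
The release condition from the proposal (sleep until pages_written reaches
to_wait or we drop below the dirty limits) might look like the sketch below.
It is an assumption-laden illustration: counter_reached() and waiter_done()
are hypothetical helpers, and the signed-difference comparison is just one
common way to tolerate counter wraparound (the same idea as the kernel's
time_after_eq()), not something this thread has settled on.

```c
#include <stdbool.h>

/*
 * Wrap-tolerant "written has reached target" for free-running counters.
 * Assumes the two values are never more than half the counter range apart.
 */
static bool counter_reached(unsigned long written, unsigned long target)
{
	return (long)(written - target) >= 0;
}

/*
 * A waiter may stop sleeping once its stacked target has been written,
 * or as soon as the system is back below the dirty limits.
 */
static bool waiter_done(unsigned long pages_written, unsigned long to_wait,
			bool below_dirty_limit)
{
	return below_dirty_limit || counter_reached(pages_written, to_wait);
}
```

With the targets from the example, the first thread is done once
pages_written hits 1280, while the second keeps waiting until 1536 unless
the dirty limits are no longer exceeded first.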

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR