Re: [PATCH 2/3] writeback: Record if the congestion was unnecessary

From: Mel Gorman
Date: Thu Aug 26 2010 - 16:31:55 EST


On Thu, Aug 26, 2010 at 08:29:04PM +0200, Johannes Weiner wrote:
> On Thu, Aug 26, 2010 at 04:14:15PM +0100, Mel Gorman wrote:
> > If congestion_wait() is called when there is no congestion, the caller
> > will wait for the full timeout. This can cause unreasonable and
> > unnecessary stalls. There are a number of potential modifications that
> > could be made to wake sleepers but this patch measures how serious the
> > problem is. It keeps count of how many congested BDIs there are. If
> > congestion_wait() is called with no BDIs congested, the tracepoint will
> > record that the wait was unnecessary.
>
> I am not convinced that unnecessary is the right word. On a workload
> without any IO (i.e. no congestion_wait() necessary, ever), I noticed
> the VM regressing both in time and in reclaiming the right pages when
> simply removing congestion_wait() from the direct reclaim paths (the
> one in __alloc_pages_slowpath and the other one in
> do_try_to_free_pages).
>
> So just being stupid and waiting for the timeout in direct reclaim
> while kswapd can make progress seemed to do a better job for that
> load.
>
> I can not exactly pinpoint the reason for that behaviour, it would be
> nice if somebody had an idea.
>

There is a possibility that the behaviour in that case was due to flusher
threads doing the writes rather than direct reclaim queueing pages for IO
in an inefficient manner. So the stall is stupid but happens to work out
well because flusher threads get the chance to do work.

> So personally I think it's a good idea to get an insight on the use of
> congestion_wait() [patch 1] but I don't agree with changing its
> behaviour just yet, or judging its usefulness solely on whether it
> correctly waits for bdi congestion.
>

Unfortunately, I strongly suspect that some of the desktop stalls seen during
IO (one of which involved no writes) were due to calling congestion_wait()
and waiting for the full timeout when no writes were in progress.
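
To make the measurement concrete, here is a rough userspace model of the
accounting described in the patch changelog quoted above: keep a count of
congested BDIs and flag any wait that starts while that count is zero. This
is only a sketch. The struct and the model_* function names are hypothetical
and are not taken from the patch, and the real code would need atomic
counters and a tracepoint rather than plain fields.

```c
/* Userspace model only; not the kernel patch. All names are hypothetical. */
#include <stdbool.h>

struct congestion_stats {
	int nr_congested;          /* BDIs currently marked congested */
	unsigned long waits;       /* total calls to model_congestion_wait() */
	unsigned long unnecessary; /* waits started with nothing congested */
};

/* A BDI became congested. */
static void model_set_congested(struct congestion_stats *s)
{
	s->nr_congested++;
}

/* A BDI is no longer congested. */
static void model_clear_congested(struct congestion_stats *s)
{
	if (s->nr_congested > 0)
		s->nr_congested--;
}

/*
 * Model of the check made on entry to the wait. Returns true if the wait
 * would be recorded as unnecessary, i.e. no BDI was congested, so nothing
 * would wake the caller before the full timeout expired.
 */
static bool model_congestion_wait(struct congestion_stats *s)
{
	bool unnecessary = (s->nr_congested == 0);

	s->waits++;
	if (unnecessary)
		s->unnecessary++;
	return unnecessary;
}
```

The ratio of unnecessary to waits is the sort of figure that would show how
serious the problem is, without changing the behaviour of the wait itself.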

It gets potentially worse too. Let's say we have a system with many BDIs of
different speeds - e.g. an SSD on one end of the spectrum and a USB flash
drive on the other. The congestion for writes could be on the USB flash drive
but, due to low memory, the allocator, direct reclaimers and kswapd go to
sleep periodically on congestion_wait() for USB even though the bulk of the
pages that need reclaiming are backed by the SSD.
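
The mixed-speed case can be illustrated with a toy userspace model, assuming
a reclaimer sleeps whenever any BDI at all is congested. Everything here is
hypothetical (struct model_bdi, the model_* helpers, the page counts used
with them); it only shows how pages backed by an uncongested device end up
waiting on a congested one.

```c
/* Userspace illustration only; not kernel code. All names are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

struct model_bdi {
	const char *name;
	bool write_congested;            /* is this device's queue congested? */
	unsigned long reclaimable_pages; /* pages backed by this device */
};

/* Global view: a reclaimer sleeps if any BDI at all is congested. */
static bool model_any_congested(const struct model_bdi *bdis, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (bdis[i].write_congested)
			return true;
	return false;
}

/*
 * Pages that are held up for no good reason: those backed by uncongested
 * BDIs while a global sleep is in effect because of some other device.
 */
static unsigned long
model_pages_delayed_needlessly(const struct model_bdi *bdis, size_t n)
{
	unsigned long pages = 0;

	if (!model_any_congested(bdis, n))
		return 0;
	for (size_t i = 0; i < n; i++)
		if (!bdis[i].write_congested)
			pages += bdis[i].reclaimable_pages;
	return pages;
}
```

With a made-up table of an uncongested "ssd" entry holding 900 reclaimable
pages and a congested "usb" entry holding 100, the second helper returns 900:
most of the reclaimable memory is waiting on a device it has nothing to do
with.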

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab