Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3

From: Chris Mason
Date: Thu Nov 12 2009 - 17:01:46 EST


On Thu, Nov 12, 2009 at 03:27:48PM -0500, Chris Mason wrote:
> On Thu, Nov 12, 2009 at 07:30:06PM +0000, Mel Gorman wrote:
> > Sorry for the long delay in posting another version. Testing is extremely
> > time-consuming and I wasn't getting to work on this as much as I'd have liked.
> >
> > Changelog since V2
> > o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
> > testing, it made latencies even worse as kswapd slept more on high-order
> > congestion causing order-0 direct reclaims.
> > o Added changes to how congestion_wait() works
> > o Added a number of new patches altering the behaviour of reclaim
> >
> > Since 2.6.31-rc1, there have been an increasing number of GFP_ATOMIC
> > failures. A significant number of these have been high-order GFP_ATOMIC
> > failures and while they are generally brushed away, there has been a large
> > increase in them recently and there are a number of possible areas the
> > problem could be in - core vm, page writeback and a specific driver. The
> > bugs affected by this that I am aware of are:
>
> Thanks for all the time you've spent on this one. Let me start with
> some more questions about the workload ;)
>
> So the workload is gitk reading a git repo and a program reading data
> over the network. Which part of the workload writes to disk?

Sorry for the self-reply; I started digging through your data (man,
that's a lot of data ;). I took another tour through dm-crypt and
things make more sense now.

dm-crypt has two single-threaded workqueues for each dm-crypt device.
The first one handles the actual encryption and decryption, and the
second one does the IO.

So the path for a write looks something like this:

filesystem -> crypt thread -> encrypt the data -> io thread -> disk

And the path for read looks something like this:

filesystem -> io thread -> disk -> crypt thread -> decrypt data -> FS
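
For reference, the two queues are set up in crypt_ctr() roughly like
this (a from-memory sketch of drivers/md/dm-crypt.c, not a verbatim
quote):

    /* one single-threaded workqueue for the IO hand-off and one
     * for the crypto work, shared by reads and writes */
    cc->io_queue = create_singlethread_workqueue("kcryptd_io");
    if (!cc->io_queue)
            goto bad_io_queue;

    cc->crypt_queue = create_singlethread_workqueue("kcryptd");
    if (!cc->crypt_queue)
            goto bad_crypt_queue;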

One thread does encryption and one thread does IO, and these threads are
shared for reads and writes. The end result is that all of the sync
reads get stuck behind any async write congestion and all of the async
writes get stuck behind any sync read congestion.

It's almost like you need to check for both sync and async congestion
before you have any hope of a new IO making progress.
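
Something along these lines, as a sketch only (bdi_rw_congested() is
the helper in include/linux/backing-dev.h that tests the sync and
async congestion bits in one call):

    /* Sketch: poll both congestion lists before dispatching new IO.
     * congestion_wait() sleeps until the given list clears or the
     * timeout expires, so alternate between the two lists. */
    while (bdi_rw_congested(bdi)) {
            congestion_wait(BLK_RW_SYNC, HZ / 50);
            if (!bdi_rw_congested(bdi))
                    break;
            congestion_wait(BLK_RW_ASYNC, HZ / 50);
    }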

The confusing part is that dm hasn't gotten any worse in this regard
since 2.6.30, but this workload generates more sync reads (presumably
from gitk and swapin) than async writes (from the low-bandwidth rsync).
So in general, if you were to change mm/*.c to wait for sync congestion
instead of async, things should appear better.
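
As a concrete (illustrative, untested) example, most of the
congestion_wait() callers in mm/vmscan.c currently sleep on the async
list, so the change would be one-liners of this shape:

    -	congestion_wait(BLK_RW_ASYNC, HZ/10);
    +	congestion_wait(BLK_RW_SYNC, HZ/10);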

The punch line is that the btrfs guy thinks we can solve all of this with
just one more thread. If we change dm-crypt to have a thread dedicated
to sync IO and a thread dedicated to async IO, the system should smooth
out.
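
In dm-crypt terms that would look something like the sketch below.
The io_queue_sync/io_queue_async names are made up, and a real patch
might key off the bio's sync flag rather than its direction, but reads
are effectively sync in this workload:

    /* in crypt_ctr(): two IO queues instead of one */
    cc->io_queue_sync  = create_singlethread_workqueue("kcryptd_io_sync");
    cc->io_queue_async = create_singlethread_workqueue("kcryptd_io_async");

    static void kcryptd_queue_io(struct dm_crypt_io *io)
    {
            struct crypt_config *cc = io->target->private;
            /* send reads to their own thread so they never queue
             * behind a backlog of async writes, and vice versa */
            struct workqueue_struct *wq =
                    bio_data_dir(io->base_bio) == READ ?
                            cc->io_queue_sync : cc->io_queue_async;

            INIT_WORK(&io->work, kcryptd_io);
            queue_work(wq, &io->work);
    }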

-chris
