Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small

From: Minchan Kim
Date: Thu Jul 21 2011 - 20:30:48 EST


On Fri, Jul 22, 2011 at 1:58 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Thu, Jul 21, 2011 at 12:42 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>> On Thu, Jul 21, 2011 at 12:36:11PM -0400, Andrew Lutomirski wrote:
>>> On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>>> > On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
>>> >> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
>>> >> > On Fri, Jun 24, 2011 at 03:44:53PM +0100, Mel Gorman wrote:
>>> >> > > (Built this time and passed a basic sniff-test.)
>>> >> > >
>>> >> > > During allocator-intensive workloads, kswapd will be woken frequently
>>> >> > > causing free memory to oscillate between the high and min watermark.
>>> >> > > This is expected behaviour. Unfortunately, if the highest zone is
>>> >> > > small, a problem occurs.
>>> >> > >
>>> >> > > This seems to happen most with recent sandybridge laptops but it's
>>> >> > > probably a co-incidence as some of these laptops just happen to have
>>> >> > > a small Normal zone. The reproduction case is almost always during
>>> >> > > copying large files that kswapd pegs at 100% CPU until the file is
>>> >> > > deleted or cache is dropped.
>>> >> > >
>>> >> > > The problem is mostly down to sleeping_prematurely() keeping kswapd
>>> >> > > awake when the highest zone is small and unreclaimable and compounded
>>> >> > > by the fact we shrink slabs even when not shrinking zones causing a lot
>>> >> > > of time to be spent in shrinkers and a lot of memory to be reclaimed.
>>> >> > >
>>> >> > > Patch 1 corrects sleeping_prematurely to check the zones matching
>>> >> > >   the classzone_idx instead of all zones.
>>> >> > >
>>> >> > > Patch 2 avoids shrinking slab when we are not shrinking a zone.
>>> >> > >
>>> >> > > Patch 3 notes that sleeping_prematurely is checking lower zones against
>>> >> > >   a high classzone which is not what allocators or balance_pgdat()
>>> >> > >   is doing leading to an artificial belief that kswapd should be
>>> >> > >   still awake.
>>> >> > >
>>> >> > > Patch 4 notes that when balance_pgdat() gives up on a high zone that the
>>> >> > >   decision is not communicated to sleeping_prematurely()
>>> >> > >
>>> >> > > This problem affects 2.6.38.8 for certain and is expected to affect
>>> >> > > 2.6.39 and 3.0-rc4 as well. If accepted, they need to go to -stable
>>> >> > > to be picked up by distros and this series is against 3.0-rc4. I've
>>> >> > > cc'd people that reported similar problems recently to see if they
>>> >> > > still suffer from the problem and if this fixes it.
>>> >> > >
>>> >> >
>>> >> > Good!
>>> >> > This patch solved the problem.
>>> >> > But there is still a mystery.
>>> >> >
>>> >> > In log, we could see excessive shrink_slab calls.
>>> >>
>>> >> Yes, because shrink_slab() was called on each loop through
>>> >> balance_pgdat() even if the zone was balanced.
>>> >>
>>> >>
>>> >> > And as you know, we merged a patch which adds cond_resched at the end
>>> >> > of shrink_slab. So other tasks should get the CPU and we should not see
>>> >> > 100% CPU of kswapd, I think.
>>> >> >
>>> >>
>>> >> cond_resched() is not a substitute for going to sleep.
>>> >
>>> > Of course, it's not the same as sleeping, but other tasks should get the CPU and consume their time slice.
>>> > So we should never see 100% CPU consumption of kswapd.
>>> > No?
>>>
>>> If the rest of the system is idle, then kswapd will happily use 100%
>>> CPU. (Or on a multi-core system, kswapd will use close to 100% of one
>>
>> Of course. But at least, we have a test program and I think it's not idle.
>
> The test program I used was 'top', which is pretty close to idle.
>
>>
>>> CPU even if another task is using the other one. This is bad enough
>>> on a desktop, but on a laptop you start to notice when your battery
>>> dies.)
>>
>> Of course it's bad. :)
>> What I want to know is just the exact cause of the 100% CPU usage.
>> It might not be 100%; we might be using the word sloppily.
>>
>
> Well, if you want to be pedantic, my laptop can, in theory, demonstrate
> true 100% CPU usage. Trigger the bug, suspend every other thread, and
> listen to the laptop fan spin and feel the laptop get hot. (The fan
> is controlled by the EC and takes no CPU.)
>
> In practice, the usage was close enough to 100% that it got rounded.
>
> The cond_resched was enough to at least make the system responsive
> instead of the hard freeze I used to get.

I don't want to be pedantic. :)
What I had in mind by "100% CPU usage" was kswapd spinning without ever
yielding the CPU, but from your example (i.e., cond_resched makes the
system responsive), that's not the case. kswapd was just using most of
the CPU time, not literally 100%. It seems I was paranoid about the
word, sorry for that.
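
For illustration only, here is a minimal userspace sketch of that point.
It is not kernel code: os.sched_yield() is just an assumed stand-in for
cond_resched(), and yielding_spin() is a hypothetical helper name. A loop
that yields on every iteration but never sleeps still gets charged for
nearly all the CPU time when nothing else is runnable, while other tasks
can still get scheduled whenever they want to run.

```python
import os
import time


def yielding_spin(wall_seconds):
    """Spin for wall_seconds, yielding the CPU on every iteration.

    Returns (cpu_seconds_used, wall_seconds_elapsed) for this process.
    """
    start_wall = time.monotonic()
    start_cpu = time.process_time()
    while time.monotonic() - start_wall < wall_seconds:
        # Offer the CPU to any other runnable task, but never sleep.
        # If nothing else is runnable, we are scheduled again at once.
        os.sched_yield()
    cpu_used = time.process_time() - start_cpu
    wall_used = time.monotonic() - start_wall
    return cpu_used, wall_used


if __name__ == "__main__":
    cpu, wall = yielding_spin(1.0)
    # The share depends on what else is running on the machine.
    print("cpu share: %.0f%%" % (100.0 * cpu / wall))
```

On an otherwise idle machine I would expect the reported share to be
close to 100%, which is what 'top' rounds up; the exact number depends
on the load, so treat it as a rough demonstration only.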

--
Kind regards,
Minchan Kim