Re: excessive CPU utilization by isolate_freepages?

From: Jim Schutt
Date: Thu Jun 28 2012 - 11:30:57 EST


On 06/28/2012 05:36 AM, Mel Gorman wrote:
On Wed, Jun 27, 2012 at 03:59:19PM -0600, Jim Schutt wrote:
Hi,

I'm running into trouble with systems going unresponsive,
and perf suggests it's excessive CPU usage by isolate_freepages().
I'm currently testing 3.5-rc4, but I think this problem may have
first shown up in 3.4. I'm only just learning how to use perf,
so I only currently have results to report for 3.5-rc4.


Out of curiosity, why do you think it showed up in 3.4? It's not
surprising as such if it did show up there but I'm wondering what you
are basing it on.

If I remember correctly, when I was testing this workload on 3.4 is
when I first saw hundreds of runable threads being reported by vmstat.
At that time I couldn't reproduce quite as reliably, and I didn't
know how to get perf to give me call chains, so I didn't follow up :(


It's not a surprise because it's also where reclaim/compaction stopped
depending on lumpy reclaim. In the past we would have reclaimed more
pages but now rely on compaction more. It's plausible that with many
parallel compactions there would be higher CPU usage now.

<SNIP>
2012-06-27 14:00:03.219-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
75 1 0 566988 576 35664800 0 0 2 1355 21 3 1 4 95 0 0
433 1 0 964052 576 35069112 0 0 7 456359 102256 20901 2 98 0 0 0
547 3 0 820116 576 34893932 0 0 57 560507 114878 28115 3 96 0 0 0
806 2 0 606992 576 34848180 0 0 339 309668 101230 21056 2 98 0 0 0
708 1 0 529624 576 34708000 0 0 248 370886 101327 20062 2 97 0 0 0
231 5 0 504772 576 34663880 0 0 305 334824 95045 20407 2 97 1 1 0
158 6 0 1063088 576 33518536 0 0 531 847435 130696 47140 4 92 1 2 0
193 0 0 1449156 576 33035572 0 0 363 371279 94470 18955 2 96 1 1 0
266 6 0 1623512 576 32728164 0 0 77 241114 95730 15483 2 98 0 0 0
243 8 0 1629504 576 32653080 0 0 81 471018 100223 20920 3 96 0 1 0
70 11 0 1342140 576 33084020 0 0 100 925869 139876 56599 6 88 3 3 0
211 7 0 1130316 576 33470432 0 0 290 1008984 150699 74320 6 83 6 5 0
365 3 0 776736 576 34072772 0 0 182 747167 139436 67135 5 88 4 3 0
29 1 0 1528412 576 34110640 0 0 50 612181 137403 77609 4 87 6 3 0
266 5 0 1657688 576 34105696 0 0 3 258307 62879 38508 2 93 3 2 0
1159 2 0 2002256 576 33775476 0 0 19 88554 42112 14230 1 98 0 0 0


ok, so System CPU usage through the roof.


Right around 14:00 I was able to get a "perf record -a -g"; here's the
beginning of what "perf report --sort symbol --call-graph fractal,5"
had to say:

#
64.86% [k] _raw_spin_lock_irqsave
|
|--97.94%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| |
| |--99.56%-- try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_slowpath
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | do_page_fault
| | page_fault
| | |
| | |--53.53%-- skb_copy_datagram_iovec
| | | tcp_recvmsg
| | | inet_recvmsg
| | | sock_recvmsg
| | | sys_recvfrom
| | | system_call_fastpath
| | | __recv
| | | |
| | | --100.00%-- (nil)
| | |
| | |--27.80%-- __pthread_create_2_1
| | | (nil)
| | |
| | --18.67%-- memcpy
| | |
| | |--57.38%-- 0x50d000005
| | |
| | |--34.52%-- 0x3b300bf271940a35
| | |
| | --8.10%-- 0x1500000000000009
| --0.44%-- [...]
--2.06%-- [...]


This looks like lock contention to me on zone->lock which
isolate_freepages takes and releases frequently. You do not describe the
exact memory layout but it's likely that there are two very large zones
with 12 CPUs each. If they all were running compaction they would pound
zone->lock pretty heavily.

Does this tell you what you need to know?
If not, let me know what you'd like to see.

# grep -H MemTotal /sys/devices/system/node/node*/meminfo
/sys/devices/system/node/node0/meminfo:Node 0 MemTotal: 25156644 kB
/sys/devices/system/node/node1/meminfo:Node 1 MemTotal: 25165824 kB


<SNIP>

The other call traces also look like they are pounding zone->lock
heavily.

Rik's patch has the potential to reduce contention by virtue of the fact
that less scanning is required. I'd be interested in hearing how much of
an impact that patch has so please test that first.

Working on that now....


If that approach does not work I'll put together a patch that backs
off compaction on zone->lock contention.

I seem to be able to recreate this issue at will, so please
let me know what I can do to help learn what is going on.


Thanks very much for testing.


No problem. Thanks to you all for jumping on this so quickly.

FWIW, perf rocks! Thanks to all who've helped make it so great!

-- Jim


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/