Re: [PATCH v4 5/5] mm: Only IPI CPUs to drain local pages if they exist

From: Gilad Ben-Yossef
Date: Fri Dec 30 2011 - 15:16:42 EST

On Fri, Dec 30, 2011 at 5:04 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
> On Sun, Dec 25, 2011 at 11:39:59AM +0200, Gilad Ben-Yossef wrote:
> CONFIG_CPUMASK_OFFSTACK is force enabled if CONFIG_MAXSMP on x86. This
> may be the case for some server-orientated distributions. I know
> SLES enables this option for x86-64 at least. Debian does not but
> might in the future. I don't know about RHEL but it should be checked.
> Either way, we cannot depend on CONFIG_CPUMASK_OFFSTACK being disabled
> (it's enabled on my laptop for example due to the .config it is based
> on). That said, breaking the link between MAXSMP and OFFSTACK may be
> an option.

Yes, I know and it is enabled for RHEL as well, I believe.
The point is, MAXSMP is enabled in the enterprise distributions in order to
support massively multi-core systems. Reducing cross-CPU interference is
important to these very systems.

In fact, since CONFIG_CPUMASK_OFFSTACK has a price on its own, the fact
that distros enable it (via MAXSMP) is proof in my eyes that the distros find
massively multi-core systems important :-)

That being said, the patch only has value if it actually reduces cross-CPU
interference and does not incur a bigger price; otherwise, of course, it
should be dropped.

> > For CONFIG_CPUMASK_OFFSTACK=y but when we got to drain_all_pages from
> > the memory hotplug or the memory failure code path (the other code paths
> > that call drain_all_pages), there is no inherent memory pressure, so we
> > should be OK.
> >
> It's the memory failure code path after direct reclaim failed. How
> can you say there is no inherent memory pressure?
Bah.. you are right. Memory allocation will cause memory migration to
the remaining active memory areas, so yes, there is memory pressure.
Point taken. My bad.

> > The thing is, if you are at CPUMASK_OFFSTACK=y, you are saying
> > that you optimize for the large number of CPU case, otherwise it doesn't
> > make sense - you can represent 32 CPU in the space it takes to
> > hold the pointer to the cpumask (on 32bit system) etc.
> >
> > If you are at CPUMASK_OFFSTACK=n you (almost) didn't pay anything.
> >
> <snip>

> It's the CPUMASK_OFFSTACK=y case I worry about as it is enabled on
> at least one server-orientated distribution and probably more.
Sure, because they care about performance (or even just plain working) on
massively multi-core systems. Something this patch set aims to get to work
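
To make the size argument quoted above concrete, here is a small userspace
sketch (not kernel code). The helper names mask_bytes() and inline_is_free()
are made up for illustration, and the sketch assumes a pointer is the same
size as a long, which holds on the usual Linux ABIs:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a bitmap with one bit per CPU, rounded up to whole longs. */
static size_t mask_bytes(size_t nr_cpus, size_t bits_per_long)
{
    assert(bits_per_long > 0 && bits_per_long % 8 == 0);
    size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;
    return longs * (bits_per_long / 8);
}

/*
 * True when an on-stack mask is no bigger than the pointer that an
 * off-stack mask would need anyway.
 * Assumption: sizeof(pointer) == sizeof(long).
 */
static int inline_is_free(size_t nr_cpus, size_t bits_per_long)
{
    size_t pointer_bytes = bits_per_long / 8;
    return mask_bytes(nr_cpus, bits_per_long) <= pointer_bytes;
}
```

With 32 CPUs on a 32-bit system the mask fits in the 4 bytes a pointer would
take, so going off-stack buys nothing. With 4096 CPUs on a 64-bit system the
mask is 512 bytes, which is why the option only makes sense for large CPU
counts.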

> > I think of it more as a CPU isolation feature than pure performance.
> > If you have a system with a couple of dozen CPUs (Tilera, SGI, Cavium
> > or the various virtual NUMA folks) you tend to want to break up the system
> > into sets of CPUs that work on separate tasks.
> >
> Even with the CPUs isolated, how often is it the case that many of
> the CPUs have 0 pages on their per-cpu lists? I checked a bunch of
> random machines just there and in every case all CPUs had at least
> one page on their per-cpu list. In other words I believe that in
> many cases the exact same number of IPIs will be used but with the
> additional cost of allocating a cpumask.

A common usage scenario on systems with lots of cores is to isolate
a group of cores and run an (almost) totally CPU-bound task on each CPU
of the set. Those tasks rarely call into the kernel, they just crunch numbers,
and they end up having 0 pages on their per-cpu lists more often than you think.

But you are right that it is a specific use case. The question is what is the
cost in other use cases.

> <snip>

> I'm still generally uncomfortable with the allocator allocating memory
> while it is known memory is tight.

Got you.

> As a way of mitigating that, I would suggest this is done in two
> passes. The first would check if at least 50% of the CPUs have no pages
> on their per-cpu list. Then and only then allocate the per-cpu mask to
> limit the IPIs. Use a separate patch that counts in /proc/vmstat how
> many times the per-cpu mask was allocated as an approximate measure of
> how often this logic really reduces the number of IPI calls in practice
> and report that number with the patch - i.e. this patch reduces the
> number of times IPIs are globally transmitted by X% for some workload.
Great idea. I like it - and I guess the 50% could be configurable.
Will do and report.
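
A minimal userspace sketch of the two-pass idea as I read it, just to pin down
the logic. This is not the patch: plan_drain(), struct drain_plan and the
pcp_count array are hypothetical stand-ins for the per-cpu page counts, a plain
bool array stands in for the cpumask, and the 50% threshold is hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type; nothing like this exists in the kernel. */
struct drain_plan {
    bool *targets;        /* NULL means "IPI every CPU" (no mask allocated) */
    size_t nr_targets;    /* number of CPUs that would receive an IPI */
    bool mask_allocated;  /* what the /proc/vmstat counter would count */
};

static struct drain_plan plan_drain(const unsigned int *pcp_count, size_t nr_cpus)
{
    struct drain_plan plan = { NULL, nr_cpus, false };
    size_t empty = 0;
    size_t i;

    assert(pcp_count != NULL || nr_cpus == 0);

    /* Pass 1: allocation-free scan counting CPUs with empty per-cpu lists. */
    for (i = 0; i < nr_cpus; i++)
        if (pcp_count[i] == 0)
            empty++;

    /* Fewer than 50% empty: not worth a mask, fall back to IPI-ing everyone. */
    if (empty * 2 < nr_cpus)
        return plan;

    /* Pass 2: only now allocate the mask and mark CPUs that hold pages. */
    plan.targets = calloc(nr_cpus, sizeof(*plan.targets));
    if (plan.targets == NULL)
        return plan;      /* allocation failed: same fallback as above */

    plan.mask_allocated = true;
    plan.nr_targets = 0;
    for (i = 0; i < nr_cpus; i++) {
        if (pcp_count[i] > 0) {
            plan.targets[i] = true;
            plan.nr_targets++;
        }
    }
    return plan;
}
```

The caller owns plan.targets and frees it when done. Note the fallback path
never allocates, so the cost under memory pressure is limited to the scan
unless at least half of the CPUs have nothing to drain.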

> --

> Mel Gorman
> SUSE Labs

Gilad Ben-Yossef
Chief Coffee Drinker
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML