Re: [PATCH 2/4] mm: Send one IPI per CPU to TLB flush all entries after unmapping pages

From: Mel Gorman
Date: Thu Jun 11 2015 - 11:25:35 EST


On Thu, Jun 11, 2015 at 05:02:51PM +0200, Ingo Molnar wrote:
>
> * Mel Gorman <mgorman@xxxxxxx> wrote:
>
> > > In the full-flushing case (v6 without patch 4) the batching limit is
> > > 'infinite', we'll batch as long as possible, right?
> >
> > No because we must flush before pages are freed so the maximum batching is
> > related to SWAP_CLUSTER_MAX. If we free a page before the flush then in theory
> > the page can be reallocated and a stale TLB entry can allow access to unrelated
> > data. It would be almost impossible to trigger corruption this way but it's a
> > concern.
>
> Well, could we say double SWAP_CLUSTER_MAX to further reduce the IPI rate?
>

We could, but it's a surprisingly subtle change. The impacts I can think
of are:

1. LRU lock hold times increase slightly because more pages are being
isolated
2. There are slight timing changes due to more pages having to be
processed before they are freed. There is a slight risk that more
pages than are necessary get reclaimed but I doubt it'll be
measurable
3. There is a risk that too_many_isolated checks will be easier to
trigger resulting in a HZ/10 stall
4. The rotation rate of active->inactive is slightly faster but there
should be fewer rotations before the lists get balanced so it
shouldn't matter.
5. More pages are reclaimed in a single pass if zone_reclaim_mode is
active but that thing sucks hard when it's enabled no matter what
6. More pages are isolated for compaction so page hold times there
are longer while they are being copied

There might be others. To be honest, I'm struggling to think of any serious
problems such a change would cause. The biggest risk is issue 3 but I expect
that hitting that requires that the system is already getting badly hammered.
The main downside is that it affects all page reclaim activity, not just
the mapped pages which are triggering the IPIs. I'll add a patch to the
series that alters SWAP_CLUSTER_MAX with the intent of further reducing
IPIs, then see what falls out and whether any other VM person complains.
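
As a rough illustration of why the batch size matters for the IPI rate, here
is a minimal userspace sketch. It is a toy model, not kernel code: the helper
names nr_flushes() and nr_ipis() are hypothetical, the value 32 for
SWAP_CLUSTER_MAX is assumed from include/linux/swap.h of this era, and it
assumes every flush sends exactly one IPI to each remote CPU and that every
batch is filled before it is flushed.

```c
/* Assumed value of SWAP_CLUSTER_MAX (include/linux/swap.h at the time). */
#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical helper: number of TLB flushes needed to unmap nr_pages
 * when at most 'batch' pages may be unmapped before a flush is forced.
 * This is a ceiling division.
 */
static unsigned long nr_flushes(unsigned long nr_pages, unsigned long batch)
{
	return (nr_pages + batch - 1) / batch;
}

/*
 * Hypothetical helper: IPIs sent in total, under the model assumption
 * that each flush sends one IPI to each of 'remote_cpus' CPUs.
 */
static unsigned long nr_ipis(unsigned long nr_pages, unsigned long batch,
			     unsigned long remote_cpus)
{
	return nr_flushes(nr_pages, batch) * remote_cpus;
}
```

Under this model, nr_ipis(1024, SWAP_CLUSTER_MAX, 7) is 224 while
nr_ipis(1024, 2 * SWAP_CLUSTER_MAX, 7) is 112, so doubling the batch halves
the IPI count. It says nothing about the LRU lock hold times or isolation
effects listed above, which is where the subtlety lies.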

--
Mel Gorman
SUSE Labs