Re: [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache

From: Mel Gorman
Date: Thu Aug 20 2015 - 09:42:50 EST


On Thu, Aug 20, 2015 at 03:18:43PM +0200, Michal Hocko wrote:
> On Wed 12-08-15 11:45:26, Mel Gorman wrote:
> [...]
> > 4-node machine stutter
> >                              4.2.0-rc1            4.2.0-rc1
> >                                vanilla          nozlc-v1r20
> > Min          mmap    53.9902 (  0.00%)    49.3629 (  8.57%)
> > 1st-qrtle    mmap    54.6776 (  0.00%)    54.1201 (  1.02%)
> > 2nd-qrtle    mmap    54.9242 (  0.00%)    54.5961 (  0.60%)
> > 3rd-qrtle    mmap    55.1817 (  0.00%)    54.9338 (  0.45%)
> > Max-90%      mmap    55.3952 (  0.00%)    55.3929 (  0.00%)
> > Max-93%      mmap    55.4766 (  0.00%)    57.5712 ( -3.78%)
> > Max-95%      mmap    55.5522 (  0.00%)    57.8376 ( -4.11%)
> > Max-99%      mmap    55.7938 (  0.00%)    63.6180 (-14.02%)
> > Max          mmap  6344.0292 (  0.00%)    67.2477 ( 98.94%)
> > Mean         mmap    57.3732 (  0.00%)    54.5680 (  4.89%)
>
> Do you have data for other loads? Because the reclaim counters look
> quite discouraging, to be honest.
>

None of the other workloads showed changes that were worth reporting.

> >                                4.1.0       4.1.0
> >                              vanilla  nozlc-v1r4
> > Swap Ins                         838         502
> > Swap Outs                    1149395     2622895
>
> Twice as many swap-outs is a lot.
>
> > DMA32 allocs                17839113    15863747
> > Normal allocs              129045707   137847920
> > Direct pages scanned         4070089    29046893
>
> 7x more scans by direct reclaim also sounds bad.
>

With this benchmark, the stutter results will be highly variable because
it's hammering the system. The intent of the test was to measure stalls
at a time when desktop interactivity went to hell during IO and could
stall for several minutes. Due to its nature, there is intense reclaim
*and* compaction activity going on, and there is no point drawing
conclusions about whether the reclaim stats are inherently good or bad.

There will be differences in the direct reclaim figures because, instead
of looping in the page allocator waiting for the zlc to clear, the
allocating process will enter direct reclaim. In effect, the zlc causes
processes to busy loop while kswapd does the work. If it turns out that
this is the correct behaviour then we should do that explicitly, not rely
on the broken zlc behaviour, for the same reason we no longer rely on
sprinkling congestion_wait() all over the place.

> > Kswapd pages scanned        17147837    17140694
>
> while kswapd is doing the same amount of work, so we are moving a
> considerable amount of reclaim activity into direct reclaim
>
> > Kswapd pages reclaimed      17146691    17139601
> > Direct pages reclaimed       1888879     4886630
> > Kswapd efficiency                99%         99%
> > Kswapd velocity            17523.721   17518.928
> > Direct efficiency                46%         16%
>
> which is just wasted effort because the efficiency is really poor.
> Is this the effect of hammering a single zone which would otherwise be
> skipped while the allocation would succeed from another zone?
>

Very doubtful. It's more likely because the zlc was causing a process to
busy loop waiting for kswapd to make forward progress.

> The latencies were not much higher to match these numbers, though.
> Is it possible that other parts of the benchmark suffered? The benchmark
> measures only the mmap part AFAIU.
>

Yes, only mmap latency is measured, but while it runs the system is
getting hammered, and the latency is also affected by whether THPs were
used or not.

> > Direct velocity              4159.306   29687.854
> > Percentage direct scans          19%         62%
> > Page writes by reclaim   1149395.000 2622895.000
> > Page writes file                   0           0
> > Page writes anon             1149395     2622895
> >
> > The direct page scan and reclaim rates are noticeable. It is possible
> > this will not be a universal win on all workloads but cycling through
> > zonelists waiting for zlc->last_full_zap to expire is not the right
> > decision.
>
> As much as I would like to see zlc go, it seems that it won't be that
> easy without regressing some loads. Or the numbers

If there are regressions on a real workload then it would be worth
considering why busy looping happened to behave better and then solving
it correctly.

--
Mel Gorman
SUSE Labs