Re: [PATCH v3 0/6] percpu: partial chunk depopulation

From: Roman Gushchin
Date: Fri Apr 16 2021 - 13:13:36 EST


On Fri, Apr 16, 2021 at 08:58:33PM +0530, Pratik Sampat wrote:
> Hello Dennis,
>
> I apologize for the clutter of logs before. I'm pasting the logs from before
> and after the percpu test, both with the patchset applied on 5.12-rc6 and on
> the vanilla 5.12-rc6 kernel.
>
> On 16/04/21 7:48 pm, Dennis Zhou wrote:
> > Hello,
> >
> > On Fri, Apr 16, 2021 at 06:26:15PM +0530, Pratik Sampat wrote:
> > > Hello Roman,
> > >
> > > I've tried the v3 patch series on a POWER9 and an x86 KVM setup.
> > >
> > > My results of the percpu_test are as follows:
> > > Intel KVM 4CPU:4G
> > > Vanilla 5.12-rc6
> > > # ./percpu_test.sh
> > > Percpu:             1952 kB
> > > Percpu:           219648 kB
> > > Percpu:           219648 kB
> > >
> > > 5.12-rc6 + with patchset applied
> > > # ./percpu_test.sh
> > > Percpu:             2080 kB
> > > Percpu:           219712 kB
> > > Percpu:            72672 kB
> > >
> > > I'm able to see an improvement comparable to what you're seeing too.
> > >
> > > However, on POWERPC I'm unable to reproduce these improvements with the
> > > patchset in the same configuration.
> > >
> > > POWER9 KVM 4CPU:4G
> > > Vanilla 5.12-rc6
> > > # ./percpu_test.sh
> > > Percpu:             5888 kB
> > > Percpu:           118272 kB
> > > Percpu:           118272 kB
> > >
> > > 5.12-rc6 + with patchset applied
> > > # ./percpu_test.sh
> > > Percpu:             6144 kB
> > > Percpu:           119040 kB
> > > Percpu:           119040 kB
> > >
> > > I'm wondering if there's any architecture-specific code that needs
> > > plumbing here?
> > >
> > There shouldn't be. Can you send me the percpu_stats debug output before
> > and after?
>
> I'll paste the whole debug stats before and after here.
> 5.12-rc6 + patchset
> -----BEFORE-----
> Percpu Memory Statistics
> Allocation Info:


Hm, this looks highly suspicious. Here are your stats in a more compact form:

Vanilla

                  before    after
nr_alloc        :   9038    97046
nr_dealloc      :   6992    94237
nr_cur_alloc    :   2046     2809
nr_max_alloc    :   2178    90054
nr_chunks       :      3       11
nr_max_chunks   :      3       47
min_alloc_size  :      4        4
max_alloc_size  :   1072     1072
empty_pop_pages :      5       29


Patched

                  before    after
nr_alloc        :   9040    97048
nr_dealloc      :   6994    95002
nr_cur_alloc    :   2046     2046
nr_max_alloc    :   2208    90054
nr_chunks       :      3       48
nr_max_chunks   :      3       48
min_alloc_size  :      4        4
max_alloc_size  :   1072     1072
empty_pop_pages :     12       61


So it looks like the number of chunks got bigger, as did the number of
empty_pop_pages? This contradicts what you wrote, so can you please make
sure that the data is correct and we're not mixing up the two cases?

It looks like, for some reason, sidelined (depopulated) chunks are not getting
freed completely. But I struggle to explain why the initial empty_pop_pages is
bigger with the same number of chunks.
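
To make the comparison explicit, here is a minimal sketch (Python, not part of
the patchset) that takes a few counters from the compact tables above and
prints patched minus vanilla, before and after the test. The helper name
compare() and the choice of counters are made up for illustration; only the
numbers come from the posted stats.

```python
# (before, after) pairs copied from the compact tables above.
vanilla = {
    "nr_cur_alloc": (2046, 2809),
    "nr_chunks": (3, 11),
    "nr_max_chunks": (3, 47),
    "empty_pop_pages": (5, 29),
}
patched = {
    "nr_cur_alloc": (2046, 2046),
    "nr_chunks": (3, 48),
    "nr_max_chunks": (3, 48),
    "empty_pop_pages": (12, 61),
}


def compare(base, other):
    """Return (counter, delta_before, delta_after), computed as other - base."""
    rows = []
    for key in base:
        base_before, base_after = base[key]
        other_before, other_after = other[key]
        rows.append((key, other_before - base_before, other_after - base_after))
    return rows


for key, d_before, d_after in compare(vanilla, patched):
    print(f"{key:<16}: before {d_before:+5d}  after {d_after:+5d}")
```

A positive delta in nr_chunks or empty_pop_pages after the test means the
patched run ended up holding more than the vanilla one, which is the opposite
of what the patchset is supposed to do.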

So, can you please apply the following patch and provide updated statistics?

--