Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
From: Kirill A. Shutemov
Date: Tue Jul 25 2017 - 11:18:06 EST
On Tue, Jul 25, 2017 at 04:26:26PM +0200, Michal Hocko wrote:
> On Mon 24-07-17 18:11:46, Michal Hocko wrote:
> > On Mon 24-07-17 17:51:42, Kirill A. Shutemov wrote:
> > > On Mon, Jul 24, 2017 at 04:15:26PM +0200, Michal Hocko wrote:
> > [...]
> > > > What kind of scalability implication do you have in mind? There is
> > > > basically zero contention on the mmap_sem that late in the exit path
> > > > so this should be pretty much a fast path of the down_write. I agree it
> > > > is not 0 cost but the cost of the address space freeing should basically
> > > > make it a noise.
> > >
> > > Even in the fast path case, it adds two atomic operations per process. If the
> > > cache line is not exclusive to the core by the time of exit(2) it can be
> > > noticeable.
> > >
> > > ... but I guess it's not very hot scenario.
> > >
> > > I guess I'm just too cautious here. :)
> >
> > I definitely did not want to handwave your concern. I just think we can
> > rule out the slow path and didn't think about the fast path overhead.
> >
> > > > > Should we do performance/scalability evaluation of the patch before
> > > > > getting it applied?
> > > >
> > > > What kind of test(s) would you be interested in?
> > >
> > > Can we at least check that the number of /bin/true we can spawn per second
> > > wouldn't be harmed by the patch? ;)
> >
> > OK, so measuring a single /bin/true doesn't tell anything so I've done
> > root@test1:~# cat a.sh
> > #!/bin/sh
> >
> > NR=$1
> > for i in $(seq $NR)
> > do
> > 	/bin/true
> > done
>
> I wanted to reduce potential shell side effects so I've come up with a
> simple program which forks and saves the timestamp before child exit and
> right after waitpid (see attached) and then measured it 100k times. Sure
> this still measures waitpid overhead and the signal delivery but this
> should be more or less constant on an idle system, right? See attached.
>
> before the patch
> min: 306300.00 max: 6731916.00 avg: 437962.07 std: 92898.30 nr: 100000
>
> after
> min: 303196.00 max: 5728080.00 avg: 436081.87 std: 96165.98 nr: 100000
>
> The results are well within noise as I would expect.

I've slightly modified your test case: replaced cpuid + rdtsc with
rdtscp. The cpuid overhead is measurable in such a tight loop.
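
For readers without the attachment, the measurement can be sketched roughly as below. This is not the attached test case, only a minimal illustration under stated assumptions: the helper names stamp(), measure_exit() and run_samples() are hypothetical, the child's timestamp is passed back through a shared anonymous mapping (an assumption about how it could be done), and on non-x86 builds the sketch falls back to clock_gettime(). Note that rdtscp only waits for earlier instructions to complete; it is not fully serializing the way cpuid is.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* rdtscp: reads the TSC after earlier instructions have completed */
static inline uint64_t stamp(void)
{
	unsigned int aux;

	return __rdtscp(&aux);
}
#else
/* Assumption: fall back to a monotonic clock (ns) where rdtscp is absent */
static inline uint64_t stamp(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * Fork a child that stores a timestamp right before exiting, then take a
 * second timestamp in the parent right after waitpid() returns.
 * start_slot must point into memory shared between parent and child.
 * Returns 0 and stores the delta on success, -1 on failure.
 */
static int measure_exit(uint64_t *start_slot, uint64_t *delta)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		*start_slot = stamp();
		_exit(0);
	}
	if (waitpid(pid, NULL, 0) != pid)
		return -1;
	*delta = stamp() - *start_slot;
	return 0;
}

/* Collect up to nr samples into out[]; returns the number collected */
static int run_samples(int nr, uint64_t *out)
{
	uint64_t *slot;
	int done = 0;

	assert(nr > 0);
	slot = mmap(NULL, sizeof(*slot), PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (slot == MAP_FAILED)
		return -1;
	while (done < nr && measure_exit(slot, &out[done]) == 0)
		done++;
	munmap(slot, sizeof(*slot));
	return done;
}
```

Calling run_samples(100000, buf) and feeding the deltas to a summary tool would give output in the shape shown here. As noted earlier in the thread, the deltas still include waitpid and signal delivery overhead, and they are only meaningful if the TSC is synchronized across the cores the parent and child run on.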

3 runs before the patch:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 177200  205000  212900  217800  223700 2377000
 172400  201700  209700  214300  220600 1343000
 175700  203800  212300  217100  223000 1061000

3 runs after the patch:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 175900  204800  213000  216400  223600 1989000
 180300  210900  219600  223600  230200 3184000
 182100  212500  222000  226200  232700 1473000

The difference is still measurable. Around 3%.
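
As a rough cross-check of that figure, the sketch below averages a column over the three runs on each side and reports the relative change. Averaging the three runs is an assumption made here for illustration, not necessarily how the number above was derived, and the helper names mean() and rel_change_pct() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n values */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of mean(after) versus mean(before), in percent */
static double rel_change_pct(const double *before, size_t nb,
			     const double *after, size_t na)
{
	double b = mean(before, nb);

	return (mean(after, na) - b) / b * 100.0;
}
```

Fed with the Median column (212900, 209700, 212300 before; 213000, 219600, 222000 after) this gives roughly 3.1%, and with the Mean column roughly 2.6%.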
--
Kirill A. Shutemov