Re: [PATCH 1/4] mm/vmalloc: allow to call vfree() in atomic context

From: Michal Hocko
Date: Wed Apr 05 2017 - 06:46:18 EST


On Wed 05-04-17 13:31:23, Andrey Ryabinin wrote:
> On 04/04/2017 12:41 PM, Michal Hocko wrote:
> > On Thu 30-03-17 17:48:39, Andrey Ryabinin wrote:
> >> From: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
> >> Subject: mm/vmalloc: allow to call vfree() in atomic context fix
> >>
> >> Don't spawn a worker if we are already purging.
> >>
> >> Signed-off-by: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
> >
> > I would rather put this into a separate patch. Ideally with some numbers
> > as this is an optimization...
> >
>
> It's quite a simple optimization and I don't think that this deserves to
> be a separate patch.

I disagree. I am pretty sure nobody will remember the reasoning after a
few years. I do not want to push too hard on this, but I can tell you
from my own experience that we used to do way too many optimizations
like that in the past, and they tend to be real head scratchers these
days. Moreover, people tend to build on top of them without
understanding them, and then chances are quite high that they are no
longer relevant.
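
For reference, the idea behind "don't spawn a worker if we are already
purging" can be sketched in userspace C roughly as below. This is only
an illustration, not the code from the patch: purge_in_progress,
queue_purge_work(), request_purge() and purge_done() are hypothetical
names, and the counter merely stands in for actually queueing work.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
static atomic_flag purge_in_progress = ATOMIC_FLAG_INIT;
static atomic_int queued_jobs;  /* how many times work was queued */

/* Stand-in for queueing the purge worker: it only counts requests. */
static void queue_purge_work(void)
{
    atomic_fetch_add(&queued_jobs, 1);
}

/*
 * Called from the free path. Queues the worker only if no purge is
 * already pending. Returns true if work was queued by this call.
 */
static bool request_purge(void)
{
    /* test_and_set returns the previous value: true means "pending". */
    if (atomic_flag_test_and_set(&purge_in_progress))
        return false;

    queue_purge_work();
    return true;
}

/* Called by the worker once the purge has finished. */
static void purge_done(void)
{
    atomic_flag_clear(&purge_in_progress);
}
```

Repeated calls to request_purge() while a purge is pending queue
nothing; only the first call after purge_done() queues work again.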

> I did some measurements though. With VMAP_STACK=y enabled and
> NR_CACHED_STACK changed to 0, running fork() 100000 times gives this:
>
> With optimization:
>
> ~ # grep try_purge /proc/kallsyms
> ffffffff811d0dd0 t try_purge_vmap_area_lazy
> ~ # perf stat --repeat 10 -ae workqueue:workqueue_queue_work --filter 'function == 0xffffffff811d0dd0' ./fork
>
> Performance counter stats for 'system wide' (10 runs):
>
> 15 workqueue:workqueue_queue_work ( +- 0.88% )
>
> 1.615368474 seconds time elapsed ( +- 0.41% )
>
>
> Without optimization:
>
> ~ # grep try_purge /proc/kallsyms
> ffffffff811d0dd0 t try_purge_vmap_area_lazy
> ~ # perf stat --repeat 10 -ae workqueue:workqueue_queue_work --filter 'function == 0xffffffff811d0dd0' ./fork
>
> Performance counter stats for 'system wide' (10 runs):
>
> 30 workqueue:workqueue_queue_work ( +- 1.31% )
>
> 1.613231060 seconds time elapsed ( +- 0.38% )
>
>
> So there is no measurable difference on the test itself, but we queue
> twice as many jobs without this optimization. It should decrease the
> load on kworkers.

And this is really valuable for the changelog!
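
The source of ./fork is not shown in this thread, so the following is
only a guess at a minimal equivalent, in case somebody wants to
reproduce the numbers. The helper name run_forks() is hypothetical. The
assumption is that with VMAP_STACK=y and no cached stacks, every child
that exits has its vmalloc'ed stack freed, which is what exercises the
vfree() path.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork and reap `iterations` children one at a time.
 * Returns how many children were forked and exited cleanly.
 */
static int run_forks(int iterations)
{
    int completed = 0;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;      /* fork failed, stop early */
        if (pid == 0)
            _exit(0);   /* child: exit immediately */

        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            completed++;
    }
    return completed;
}
```

Calling run_forks(100000) from main() would correspond to the 100000
fork() calls mentioned above.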

Thanks!
--
Michal Hocko
SUSE Labs