Re: [PATCH 0/5] mm/vmalloc.c: improve readability and rewrite vmap_area

From: Uladzislau Rezki
Date: Mon Jul 01 2019 - 06:11:36 EST


On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote:
> On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> > Hi,
> >
> > This series of patches is to reduce the size of struct vmap_area.
> >
> > Since the members of struct vmap_area are not being used at the same time,
> > it is possible to reduce its size by placing several members that are not
> > used at the same time in a union.
> >
> > The first 4 patches did some preparatory work for this and improved
> > readability.
> >
> > The fifth patch is the main patch, it did the work of rewriting vmap_area.
> >
> > More details can be obtained from the commit message.
>
> None of the commit messages talk about the motivation. Why do we want to
> add quite some code to achieve this? How much do we save? This all
> should be a part of the cover letter.
>
> > Thanks,
> >
> > Pengfei
> >
> > Pengfei Li (5):
> > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > mm/vmalloc.c: Introduce a wrapper function of
> > insert_vmap_area_augment()
> > mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> >
> > include/linux/vmalloc.h | 28 +++++---
> > mm/vmalloc.c | 144 +++++++++++++++++++++++++++-------------
> > 2 files changed, 117 insertions(+), 55 deletions(-)
> >
> > --
> > 2.21.0

> > Pengfei Li (5):
> > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > mm/vmalloc.c: Introduce a wrapper function of
> > insert_vmap_area_augment()
> > mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about
that and i have patches in my pipeline to send out but implementation is different.

I had a look at all 5 patches. What you are doing is reasonable to me, i mean when
it comes to the idea of reducing the size to L1 cache line.

I have a concern about implementation and all logic around when we can use va_start
and when it is something else. It is not optimal at least to me, from performance point
of view and complexity. All hot paths and tree traversal are affected by that.

For example running the vmalloc test driver against this series shows the following
delta:

<5.2.0-rc6+>
Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec
Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec
Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec
<5.2.0-rc6+>

<this series>
Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec
Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec
Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec
<this series>

For example, the degrade in second test is ~15%.

--
Vlad Rezki