Re: [PATCH v2 7/9] mm: vmalloc: Support multiple nodes in vread_iter

From: Baoquan He
Date: Wed Sep 13 2023 - 23:37:05 EST


On 09/13/23 at 05:42pm, Uladzislau Rezki wrote:
> On Tue, Sep 12, 2023 at 09:42:32PM +0800, Baoquan He wrote:
> > On 09/11/23 at 08:16pm, Uladzislau Rezki wrote:
> > > On Mon, Sep 11, 2023 at 11:58:13AM +0800, Baoquan He wrote:
> > > > On 08/29/23 at 10:11am, Uladzislau Rezki (Sony) wrote:
> > > > > Extend the vread_iter() to be able to perform a sequential
> > > > > reading of VAs which are spread among multiple nodes. So a
> > > > > data read over the /dev/kmem correctly reflects a vmalloc
> > > > > memory layout.
> > > > >
> > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > > > ---
> > > > > mm/vmalloc.c | 67 +++++++++++++++++++++++++++++++++++++++++-----------
> > > > > 1 file changed, 53 insertions(+), 14 deletions(-)
> > > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 4fd4915c532d..968144c16237 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > ......
> > > > > @@ -4057,19 +4093,15 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
> > > > >
> > > > > remains = count;
> > > > >
> > > > > - /* Hooked to node_0 so far. */
> > > > > - vn = addr_to_node(0);
> > > > > - spin_lock(&vn->busy.lock);
> > > >
> > > > This could change the vread behaviour a little bit. Before, once we took
> > > > vmap_area_lock, vread would read out a snapshot of the content at that
> > > > moment. Now, reading from one node's tree does not block access to the
> > > > other nodes' trees. Not sure if this matters when people need to access
> > > > /proc/kcore, e.g. for dynamic debugging.
> > > >
> > > With one big tree you drop the lock after one cycle of reading anyway.
> > > As far as I can see, kcore.c's read granularity is PAGE_SIZE.
> >
> > As I understand it, kcore reading on vmalloc does read page by page, but
> > it continues after one page if the requested size is bigger than one
> > page. Please see the aligned_vread_iter() code. Before this patch,
> > vmap_area_lock is held during the complete process.
> >
> > >
> > > >
> > > > Also, the reading will be a little slower because each va lookup needs
> > > > to iterate over all vmap_nodes[].
> > > >
> > > Right. It is a bit tough here, because we have multiple nodes which
> > > represent zones (address space), i.e. there is an offset between them.
> > > This means that fully reading one tree will not provide a sequential
> > > read.
> >
> > Understood. I suppose the kcore reading on vmalloc is not critical. If I
> > get a chance to test on a machine with 256 CPUs, I will report here.
> >
> That would be great! Unfortunately I do not have access to such big
> systems. The largest system I have has 64 CPUs. If you can, by chance, test
> on bigger systems or provide temporary ssh access, that would be
> awesome.

10.16.216.205
user:root
password:redhat

This is a test server in our lab. We apply for usage each time and the
OS gets reinstalled, so using the root user should be OK. I will reserve
it for two days.

If you cannot access it, I can do the testing myself; just tell me which
commands you want me to run.
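
As a rough illustration of the lookup cost discussed above (each va
lookup has to visit all vmap_nodes[]), here is a small userspace model.
It is only a sketch under stated assumptions: struct area, struct node,
node_first_above() and first_above() are hypothetical names that do not
come from mm/vmalloc.c, and a sorted array with a binary search stands in
for the per-node tree. There is no locking in the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in mm/vmalloc.c. */
struct area {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

struct node {
	const struct area *areas;	/* sorted by start, non-overlapping */
	size_t nr;
};

/* Lowest-addressed area in one node whose end lies above addr, or NULL. */
static const struct area *node_first_above(const struct node *n,
					   unsigned long addr)
{
	size_t lo = 0, hi = n->nr;

	/* Binary search stands in for the per-node tree walk. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (n->areas[mid].end > addr)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo < n->nr ? &n->areas[lo] : NULL;
}

/*
 * Visit every node and keep the candidate with the lowest start.
 * The per-lookup cost therefore grows with the number of nodes.
 */
static const struct area *first_above(const struct node *nodes,
				      size_t nr_nodes, unsigned long addr)
{
	const struct area *best = NULL;

	assert(nodes != NULL || nr_nodes == 0);
	for (size_t i = 0; i < nr_nodes; i++) {
		const struct area *va = node_first_above(&nodes[i], addr);

		if (va && (!best || va->start < best->start))
			best = va;
	}
	return best;
}
```

A sequential reader would call first_above() again with the end of the
area it just copied, so in this model every step of the read pays one
search per node, whereas a single tree needs only one search.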