Re: PROBLEM: kernel crashes when running xfsdump since ~6.4

From: Baoquan He
Date: Fri Jun 21 2024 - 10:03:27 EST


On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > On 2024-06-20 02:19, Nick Bowler wrote:
......
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index be2dd281ea76..18e87cafbaf2 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > static struct xarray *
> > addr_to_vb_xa(unsigned long addr)
> > {
> > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> >
> > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > }
> >
> The problem i see is about not-initializing of the:
> <snip>
> for_each_possible_cpu(i) {
> struct vmap_block_queue *vbq;
> struct vfree_deferred *p;
>
> vbq = &per_cpu(vmap_block_queue, i);
> spin_lock_init(&vbq->lock);
> INIT_LIST_HEAD(&vbq->free);
> p = &per_cpu(vfree_deferred, i);
> init_llist_head(&p->list);
> INIT_WORK(&p->wq, delayed_vfree_work);
> xa_init(&vbq->vmap_blocks);
> }
> <snip>
>
> correctly or fully. It is my bad i did not think that CPUs in a possible mask
> can be non sequential :-/
>
> nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.

I checked the generic version of setup_nr_cpu_ids(). From the code, my
understanding is that the two are different.

kernel/smp.c:
void __init setup_nr_cpu_ids(void)
{
	set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
}

include/linux/cpumask.h:
#define num_possible_cpus() cpumask_weight(cpu_possible_mask)
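
To show how the two definitions above can diverge, here is a minimal
userspace sketch, not kernel code. It assumes the possible mask fits in a
single unsigned long, and the helper names sketch_weight() and
sketch_last_bit_plus_one() are made up for illustration; they only mimic
what cpumask_weight() and find_last_bit() + 1 compute for a non-empty mask.

```c
/*
 * Userspace stand-ins with hypothetical names; the real kernel helpers
 * operate on struct cpumask, not on a bare unsigned long.
 */

/* Count the set bits, mimicking cpumask_weight(cpu_possible_mask). */
static unsigned int sketch_weight(unsigned long mask)
{
	unsigned int n = 0;

	/* Each iteration clears the lowest set bit. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Index of the highest set bit plus one, mimicking
 * find_last_bit(...) + 1 for a non-empty mask. The empty-mask case is
 * not modelled here; this sketch simply returns 0 for it.
 */
static unsigned int sketch_last_bit_plus_one(unsigned long mask)
{
	unsigned int n = 0;

	/* Shift right until no bits remain, counting the positions. */
	for (; mask; mask >>= 1)
		n++;
	return n;
}
```

With a contiguous mask such as 0x3 (CPUs 0 and 1) both helpers return 2.
With a sparse mask such as 0x5 (CPUs 0 and 2) sketch_weight() still returns
2, while sketch_last_bit_plus_one() returns 3.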