Re: [PATCH v4 1/3] cacheinfo: Add arch specific early level initializer

From: Sudeep Holla
Date: Wed May 31 2023 - 08:22:14 EST


On Thu, May 18, 2023 at 10:34:14AM +0100, Sudeep Holla wrote:
> On Wed, May 17, 2023 at 06:27:03PM -0700, Ricardo Neri wrote:
> > On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> > > On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > > > Hi,
> > > >
> > > > I had posted a patchset[1] for x86 that initializes
> > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > >
> > >
> > > It is entirely clear to me if this is just a clean up or a fix to some
> > > issue you faced ? Just wanted to let you know Prateek from AMD has couple
> > > of fixes [2]
> >
> > My first patch is a bug fix. The second patch is clean up that results
> > from fixing the bug in patch 1.
> >
> > >
> > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > fetch_cache_info().
> > > >
> > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > take care of the case in which early and late cache leaves remain the same:
> > > >
> > > > - if (cache_leaves(cpu) <= early_leaves)
> > > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > >
> > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > last_level_cache_is_valid().
> > > >
> > >
> > > I think this is different issue as Prateek was just observing wrong info
> > > after cpuhotplug operations. But the patches manage the cpumap_populated
> > > state better with the patches. Can you please look at that as weel ?
> >
> > I verified that the patches from Prateek fix a different issue. I was able
> > to reproduce his issue. His patches fixes it.
> >
> > I still see my issue after applying Prateek's patches.
>
> Thanks, I thought it is different issue and good that you were able to test
> them as well. Please post a proper patch for the NULL ptr dereference you
> are hitting on x86.

Gentle ping! Are you still observing NULL ptr dereference with v6.4-rcx ?
If so, can you please post the fix as a proper patch ? Some of the patches
in v6.4-rc1 are being backported, so I prefer to have all the known issues
fixed before that happens. Sorry for the nag, but backport is the reason
I am pushing for this.

--
Regards,
Sudeep