Re: [PATCH v4 1/3] cacheinfo: Add arch specific early level initializer

From: Ricardo Neri
Date: Wed May 31 2023 - 13:05:01 EST


On Wed, May 31, 2023 at 01:22:01PM +0100, Sudeep Holla wrote:
> On Thu, May 18, 2023 at 10:34:14AM +0100, Sudeep Holla wrote:
> > On Wed, May 17, 2023 at 06:27:03PM -0700, Ricardo Neri wrote:
> > > On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> > > > On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > > > > Hi,
> > > > >
> > > > > I had posted a patchset[1] for x86 that initializes
> > > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > > >
> > > >
> > > > It is not entirely clear to me if this is just a clean up or a fix to some
> > > > issue you faced ? Just wanted to let you know Prateek from AMD has a couple
> > > > of fixes [2]
> > >
> > > My first patch is a bug fix. The second patch is a cleanup that results
> > > from fixing the bug in patch 1.
> > >
> > > >
> > > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > > fetch_cache_info().
> > > > >
> > > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > > take care of the case in which early and late cache leaves remain the same:
> > > > >
> > > > > - if (cache_leaves(cpu) <= early_leaves)
> > > > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > > >
> > > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > > last_level_cache_is_valid().
> > > > >
> > > >
> > > > I think this is a different issue as Prateek was just observing wrong info
> > > > after cpuhotplug operations. But his patches manage the cpumap_populated
> > > > state better. Can you please look at them as well ?
> > >
> > > I verified that the patches from Prateek fix a different issue. I was able
> > > to reproduce his issue. His patches fix it.
> > >
> > > I still see my issue after applying Prateek's patches.
> >
> > Thanks, I thought it was a different issue and good that you were able to test
> > them as well. Please post a proper patch for the NULL ptr dereference you
> > are hitting on x86.
>
> Gentle ping! Are you still observing NULL ptr dereference with v6.4-rcx ?

Yes, I still observe it on v6.4-rc4.

> If so, can you please post the fix as a proper patch ? Some of the patches
> in v6.4-rc1 are being backported, so I prefer to have all the known issues
> fixed before that happens. Sorry for the nag, but backport is the reason
> I am pushing for this.

Sure. Sorry for the delay. I have the patch ready and will post it this
week as part of my previous patches in [1].

Thanks and BR,
Ricardo