Re: [PATCH v3] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512

From: Russell King (Oracle)
Date: Wed Mar 13 2024 - 14:33:38 EST


On Wed, Mar 13, 2024 at 05:22:33PM +0100, Marek Szyprowski wrote:
> On 13.03.2024 15:35, Sudeep Holla wrote:
> > On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote:
> >> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote:
> >>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote:
> >>>
> >>>> This could be an issue in the ARM64 arch code itself where there may be
> >>>> an assumption elsewhere that a cpumask can always store up to NR_CPUS
> >>>> cpus and not only nr_cpu_ids as OFFSTACK does.
> >>>>
> >>>> How can I exercise the opp driver in order to recreate the problem?
> >>>>
> >>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if
> >>>> there is an issue with OFFSTACK in opp then it should fail with kernel
> >>>> default configuration on that platform.
> >>> I checked the ARM64 arch sources' use of NR_CPUS and it's all fine.
> >>>
> >>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests.
> >>>
> >>> No warnings in the kernel log during those tests.
> >>>
> >>> How to reproduce this?
> >> I guess you need a platform with a dts that has an "operating-points-v2"
> >> property. I don't have any around.
> >>
> >> Sudeep was trying to trigger this code path earlier, not sure where he
> >> got to.
> > I did try to trigger this on FVP by adding OPPs + some hacks to add a dummy
> > clock provider to successfully probe this driver. I couldn't hit the issue
> > reported 🙁. It could be that with the hardware clock/regulator drivers, it
> > takes a different path in OPP core.
>
> I can fully reproduce this issue on Khadas VIM3 and Odroid-N2 boards.
> Both Meson A311D SoC based.

So, if I'm reading the OPP code and the DTS* files for Khadas VIM3
correctly, these use operating-points-v2, which is parsed by the opp
layer.

If the opp layer is unable to parse any operating points, it should
print "no supported OPPs" and remove the table (thereby preventing
the code in question being reached.)
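
As a rough illustration of that expectation, here is a userspace sketch,
not the kernel code. The struct, the function name, and the -ENOENT return
value are all hypothetical stand-ins; only the "no supported OPPs" message
and the "remove the table" behaviour come from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an OPP table; not the kernel's struct. */
struct demo_opp_table {
	int  nr_opps;   /* operating points successfully parsed */
	bool present;   /* false once the table has been removed */
};

/*
 * Hypothetical add step. 'parsed_ok' is the number of DT child nodes
 * that yielded a usable operating point.
 */
static int demo_add_opp_table(struct demo_opp_table *t, int parsed_ok)
{
	t->nr_opps = parsed_ok;
	t->present = true;

	if (t->nr_opps == 0) {
		fprintf(stderr, "no supported OPPs\n");
		t->present = false;   /* table removed: later users never see it */
		return -ENOENT;       /* error code is an assumption */
	}

	return 0;
}
```

With zero parsed entries the table is marked as gone and an error is
returned, so nothing downstream should get as far as using it.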

So, I wonder whether what you're seeing is a latent bug that is
being tickled because moving the CPU masks off-stack changes the
kernel timing.

I would suggest that the printk debug approach may help here, to see
when parsing of the OPPs begins, when they're created, etc., and how
that timing relates to when they're used. Given the suspicion, it's
possible that the mere addition of printk() may "fix" the problem,
which again would be another semi-useful data point.
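
To make the idea concrete, here is a minimal userspace sketch of that kind
of tracing. In the kernel this would simply be pr_info()/dev_info() calls
at the points of interest. Everything named demo_* below is hypothetical,
and a sequence number stands in for the printk timestamp.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_EVENTS 16

/* One recorded trace point. */
struct demo_event {
	const char *what;   /* label, e.g. "opp: parse begin" */
	int seq;            /* order in which the event was seen */
};

static struct demo_event demo_log[DEMO_MAX_EVENTS];
static int demo_nr_events;

/* Record and print one event; stands in for a printk() at that point. */
static void demo_trace(const char *what)
{
	if (demo_nr_events >= DEMO_MAX_EVENTS)
		return;   /* log full: drop the event */

	demo_log[demo_nr_events].what = what;
	demo_log[demo_nr_events].seq = demo_nr_events;
	printf("[%d] %s\n", demo_nr_events, what);
	demo_nr_events++;
}

/* Return the sequence number of the first matching event, or -1. */
static int demo_first_seq(const char *what)
{
	for (int i = 0; i < demo_nr_events; i++)
		if (strcmp(demo_log[i].what, what) == 0)
			return demo_log[i].seq;
	return -1;
}
```

Calling demo_trace() at "parse begin", "opp created" and "opp used" and
then comparing demo_first_seq() for each label shows whether a use ever
comes before the matching creation, which is the ordering question here.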

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!