Re: [PATCH v3 2/3] perf: Add driver for Arm NI-700 interconnect PMU
From: Will Deacon
Date: Wed Sep 04 2024 - 08:24:12 EST
On Mon, Sep 02, 2024 at 07:47:18PM +0100, Robin Murphy wrote:
> On 02/09/2024 3:47 pm, Will Deacon wrote:
> > > +static ssize_t arm_ni_format_show(struct device *dev,
> > > +				  struct device_attribute *attr, char *buf)
> > > +{
> > > +	struct arm_ni_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
> > > +	int lo = __ffs(fmt->field), hi = __fls(fmt->field);
> > > +
> > > +	return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
> > > +}
> >
> > Nit: if you end up adding single-bit config fields in the future, this
> > will quietly do the wrong thing. Maybe safeguard the 'lo==hi' case (even
> > if you just warn once and return without doing anything).
>
> The counter-argument is that I don't foresee having any reason to add
> single-bit config fields here in future, nor indeed config1 or config2
> fields, so I intentionally pruned the would-be dead code while copy-pasting
> this implementation from arm-cmn. Yes, if someone were to make an incomplete
> change without paying attention or testing they could introduce a bug, but
> when is that ever not true?
I guess I'm just a little more wary when it comes to UAPI. Somebody starts
relying on the broken message and then you're toast. It's also incredibly
easy to avoid by construction and the dead code isn't hurting anybody.
> > > +	name = devm_kasprintf(ni->dev, GFP_KERNEL, "arm_ni_%d_cd_%d", ni->id, cd->id);
> > > +	if (!name)
> > > +		return -ENOMEM;
> > > +
> > > +	err = cpuhp_state_add_instance(arm_ni_hp_state, &cd->cpuhp_node);
> > > +	if (err)
> > > +		return err;
> >
> > What happens if there's a CPU hotplug operation here? Can we end up calling
> > perf_pmu_migrate_context() concurrently with perf_pmu_register()?
>
> Yes. Alternatively we could register the PMU before the hotplug handler,
> then potentially miss a hotplug event and leave a user-visible PMU
> associated with an invalid CPU. This is a known issue for all system PMU
> drivers, and the conclusion 5 years ago was that it's impractical to close
> this race from outside perf core itself[1][2].
Ok, I'm going to walk right into the trap you've set me...
Why can't we prevent hotplug (e.g. with cpus_read_lock()) while we're
setting this up?
... and climbing back out of that trap, is the conversation you had with
Thomas written down anywhere?
I don't want to block this patch, but if five years has passed with
nobody looking at this then we probably need to address that at some
point before adding more and more broken drivers.
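To be concrete, the ordering I have in mind is something like the below. This is an untested sketch, not a proposal: error handling is elided, the CPU choice is just for illustration, and whether it's actually safe to call perf_pmu_register() with the hotplug lock held is precisely the sort of thing that conversation with Thomas ought to answer.

```c
/*
 * Untested sketch: hold the hotplug lock across both the hotplug
 * instance registration and the PMU registration, so no hotplug
 * callback can run between the two.
 */
cpus_read_lock();

cd->cpu = cpumask_first(cpu_online_mask);	/* stable under the lock */
err = cpuhp_state_add_instance_cpuslocked(arm_ni_hp_state,
					  &cd->cpuhp_node);
if (!err)
	err = perf_pmu_register(&cd->pmu, name, -1);

cpus_read_unlock();
```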
Will