Re: [PATCH v4 5/7] x86/resctrl: Determine if Sub-NUMA Cluster is enabled and initialize.

From: Tony Luck
Date: Mon Aug 28 2023 - 15:02:32 EST


On Mon, Aug 28, 2023 at 10:05:50AM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 8/25/2023 10:49 AM, Tony Luck wrote:
> > On Fri, Aug 11, 2023 at 10:32:29AM -0700, Reinette Chatre wrote:
> >> On 7/22/2023 12:07 PM, Tony Luck wrote:
>
> ...
>
> >>> +static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
> >>> + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
> >>> + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
> >>> + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
> >>> + {}
> >>> +};
> >>> +
> >>> +/*
> >>> + * There isn't a simple enumeration bit to show whether SNC mode
> >>> + * is enabled. Look at the ratio of number of NUMA nodes to the
> >>> + * number of distinct L3 caches. Take care to skip memory-only nodes.
> >>> + */
> >>> +static __init int get_snc_config(void)
> >>> +{
> >>> + unsigned long *node_caches;
> >>> + int mem_only_nodes = 0;
> >>> + int cpu, node, ret;
> >>> +
> >>> + if (!x86_match_cpu(snc_cpu_ids))
> >>> + return 1;
> >>> +
> >>> + node_caches = kcalloc(BITS_TO_LONGS(nr_node_ids), sizeof(*node_caches), GFP_KERNEL);
> >>> + if (!node_caches)
> >>> + return 1;
> >>> +
> >>> + cpus_read_lock();
> >>> + for_each_node(node) {
> >>> + cpu = cpumask_first(cpumask_of_node(node));
> >>> + if (cpu < nr_cpu_ids)
> >>> + set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
> >>> + else
> >>> + mem_only_nodes++;
> >>> + }
> >>> + cpus_read_unlock();
> >>
> >> I am not familiar with the numa code at all so please correct me
> >> where I am wrong. I do see that nr_node_ids is initialized with __init code
> >> so it should be accurate at this point. It looks to me like this initialization
> >> assumes that at least one CPU per node will be online at the time it is run.
> >> It is not clear to me that this assumption would always be true.
> >
> > Resctrl initialization is kicked off as a late_initcall(). So all CPUs
> > and devices are fully initialized before this code runs.
> >
> > Resctrl can't be moved to an "init" state before CPUs are brought online
> > because it makes a call to cpuhp_setup_state() to get callbacks for
> > online/offline CPU events ... that call can't be done early.
>
> Apologies but this is not so obvious to me. From what I understand a
> system need not be booted with all CPUs online. CPUs can be brought
> online at any time.

So you are concerned about the case where the system was booted with a
maxcpus=N boot argument, and additional CPUs are brought online later?

My code will fail to detect SNC mode if someone does that. I don't see
any possible way to recover from this in a safe way later when additional
CPUs are brought online. Those CPUs could be brought online in any order
by the user, and there is no way to know when they are all done. Without an
explicit enumeration of SNC mode there is no fool-proof way to detect
SNC mode in the face of CPU hot plug post-boot.

I could add a pr_warn() into this code:

915 static int __init nrcpus(char *str)
916 {
917 int nr_cpus;
918
919 if (get_option(&str, &nr_cpus) && nr_cpus > 0 && nr_cpus < nr_cpu_ids)
920 set_nr_cpu_ids(nr_cpus);
921
922 return 0;
923 }
924
925 early_param("nr_cpus", nrcpus);

But this feels like a "user shot themself in the foot" case. I don't
think it necessary to add checks and warnings for every possible strange
way a user could configure a system.

> >>> +
> >>> + ret = (nr_node_ids - mem_only_nodes) / bitmap_weight(node_caches, nr_node_ids);
> >>> + kfree(node_caches);
> >>> +
> >>> + if (ret > 1)
> >>> + rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = MON_SCOPE_NODE;
> >>> +
> >>> + return ret;
> >>> +}
> >>> +

-Tony