Re: [RFC PATCH v4 08/28] sched: Set up LLC indexing

From: Adam Li

Date: Mon Sep 29 2025 - 06:43:50 EST


On 9/26/2025 9:51 PM, Chen, Yu C wrote:
> Hi Adam,
>
> On 9/26/2025 2:14 PM, Adam Li wrote:
>> Hi Chen Yu,
>>
>> I tested the patch set on AmpereOne CPU with 192 cores.
>> With certain firmware setting, each core has its own L1/L2 cache.
>> But *no* cores share LLC (L3). So *no* schedule domain
>> has flag 'SD_SHARE_LLC'.
>>
>
> Good catch! And many thanks for your detailed testing and
> analysis.
>
> Is this issue triggered with CONFIG_SCHED_CLUSTER disabled?
>

Yes. With CONFIG_SCHED_CLUSTER enabled, the issue is not triggered:
the maximum sd_llc_idx stays below MAX_LLC (64) because there are
only 24 (192/8) cluster domains.
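
To make the arithmetic concrete, here is a small standalone sketch (not kernel code). It assumes indexes are handed out densely from 0, one per distinct LLC id; the thread does not show how update_llc_idx() actually assigns them. nr_llc_ids() and llc_idx_overflows() are hypothetical names used only for illustration.

```c
#include <stdbool.h>

#define MAX_LLC 64	/* value quoted in this thread */

/*
 * Hypothetical helper: number of distinct LLC ids when every group of
 * cpus_per_llc CPUs shares one id. cpus_per_llc must be non-zero.
 */
static unsigned int nr_llc_ids(unsigned int nr_cpus, unsigned int cpus_per_llc)
{
	return (nr_cpus + cpus_per_llc - 1) / cpus_per_llc;
}

/*
 * Assumption: indexes are dense (0 .. nr_ids - 1). Returns true if the
 * largest index would trip a check of the form BUG_ON(idx > MAX_LLC).
 */
static bool llc_idx_overflows(unsigned int nr_cpus, unsigned int cpus_per_llc)
{
	unsigned int nr_ids = nr_llc_ids(nr_cpus, cpus_per_llc);

	if (nr_ids == 0)
		return false;

	return (nr_ids - 1) > MAX_LLC;
}
```

Under that assumption, llc_idx_overflows(192, 1) is true (indexes reach 191, the no-shared-LLC case), while llc_idx_overflows(192, 8) is false (24 ids, largest index 23).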

>> With this topology:
>> per_cpu(sd_llc_id, cpu) is actually the cpu id (0-191).
>>
>> And kernel bug will be triggered at:
>> 'BUG_ON(idx > MAX_LLC)'
>>
>
> Yes, the sd_llc_idx thing is a bit tricky - we want to use it to
> index into the static array struct sg_lb_stat.nr_pref_llc, and
> we have to limit its range. A better approach would be to
> dynamically allocate the buffer, so we could get rid of the
> 'idx > MAX_LLC' check, but that might complicate the code.
>
>> Please see details below.
>>
>> The bug will disappear if setting 'MAX_LLC' to 192.
>> But I think we might disable CAS(cache aware scheduling)
>> if no domain has 'SD_SHARE_LLC'.
>>
>
> I agree with you. Simply disabling cache-aware scheduling
> if there is no SD_SHARE_LLC would be simpler.
>
>> On 8/9/2025 1:03 PM, Chen Yu wrote:
>> A draft patch like the one below can fix the kernel BUG:
>> 1) Do not call update_llc_idx() if domain has no SD_SHARE_LLC
>> 2) Disable CAS if domain has no SD_SHARE_LLC
>>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 8483c02b4d28..cde9b6cdb1de 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -704,7 +704,8 @@ static void update_top_cache_domain(int cpu)
>>          per_cpu(sd_llc_size, cpu) = size;
>>          per_cpu(sd_llc_id, cpu) = id;
>>          rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
>> -       update_llc_idx(cpu);
>> +       if (sd)
>> +               update_llc_idx(cpu);
>>
>
> OK, that makes sense.
>
>>          sd = lowest_flag_domain(cpu, SD_CLUSTER);
>>          if (sd)
>> @@ -2476,6 +2477,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>          int i, ret = -ENOMEM;
>>          bool has_asym = false;
>>          bool has_cluster = false;
>> +       bool has_llc = false;
>>          bool llc_has_parent_sd = false;
>>          unsigned int multi_llcs_node = 1;
>>
>> @@ -2621,6 +2623,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>
>>                  if (lowest_flag_domain(i, SD_CLUSTER))
>>                          has_cluster = true;
>> +
>> +               if (highest_flag_domain(i, SD_SHARE_LLC))
>> +                       has_llc = true;
>>          }
>>          rcu_read_unlock();
>>
>> @@ -2631,7 +2636,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>                  static_branch_inc_cpuslocked(&sched_cluster_active);
>>
>>   #ifdef CONFIG_SCHED_CACHE
>> -       if (llc_has_parent_sd && multi_llcs_node && !sched_asym_cpucap_active())
>> +       if (has_llc && llc_has_parent_sd && multi_llcs_node &&
>
> multi_llcs_node will be false if there is no SD_SHARE_LLC domain on the
> platform, so I suppose we don’t have to introduce has_llc?
> multi_llcs is set to true iff there are more than 1 SD_SHARE_LLC domains under its
> SD_SHARE_LLC parent domain.
>

If there is *no* SD_SHARE_LLC domain, my test shows that 'multi_llcs_node'
is still 1 (true).

It looks like this is because the default value of 'multi_llcs_node' is 1.

build_sched_domains():
	unsigned int multi_llcs_node = 1;

The condition below is always false because there is no SD_SHARE_LLC domain,
so 'multi_llcs_node' is never updated:

	if (!(sd->flags & SD_SHARE_LLC) && child &&
	    (child->flags & SD_SHARE_LLC))
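
The behaviour can be reproduced with a toy model outside the kernel. This is only a sketch: struct toy_domain, guard_matches() and toy_multi_llcs_node() are made-up names, SD_SHARE_LLC is a placeholder bit, and the value assigned when the guard matches stands in for whatever the patch really computes there.

```c
#include <stddef.h>

#define SD_SHARE_LLC 0x1	/* placeholder bit, not the kernel's value */

/* Minimal stand-in for struct sched_domain: flags plus hierarchy links. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *child;
	struct toy_domain *parent;
};

/* Walk up from the lowest domain; return 1 if the guard ever matches. */
static int guard_matches(const struct toy_domain *sd)
{
	for (; sd; sd = sd->parent) {
		const struct toy_domain *child = sd->child;

		if (!(sd->flags & SD_SHARE_LLC) && child &&
		    (child->flags & SD_SHARE_LLC))
			return 1;
	}
	return 0;
}

/* Mirrors the default of 1; only a matching guard can change it. */
static unsigned int toy_multi_llcs_node(const struct toy_domain *lowest)
{
	unsigned int multi_llcs_node = 1;

	if (guard_matches(lowest))
		multi_llcs_node = 0;	/* assumption: placeholder update */

	return multi_llcs_node;
}
```

With a two-level hierarchy where neither level has SD_SHARE_LLC set, guard_matches() returns 0 and toy_multi_llcs_node() returns the default 1, which matches what I see in my test.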

Thanks,
-adam