Re: [PATCH 1/2] sched/topology: Annotate RCU pointers properly

From: Pierre Gondois
Date: Thu Jan 04 2024 - 10:47:00 EST


Hello Valentin,

On 1/4/24 10:54, Valentin Schneider wrote:
On 03/01/24 13:56, Pierre Gondois wrote:
Cleanup RCU-related spare errors by annotating RCU pointers.

sched_domains_numa_distance:
error: incompatible types in comparison expression
(different address spaces):
int [noderef] __rcu *
int *

sched_domains_numa_masks:
error: incompatible types in comparison expression
(different address spaces):
struct cpumask **[noderef] __rcu *
struct cpumask ***

Signed-off-by: Pierre Gondois <pierre.gondois@xxxxxxx>

That's from when the NUMA topologies were made dynamic, which should be:
Fixes: 0fb3978b0aac ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")

Ok yes

---
kernel/sched/topology.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 10d1391e7416..0342a4f41f09 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1542,8 +1542,8 @@ static int sched_domains_numa_levels;
static int sched_domains_curr_level;

int sched_max_numa_distance;
-static int *sched_domains_numa_distance;
-static struct cpumask ***sched_domains_numa_masks;
+static int __rcu *sched_domains_numa_distance;
+static struct cpumask ** __rcu *sched_domains_numa_masks;

I understand that's what sparse is asking for, but that looks odd to me. We
use it as:

rcu_assign_pointer(sched_domains_numa_masks, foo);

so why isn't it

__rcu ***sched_domains_numa_masks;

?

This isn't a pointer to an RCU-protected array of masks, this is an
RCU-protected double array of masks.

I think:
static struct cpumask ** __rcu *sched_domains_numa_masks;
should denote an RCU-protected array^3 of 'struct cpumask', when
static struct cpumask __rcu ***sched_domains_numa_masks;
would denote an array^2 of RCU-protected 'struct cpumask*',
and assignments would look like:
rcu_assign_pointer(**sched_domains_numa_masks, foo);

Meaning that, when taking as a better example:
static int __rcu *sched_domains_numa_distance;
Here we would like to avoid having 'access after free' to the array of
integer allocated to sched_domains_numa_distance.

For sched_domains_numa_masks:
static struct cpumask ** __rcu *sched_domains_numa_masks;
rcu_assign_pointer(sched_domains_numa_masks, foo);
bar = rcu_dereference(sched_domains_numa_masks);
once the first array pointed by sched_domains_numa_masks is accessed/assigned,
we know the RCU-framework makes it safe against 'accesses after free' when
accessing the level-2 array of sched_domains_numa_masks, or accessing the mask
(level 3).

Please let me know if the reasoning seems dodgy.

About the kernel test robot:
kernel/sched/topology.c:1998:19: sparse: sparse: incorrect type in assignment (different address spaces) @@
expected int *distances @@
got int [noderef] __rcu *static [addressable] [assigned] [toplevel] sched_domains_numa_distance
kernel/sched/topology.c:2000:15: sparse: sparse: incorrect type in assignment (different address spaces) @@
expected struct cpumask ***masks @@
got struct cpumask **[noderef] __rcu *static [addressable] [assigned] [toplevel] sched_domains_numa_masks @@
I think the warnings can be ignored since the two pointers (distances, masks)
are only dereferenced after a 'synchronize_rcu()'

Regards,
Pierre

#endif

/*
--
2.25.1