Re: [PATCH] perf: fix topology test on systems with sparse CPUs

From: Jan Stancek
Date: Thu Feb 02 2017 - 07:07:16 EST


>
> > When build_cpu_topo() encounters offline/absent CPUs,
> > it fails to find any sysfs entries and returns failure.
> > This leads to build_cpu_topology() and write_cpu_topology()
> > failing as well.
> >
> > Because HEADER_CPU_TOPOLOGY has not been written, read leaves
> > cpu_topology_map NULL and we get NULL ptr deref at:
> >
> > ...
> > cmd_test
> > __cmd_test
> > test_and_print
> > run_test
> > test_session_topology
> > check_cpu_topology
>
> So IIUC, that's the key issue here: write_cpu_topology() fails
> to write the TOPO data, and subsequent readers crash while processing
> the incomplete data? If that's the case, write_cpu_topology() needs to
> be fixed instead of working around it.

By the time you are in write_cpu_topology(), it's already too late:
build_cpu_topology() has returned NULL, so there's nothing to write.
That's why the patch aims to fix this in build_cpu_topology().

>
> SNIP
>
> > u32 nr, i;
> > size_t sz;
> > long ncpus;
> > - int ret = -1;
> > + int ret = 0;
> > + struct cpu_map *map;
> >
> > ncpus = sysconf(_SC_NPROCESSORS_CONF);
> > if (ncpus < 0)
> > - return NULL;
> > + goto out;
>
> This can just return NULL.
>
> > +
> > + /* build online CPU map */
> > + map = cpu_map__new(NULL);
> > + if (map == NULL) {
> > + pr_debug("failed to get system cpumap\n");
> > + goto out;
> > + }
> >
> > nr = (u32)(ncpus & UINT_MAX);
> >
> > sz = nr * sizeof(char *);
> > -
> > addr = calloc(1, sizeof(*tp) + 2 * sz);
> > if (!addr)
> > - return NULL;
> > + goto out_free;
> >
> > tp = addr;
> > tp->cpu_nr = nr;
> > @@ -530,14 +537,21 @@ static struct cpu_topo *build_cpu_topology(void)
> > tp->thread_siblings = addr;
> >
> > for (i = 0; i < nr; i++) {
> > + if (!cpu_map__has(map, i))
> > + continue;
> > +
>
> So this prevents build_cpu_topo() from failing due to missing topology
> info when a CPU is offline. Can it fail for other reasons?

It's unlikely, though I suppose that if you couldn't open and read something
from sysfs (say, sysfs is not mounted), it could fail for an online CPU too.

>
>
> > ret = build_cpu_topo(tp, i);
> > if (ret < 0)
> > break;
>
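A minimal C sketch of the "skip CPUs that are not online" idea in the hunk above, assuming the online set is available as a sysfs-style CPU list (the range format used by /sys/devices/system/cpu/online, e.g. "0-1,6-11"). The names parse_cpu_list(), count_visited_cpus() and MAX_CPUS are hypothetical and only for illustration; they are not perf code. The actual patch obtains the map with cpu_map__new(NULL) and tests membership with cpu_map__has().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 1024 /* hypothetical upper bound for this sketch */

/* Parse a sysfs-style CPU list such as "0-1,6-11,16" into a presence
 * bitmap. Returns the number of distinct CPUs marked, or -1 on
 * malformed input. */
static int parse_cpu_list(const char *list, bool present[MAX_CPUS])
{
	int count = 0;
	const char *p = list;

	memset(present, 0, MAX_CPUS * sizeof(bool));
	while (*p && *p != '\n') {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p || lo < 0 || lo >= MAX_CPUS)
			return -1;
		p = end;
		if (*p == '-') { /* range "lo-hi" */
			p++;
			hi = strtol(p, &end, 10);
			if (end == p || hi < lo || hi >= MAX_CPUS)
				return -1;
			p = end;
		}
		for (long cpu = lo; cpu <= hi; cpu++) {
			if (!present[cpu]) {
				present[cpu] = true;
				count++;
			}
		}
		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return count;
}

/* Mirrors the shape of the patched loop: walk CPU indices 0..nr-1 but
 * skip indices absent from the online map instead of failing. Returns
 * how many CPUs would reach the per-CPU topology step. */
static int count_visited_cpus(const bool online[MAX_CPUS], int nr)
{
	int visited = 0;

	assert(nr >= 0 && nr <= MAX_CPUS);
	for (int i = 0; i < nr; i++) {
		if (!online[i])
			continue; /* offline/absent: no sysfs topology to read */
		visited++;        /* real code would call build_cpu_topo(tp, i) */
	}
	return visited;
}
```

With the CPU layout from the numactl example further down ("0-1,6-11,16,22-27", 15 CPUs), a loop bounded by nr = 16 would visit only 8 of them, while nr = 28 visits all 15. This is why the loop bound, and not just the skip, matters for sparse CPU IDs.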

SNIP

> For example:
> _SC_NPROCESSORS_CONF == 16
> available: 2 nodes (0-1)
> node 0 cpus: 0 6 8 10 16 22 24 26
> node 0 size: 12004 MB
> node 0 free: 9470 MB
> node 1 cpus: 1 7 9 11 23 25 27
> node 1 size: 12093 MB
> node 1 free: 9406 MB
> node distances:
> node 0 1
> 0: 10 20
> 1: 20 10
> So what's max_present_cpu in this example?

It's 28, which is the number of core_id/socket_id entries needed
for CPUs 0 through 27.
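A small sketch of where 28 comes from: it is one past the highest present CPU ID, i.e. the number of slots an array indexed directly by CPU ID needs. This assumes the present CPUs are given in the sysfs range format (as in /sys/devices/system/cpu/present). max_present_cpu_from_list() is a hypothetical helper for illustration, not an existing perf function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Return one past the highest CPU ID mentioned in a sysfs-style CPU
 * list (e.g. "0-1,6-11,16,22-27"). Returns 0 for an empty list. */
static long max_present_cpu_from_list(const char *list)
{
	long max_id = -1;
	const char *p = list;

	assert(list != NULL);
	while (*p) {
		if (isdigit((unsigned char)*p)) {
			char *end;
			long id = strtol(p, &end, 10);

			if (id > max_id)
				max_id = id;
			p = end;
		} else {
			p++; /* skip ',' '-' and trailing newline */
		}
	}
	return max_id + 1;
}
```

For the quoted numactl layout, the list "0-1,6-11,16,22-27" yields 28, whereas _SC_NPROCESSORS_CONF reports 16 there, so an array sized by the configured CPU count would be too small to index by CPU ID.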

Regards,
Jan