Re: [PATCH] perf: fix topology test on systems with sparse CPUs

From: Jiri Olsa
Date: Thu Feb 02 2017 - 06:29:21 EST


On Tue, Jan 31, 2017 at 05:03:51PM +0100, Jan Stancek wrote:
> On 01/30/2017 07:49 PM, Jiri Olsa wrote:
> > so basically we're changing from avail to online cpus
> >
> > have you checked whether this change is OK
> > for all the users of this FEATURE?
>
> Jiri,
>
> It wasn't OK as there are other users who index cpu_topology_map by CPU id.
> I decided to give the alternative a try (attached): keep cpu_topology_map
> indexed by CPU id, but extend it to fit max present CPU.
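With sparse CPU ids, the number of CPUs and the highest CPU id differ, so a map indexed by CPU id must be sized as max id + 1, not as the CPU count. A minimal sketch, assuming the sysfs cpulist format used by files such as /sys/devices/system/cpu/present (e.g. "0-1,4-5"). `cpulist_scan()` is a hypothetical, simplified parser, not a perf API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: walk a cpulist string such as "0-1,4-5" and
 * report both the number of CPUs listed and the highest CPU id.
 * Returns 0 on success, -1 on a malformed list. */
static int cpulist_scan(const char *s, int *count, int *max_id)
{
	char *end;

	assert(s != NULL && count != NULL && max_id != NULL);
	*count = 0;
	*max_id = -1;
	while (*s && *s != '\n') {
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		s = end;
		if (*s == '-') {		/* range "lo-hi" */
			s++;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
			s = end;
		}
		*count += (int)(hi - lo + 1);
		if (hi > *max_id)
			*max_id = (int)hi;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}
```

For "0-1,4-5" this reports a count of 4 and a max id of 5, so an array indexed by CPU id needs 6 slots even though only 4 CPUs exist.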

please send this as a standard patchset next time;
it's hard to discuss patches sent as attachments

SNIP

> When build_cpu_topo() encounters offline/absent CPUs,
> it fails to find any sysfs entries and returns failure.
> This leads to build_cpu_topology() and write_cpu_topology()
> failing as well.
>
> Because HEADER_CPU_TOPOLOGY has not been written, read leaves
> cpu_topology_map NULL and we get NULL ptr deref at:
>
> ...
> cmd_test
> __cmd_test
> test_and_print
> run_test
> test_session_topology
> check_cpu_topology
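The failure described above can be modeled in a few lines: topology is read from per-CPU sysfs files, and for an absent CPU id those files do not exist, so a loop that stops at the first error fails as a whole. A minimal sketch, assuming the /sys/devices/system/cpu/cpuN/topology/core_siblings_list layout; `read_core_siblings()` and `build_all()` are hypothetical stand-ins that only mimic the "break on first error" behaviour, they are not perf code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-CPU part of build_cpu_topo():
 * read one topology file for @cpu into @buf.
 * Returns 0 on success, -1 if the file is missing or unreadable,
 * as happens when the sysfs entries for @cpu are not there. */
static int read_core_siblings(int cpu, char *buf, size_t len)
{
	char path[128];
	FILE *fp;

	assert(buf != NULL && len > 0);
	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/core_siblings_list",
		 cpu);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (!fgets(buf, (int)len, fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	buf[strcspn(buf, "\n")] = '\0';	/* drop trailing newline */
	return 0;
}

/* Mimics the original loop: a single CPU without sysfs entries makes
 * the whole build fail, so the header data is never produced. */
static int build_all(int nr)
{
	char buf[256];
	int i;

	for (i = 0; i < nr; i++) {
		if (read_core_siblings(i, buf, sizeof(buf)) < 0)
			return -1;
	}
	return 0;
}
```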

So IIUIC that's the key issue here: write_cpu_topology() fails
to write the TOPO data, and subsequent readers crash while processing
the incomplete data? If that's the case, write_cpu_topology() needs
to be fixed instead of working around it

SNIP

> u32 nr, i;
> size_t sz;
> long ncpus;
> - int ret = -1;
> + int ret = 0;
> + struct cpu_map *map;
>
> ncpus = sysconf(_SC_NPROCESSORS_CONF);
> if (ncpus < 0)
> - return NULL;
> + goto out;

this can just return NULL, nothing has been allocated yet at this point

> +
> + /* build online CPU map */
> + map = cpu_map__new(NULL);
> + if (map == NULL) {
> + pr_debug("failed to get system cpumap\n");
> + goto out;
> + }
>
> nr = (u32)(ncpus & UINT_MAX);
>
> sz = nr * sizeof(char *);
> -
> addr = calloc(1, sizeof(*tp) + 2 * sz);
> if (!addr)
> - return NULL;
> + goto out_free;
>
> tp = addr;
> tp->cpu_nr = nr;
> @@ -530,14 +537,21 @@ static struct cpu_topo *build_cpu_topology(void)
> tp->thread_siblings = addr;
>
> for (i = 0; i < nr; i++) {
> + if (!cpu_map__has(map, i))
> + continue;
> +

so this prevents build_cpu_topo() from failing due to missing topology
info when a cpu is offline. Can it fail for other reasons?


> ret = build_cpu_topo(tp, i);
> if (ret < 0)
> break;


SNIP