Re: [PATCH] perf: ignore exited thread when synthesize thread map

From: Ian Rogers
Date: Tue Dec 05 2023 - 12:55:09 EST


On Tue, Nov 28, 2023 at 9:12 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Mon, Nov 27, 2023 at 10:23 PM cruzzhao <cruzzhao@xxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > 在 2023/11/23 上午5:05, Namhyung Kim 写道:
> > > Hello,
> > >
> > > On Tue, Nov 21, 2023 at 6:22 PM Cruz Zhao <CruzZhao@xxxxxxxxxxxxxxxxx> wrote:
> > >>
> > >> When synthesize thread map, some threads in thread map may have
> > >> already exited, so that __event__synthesize_thread() returns -1
> > >> and the synthesis breaks. However, It will not have any effect
> > >> if we just ignore the exited thread. So just ignore it and continue.
> > >
> > > Looks ok. But I guess you want to do the same for the leader
> > > thread below as well.
> > >
> > > Thanks,
> > > Namhyung
> > >
> >
> > With my testcase, no error is returned even if we don't do the same for
> > the leader thread blow. Well, I'll check whether the logic is still
> > correct if we do so.
> >
> > Many thanks for reviewing.
>
> Thanks for looking at this. Could you share the test? It looks like
> the thread be removed from the thread map to avoid potential future
> broken accesses like below:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/synthetic-events.c?h=perf-tools-next#n887
>
> Some of the race will hopefully get narrowed by switching to a less
> memory intense readdir:
> https://lore.kernel.org/lkml/20231127220902.1315692-7-irogers@xxxxxxxxxx/
>
> Threads racing is an issue in this example:
> ```
> $ sudo perf top --stdio -u `whoami`
> Error:
> The sys_perf_event_open() syscall returned with 3 (No such process)
> for event (cycles:P).
> /bin/dmesg | grep -i perf may provide additional information.
> ```
>
> Generally the races are covered by the dummy event that gathers
> sideband data like thread creation and exits, which is created prior
> to synthesis. It would be nice to have a better threading abstraction
> to avoid these races.
>
> Thanks,
> Ian

Fwiw, we hit more of these issues when running the test suite in
parallel. I posted changes to do that along with a similar fix:
https://lore.kernel.org/lkml/20231201235031.475293-1-irogers@xxxxxxxxxx/

Thanks,
Ian

> > Best,
> > Cruz Zhao