Re: [PATCH 1/3] perf sched stats: Fix SIGCHLD race in schedstat_record()
From: Ian Rogers
Date: Wed Apr 01 2026 - 12:47:15 EST
On Tue, Mar 31, 2026 at 11:42 PM Swapnil Sapkal <swapnil.sapkal@xxxxxxx> wrote:
>
> When a very short-lived workload is used with 'perf sched stats record',
> the child process can exit and deliver SIGCHLD between
> evlist__start_workload() and pause(). Since pause() only returns when a
> signal is received while suspended, and the SIGCHLD has already been
> delivered and handled by then, pause() blocks indefinitely.
>
> Fix this by blocking SIGCHLD before starting the workload and replacing
> pause() with sigsuspend(). sigsuspend() atomically unblocks SIGCHLD and
> suspends the process, ensuring no signal is lost regardless of how
> quickly the child exits.
>
> Assisted-by: Claude:claude-opus-4.6
> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@xxxxxxx>
Thanks Swapnil! In the Sashiko reviews there were some nits about
clean up on error paths but also 1 about signals potentially being
masked in the perf workload:
https://sashiko.dev/#/patchset/20260401064114.141066-1-swapnil.sapkal%40amd.com
Could you take a look?
Ian
> ---
> tools/perf/builtin-sched.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 3f509cfdd58c..eb3702d98fd1 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -3807,6 +3807,7 @@ const char *output_name;
> static int perf_sched__schedstat_record(struct perf_sched *sched,
> int argc, const char **argv)
> {
> + sigset_t sigchld_mask, oldmask;
> struct perf_session *session;
> struct target target = {};
> struct evlist *evlist;
> @@ -3822,6 +3823,15 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
> signal(SIGCHLD, sighandler);
> signal(SIGTERM, sighandler);
>
> + /*
> + * Block SIGCHLD early so that a short-lived workload cannot deliver
> + * the signal before we are ready to wait for it. sigsuspend() below
> + * will atomically unblock it.
> + */
> + sigemptyset(&sigchld_mask);
> + sigaddset(&sigchld_mask, SIGCHLD);
> + sigprocmask(SIG_BLOCK, &sigchld_mask, &oldmask);
> +
> evlist = evlist__new();
> if (!evlist)
> return -ENOMEM;
> @@ -3902,8 +3912,15 @@ static int perf_sched__schedstat_record(struct perf_sched *sched,
> if (argc)
> evlist__start_workload(evlist);
>
> - /* wait for signal */
> - pause();
> + /*
> + * Use sigsuspend() instead of pause() to avoid a race where a
> + * short-lived workload exits and delivers SIGCHLD before pause()
> + * is entered, causing it to block indefinitely. sigsuspend()
> + * atomically unblocks SIGCHLD (blocked above) and suspends,
> + * ensuring no signal is lost.
> + */
> + sigsuspend(&oldmask);
> + sigprocmask(SIG_SETMASK, &oldmask, NULL);
>
> if (reset) {
> err = disable_sched_schedstat();
> --
> 2.43.0
>