Re: [PATCH v7 04/10] selftests/harness: Fix interleaved scheduling leading to race conditions

From: Mark Brown
Date: Mon Jun 03 2024 - 13:22:57 EST


On Mon, Jun 03, 2024 at 05:27:52PM +0100, Mark Brown wrote:
> On Mon, May 27, 2024 at 08:07:40PM +0100, Mark Brown wrote:

> > This is now in mainline and appears to be causing several tests (at
> > least the ptrace vmaccess global_attach test on arm64, possibly also
> > some of the epoll tests) that previously were timed out by the harness
> > to to hang instead. A bisect seems to point at this patch in
> > particular, there was a bunch of discussion of the fallout of these
> > patches but I'm afraid I lost track of it, is there something in flight
> > for this? -next is affected as well from the looks of it.

> FWIW I'm still seeing this on -rc2...

AFAICT this is due to the switch to using clone3() with CLONE_VFORK
to start the test which means we never even call alarm() to set up the
timeout for the test, let alone have the signal for it delivered. I'm a
confused about how this could ever work, with clone_vfork() the parent
shouldn't run until the child execs (which won't happen here) or exits.
Since we don't call alarm() until after we started the child we never
actually get that far, but even if we reorder things we'll not get the
signal for the alarm if the child messes up since the parent is
suspended.

I'm not clear what the original race being fixed here was but it seems
like we should revert this since the timeout functionality is pretty
important?

Attachment: signature.asc
Description: PGP signature