Re: [PATCH v7 04/10] selftests/harness: Fix interleaved scheduling leading to race conditions
From: Mickaël Salaün
Date: Tue Jun 04 2024 - 12:13:45 EST
On Mon, Jun 03, 2024 at 06:22:32PM +0100, Mark Brown wrote:
> On Mon, Jun 03, 2024 at 05:27:52PM +0100, Mark Brown wrote:
> > On Mon, May 27, 2024 at 08:07:40PM +0100, Mark Brown wrote:
>
> > > This is now in mainline and appears to be causing several tests (at
> > > least the ptrace vmaccess global_attach test on arm64, possibly also
> > > some of the epoll tests) that previously were timed out by the harness
> > > to hang instead. A bisect seems to point at this patch in
> > > particular, there was a bunch of discussion of the fallout of these
> > > patches but I'm afraid I lost track of it, is there something in flight
> > > for this? -next is affected as well from the looks of it.
Thanks for the heads up. I warned about not being able to test
everything when fixing kselftest last time, but nobody showed up. Is
there an easy way to run most kselftests? We really need a (more
accessible) CI...
>
> > FWIW I'm still seeing this on -rc2...
>
> AFAICT this is due to the switch to using clone3() with CLONE_VFORK
I guess it started with the previous vfork() that was later replaced
with CLONE_VFORK.
> to start the test which means we never even call alarm() to set up the
> timeout for the test, let alone have the signal for it delivered. I'm a
> bit confused about how this could ever work, with clone_vfork() the parent
> shouldn't run until the child execs (which won't happen here) or exits.
> Since we don't call alarm() until after we started the child we never
> actually get that far, but even if we reorder things we'll not get the
> signal for the alarm if the child messes up since the parent is
> suspended.
>
> I'm not clear what the original race being fixed here was but it seems
> like we should revert this since the timeout functionality is pretty
> important?
It took me a while to fix all the previous issues and it would be much
easier to just fix this issue too.
I'm working on it.