Re: [PATCH 0/2] pipe: Fixes [ver #2]

From: Linus Torvalds
Date: Tue Dec 10 2019 - 12:40:02 EST


On Tue, Dec 10, 2019 at 6:38 AM Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Mon, 9 Dec 2019 at 18:48, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Before that commit the buggy jobserver code basically does
> >
> > (1) use pselect() to wait for readable and see child deaths atomically
> > (2) use blocking read to get the token
> >
> > and while (1) is atomic, if the child death happens between the two,
> > it goes into the blocking read and has SIGCHLD blocked, so it will try
> > to read the token from the token pipe, but it will never react to the
> > child death - and the child death is what is going to _release_ a
> > token.
> >
> > So what seems to happen is that when the right timing triggers, you
>
> That can explain why I can't see the problem on my platform

Note that the above is kind of simplified.

It actually needs a bit more than that to trigger.

To lose _one_ token, you need to have a sub-make basically hit this race:

- the pselect() needs to say that the pipe is readable, so there is
at least one token

- another sub-make comes along and steals the very last token

- but because pselect returned "readable" (no SIGCHLD yet), the
read() starts and now blocks because all the jobserver tokens are gone
again due to the other sub-make stealing the last one.

- before a new token comes in, the child exits, and now, because the
sub-make is blocked in that read() (and because the jobserver blocks
SIGCHLD in general outside of the pselect), it doesn't react, so it
won't release the token that the child holds (see the sketch below).
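
Written out as a much-simplified sketch of the logic above - this is
not the actual GNU make source, just the shape of the buggy pattern,
with made-up names:

#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

/* empty handler: it exists only so that SIGCHLD interrupts pselect() */
static void child_handler(int sig)
{
        (void)sig;
}

static int get_token(int job_fd)
{
        struct sigaction sa;
        sigset_t empty, blocked;
        fd_set rfds;
        char token;

        sa.sa_handler = child_handler;
        sa.sa_flags = 0;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCHLD, &sa, NULL);

        /* SIGCHLD is blocked everywhere except inside pselect() */
        sigemptyset(&blocked);
        sigaddset(&blocked, SIGCHLD);
        sigprocmask(SIG_BLOCK, &blocked, NULL);

        sigemptyset(&empty);

        for (;;) {
                FD_ZERO(&rfds);
                FD_SET(job_fd, &rfds);

                /*
                 * (1) Atomically wait for "pipe readable" OR SIGCHLD.
                 * This part is fine: a child dying here interrupts us.
                 */
                if (pselect(job_fd + 1, &rfds, NULL, NULL, NULL, &empty) < 0)
                        continue;       /* EINTR: a child died (reaping omitted) */

                /*
                 * (2) BUG: a blocking read() with SIGCHLD blocked again.
                 * If another sub-make stole the last token between (1)
                 * and (2), this read() hangs, and our own child exiting
                 * now can't wake us up - so its token is never released.
                 */
                if (read(job_fd, &token, 1) == 1)
                        return 0;
        }
}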

But notice how any _other_ sub-make then releasing a token will get
things going again, so the _common_ situation is that the jobserver bug
only causes a slight dip in concurrency.

Hitting it _once_ is trivial.

Losing several tokens at once is also not that hard: you don't need to
hit the race many times; it's enough to hit the exact same race just
once - just with several sub-makes at the same time.

And that "lose several tokens at once" isn't as unlikely as you'd
think: they are all doing the same thing, and they all saw the free
token with "pselect()", they all did a "read()". And since it's common
for the tokens to be gone, the common case is that _one_ of the
waiting sub-makes got the read, and the N other sub-makes did not, and
went into the blocking read(). And they all had children that were
about to finish, and finished before the next token got available.

So losing quite a few tokens is easy.

This has actually gone on for a long time, and I just never debugged
it. My solution has been "I have 16 threads (8 cores with HT), but I'll
use -j32, and it is all good".

I bet you can see it too - the buggy jobserver just means that the
load isn't as high as you'd expect. Just run 'top' while the make is
going.

With the _fixed_ jobserver, if I do "make -j32", I will actually see a
load that is over 32 (I do nothing but kernel compiles and occasional
reboots during the merge window, so I have the kernel in my cache and
there's no IO; I have a fast SSD, so writeback doesn't cause any delays
either, etc etc; and I have my browser and a few other things going).

With the buggy one, even before the pipe rework, I would see a load
that tended to fluctuate around 16.

Due to the bug you have a few locked-up tokens at times, so instead
of getting a load of 32 when you use "make -j32", you get a load of
maybe 20. Or maybe less.
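
For contrast, the race-free shape of that token read looks something
like this. I'm assuming the fix amounts to "make the token read
non-blocking and retry from pselect()" - made-up names again, not a
quote of the actual make change, and with the SIGCHLD setup from the
earlier sketch omitted:

#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

/* job_fd is assumed to have been set O_NONBLOCK at setup time */
static int get_token_fixed(int job_fd)
{
        sigset_t empty;
        fd_set rfds;
        char token;
        ssize_t n;

        sigemptyset(&empty);

        for (;;) {
                FD_ZERO(&rfds);
                FD_SET(job_fd, &rfds);

                /* wait for a token or a child death, atomically */
                if (pselect(job_fd + 1, &rfds, NULL, NULL, NULL, &empty) < 0)
                        return -1;      /* EINTR: let the caller reap children */

                /*
                 * Non-blocking read: if another sub-make grabbed the last
                 * token after pselect() returned, this gives EAGAIN and we
                 * go back to waiting instead of hanging with SIGCHLD
                 * blocked.
                 */
                n = read(job_fd, &token, 1);
                if (n == 1)
                        return 0;       /* got a token */
                if (n < 0 && errno != EAGAIN && errno != EINTR)
                        return -1;
        }
}

With that shape, the "somebody stole the last token" case just loops
back to pselect(), so a child death always gets noticed and its token
written back.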

The pipe re-work made it much easier to trigger the "almost all the
tokens are gone" case for some reason. And the "fair readers" patch I
have seems to make it _really_ easy to trigger the case where
absolutely all the tokens are gone and it goes into a single-threaded
mode. I'm not sure I really ever saw the 1-second timeout trigger, but
it was slow.

But it _is_ dependent on timing, so somebody else with a different
load - or a different machine - might not see it to nearly the same
degree.

I bet you see the load value difference, though, even if you don't
necessarily see enough idle CPU time to see much of a difference in
compile times. After all, once you have all CPUs busy, it doesn't
matter if you have a load of 16 or a load of 32 - the higher load
won't make the compile go any faster.

Linus