Re: [PATCH 0/2] pipe: Fixes [ver #2]
From: Vincent Guittot
Date: Tue Dec 10 2019 - 09:38:44 EST
On Mon, 9 Dec 2019 at 18:48, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> [ Added DJ to the participants, since he seems to be the Fedora make
> maintainer - DJ, any chance that this absolutely horrid 'make' bug can
> be fixed in older versions too, not just rawhide? The bugfix is two
> and a half years old by now, and the bug looks real and very serious ]
>
> On Mon, Dec 9, 2019 at 1:54 AM Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > Which version of make should I use to reproduce the problem ?
>
> So the problematic one is "make-4.2.1-13.fc30.x86_64" in Fedora 30.
> I'm assuming it's fairly plain 4.2.1, but I didn't try to look into
> the source rpm or anything like that.

I'm using Debian buster, and the make package is version 4.2.1-1.2 for
arm64. It doesn't have the commit you mentioned below, but I don't see
the problem on my platform: all 8 CPUs are used with -j 16 or even
-j 9.

>
> The working one for me was just the top of -git from
>
> https://git.savannah.gnu.org/git/make.git
>
> which is 4.2.92 right now.
>
> The fix is presumably commit b552b05 ("[SV 51159] Use a non-blocking
> read with pselect to avoid hangs") as per Akemi. That is indeed after
> 4.2.1, and it looks real.
>
> Before that commit the buggy jobserver code basically does
>
> (1) use pselect() to wait for readable and see child deaths atomically
> (2) use blocking read to get the token
>
> and while (1) is atomic, if the child death happens between the two,
> it goes into the blocking read and has SIGCHLD blocked, so it will try
> to read the token from the token pipe, but it will never react to the
> child death - and the child death is what is going to _release_ a
> token.
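To make the race above concrete, here is a minimal sketch of the pattern the commit title names ("Use a non-blocking read with pselect"). It is an illustration under stated assumptions, not GNU make's actual jobserver code. SIGCHLD is kept blocked except while pselect() sleeps, and the token pipe is read with O_NONBLOCK, so losing the token to another sub-make yields EAGAIN and a return to pselect() instead of a blocking read() with SIGCHLD blocked. The helper names (setup_sigchld, make_nonblocking, acquire_token, reap_children) and the child_exited flag are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag set by the SIGCHLD handler; tested only while SIGCHLD is blocked. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int sig)
{
	(void)sig;
	child_exited = 1;
}

/* Install the handler and block SIGCHLD. *waitmask receives the previous
 * mask minus SIGCHLD: the mask pselect() installs while it sleeps. */
static int setup_sigchld(sigset_t *waitmask)
{
	struct sigaction sa = {0};
	sigset_t block;

	sa.sa_handler = on_sigchld;
	sigemptyset(&sa.sa_mask);	/* no SA_RESTART: pselect() reports EINTR */
	if (sigaction(SIGCHLD, &sa, NULL) < 0)
		return -1;
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &block, waitmask) < 0)
		return -1;
	sigdelset(waitmask, SIGCHLD);
	return 0;
}

static int make_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 with a token in *tok, 0 if a child exited (the caller should
 * reap it and give its token back), -1 on error. */
static int acquire_token(int token_fd, const sigset_t *waitmask, char *tok)
{
	assert(token_fd >= 0);
	for (;;) {
		fd_set readfds;
		ssize_t n;

		/* SIGCHLD is blocked here, so the flag cannot change under us. */
		if (child_exited) {
			child_exited = 0;
			return 0;
		}
		FD_ZERO(&readfds);
		FD_SET(token_fd, &readfds);
		/* Step (1): atomically unblock SIGCHLD and wait for readability. */
		if (pselect(token_fd + 1, &readfds, NULL, NULL, NULL, waitmask) < 0) {
			if (errno == EINTR)
				continue;	/* SIGCHLD was delivered: re-check the flag */
			return -1;
		}
		/* Step (2): non-blocking read. If another sub-make took the token
		 * first, read() fails with EAGAIN and we loop back to pselect().
		 * A blocking read() here would sleep with SIGCHLD blocked. */
		n = read(token_fd, tok, 1);
		if (n == 1)
			return 1;
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
			continue;
		return -1;
	}
}

/* Reap every exited child without blocking; returns how many were reaped. */
static int reap_children(void)
{
	int reaped = 0;

	while (waitpid(-1, NULL, WNOHANG) > 0)
		reaped++;
	return reaped;
}
```

When acquire_token() returns 0, the caller would reap_children() and write the freed token back to the pipe. Swapping the non-blocking read() for a blocking one reintroduces exactly the window described above: a sub-make that loses the token race sleeps in read() and never notices the child death that would have released a token.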
>
> So what seems to happen is that when the right timing triggers, you

That can explain why I can't see the problem on my platform.

> end up with a lot of sub-makes waiting for a token, but they are also
> all supposed to _release_ a token. So you don't have enough tokens to
> go around. In the worst case, _everybody_ who has a token is also not
> releasing it, and then you end up triggering the timeout code (after
> one second), which will make things really go into a crawl.
>
> And by a crawl I mean that worst-case you really end up with just one
> job per second per sub-make. It will take _hours_ to compile the
> kernel at that speed, when it normally finishes in 15 minutes on my
> machine even when I do a from-scratch allmodconfig build.
>
> It does seem to be a major bug in the jobserver code. In particular
> with the trial fair and exclusive wakeup patch that I sent out in the
> other thread, it seems to be _reliably_ much worse and triggers 100%
> of the time for me.
>
> It's possible that my trial patch is buggy, but everything else looks
> fine, and with a fixed make the trial patch works for me.
>
> I'll include the trial patch here too, I think I cc'd you on the other
> thread too, but hey..
>
> Anyway, it looks like the sync wakeup thing is more of a "get timing
> right by luck" thing than anything else. Possibly it actually causes
> the reverse order of reader wakeups more often (ie the most _recent_
> reader is most likely to get woken up synchronously) and that may be
> what really ends up masking the jobserver problem, since apparently
> doing wakeups in the fair and proper order makes things much worse..
>
> What a horrible pain that pipe rework ended up being. But I think
> we're in better shape now than we used to be, it just had very
> unfortunate timing issues and several real bugs.
>
> But sadly, there's no way I can push that fair pipe wakeup thing as
> long as this horribly buggy version of make is widespread.
>
> Linus