Re: [tip:sched/core] sched/isolation: Require a present CPU in housekeeping mask

From: Nicholas Piggin
Date: Mon May 06 2019 - 19:51:19 EST


Frederic Weisbecker's on May 7, 2019 1:16 am:
> On Sat, May 04, 2019 at 04:59:12PM +1000, Nicholas Piggin wrote:
>> Frederic Weisbecker's on May 4, 2019 10:27 am:
>> > On Fri, May 03, 2019 at 10:47:37AM -0700, tip-bot for Nicholas Piggin wrote:
>> >> Commit-ID: 9219565aa89033a9cfdae788c1940473a1253d6c
>> >> Gitweb: https://git.kernel.org/tip/9219565aa89033a9cfdae788c1940473a1253d6c
>> >> Author: Nicholas Piggin <npiggin@xxxxxxxxx>
>> >> AuthorDate: Thu, 11 Apr 2019 13:34:47 +1000
>> >> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> >> CommitDate: Fri, 3 May 2019 19:42:58 +0200
>> >>
>> >> sched/isolation: Require a present CPU in housekeeping mask
>> >>
>> >> During housekeeping mask setup, currently a possible CPU is required.
>> >> That does not guarantee the CPU would be available at boot time, so
>> >> check to ensure that at least one present CPU is in the mask.
>> >
>> > I have a doubt about the requirements and semantics of cpu_present_mask.
>> > IIUC a present CPU means that it is physically plugged in (from ACPI
>> > perspective) but might not be logically plugged in (set on cpu_online_mask).
>>
>> Right, a subset of cpu_possible_mask, superset of cpu_online_mask. It
>> means that CPU can be brought online at any time.
>>
>> > But do we have the guarantee that a present CPU _will_ be online at least once
>> > right after the boot? After all, kernel parameters such as "maxcpus=" can prevent
>> > some CPUs from being turned on. I guess there are even more creative ways to achieve
>> > that.
>> >
>> > In any case we really require the housekeeper to be forced online. Perhaps
>> > I missed that enforcement somewhere in the patchset?
>>
>> No I think you're right, that may be able to boot without anything in
>> the housekeeping mask. Maybe we can just cpu_up() a CPU in the
>> housekeeping mask with a warning that it has overridden their SMP
>> command line option. I'll take a look at it.
>
> But then what if cpu_up() fails? In this case I can think of only two
> answers:
>
> * Force the boot CPU as the housekeeper.
> * Rollback the whole thing: nohz and all isolation.

If cpu_up() fails even though the CPU is in the present map and we
explicitly selected it as the housekeeper? I think it would be okay to
print a message telling the admin to correct the config, and panic.

We make a best effort to let the system boot and limp along, but if
you misconfigure it, crashing is not unreasonable. There are lots of
command line option misconfigurations that will cause the same thing.

The primary problem with my patch that needs to be addressed is that
the error is not explicitly caught and printed if the housekeeper
does not come up, so the system might die in non-obvious ways.
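
To make that concrete, here is a rough userspace sketch of the check I
have in mind. It is only a model, not the patch: the 64-bit mask type,
ensure_housekeeper_online(), the hk_result codes and the bringup callback
(a stand-in for cpu_up()) are all made-up names for illustration. The
idea is that every failure path prints something explicit, and a real
caller could then panic on anything other than HK_OK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace model only: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

enum hk_result {
	HK_OK = 0,		/* a housekeeping CPU is online */
	HK_NONE_PRESENT,	/* no housekeeping CPU is present */
	HK_BRINGUP_FAILED,	/* candidates exist but none came up */
};

/* Hypothetical stand-in for cpu_up(): returns 0 on success. */
typedef int (*bringup_fn)(unsigned int cpu);

static enum hk_result ensure_housekeeper_online(cpumask_model_t housekeeping,
						 cpumask_model_t present,
						 cpumask_model_t *online,
						 bringup_fn bringup)
{
	cpumask_model_t candidates = housekeeping & present;
	unsigned int cpu;

	/* Model invariant: online CPUs are always present. */
	assert((*online & ~present) == 0);

	/* Nothing to do if a housekeeper is already online. */
	if (housekeeping & *online)
		return HK_OK;

	if (!candidates) {
		fprintf(stderr, "housekeeping: no present CPU in mask\n");
		return HK_NONE_PRESENT;
	}

	/* Try each present housekeeping CPU in ascending order. */
	for (cpu = 0; cpu < 64; cpu++) {
		if (!(candidates & (UINT64_C(1) << cpu)))
			continue;
		if (bringup(cpu) == 0) {
			*online |= UINT64_C(1) << cpu;
			fprintf(stderr,
				"housekeeping: forced CPU %u online, overriding SMP options\n",
				cpu);
			return HK_OK;
		}
	}

	fprintf(stderr,
		"housekeeping: no housekeeping CPU came up, fix the config\n");
	return HK_BRINGUP_FAILED;
}

/* Two toy bring-up callbacks for exercising the model. */
static int bringup_always_ok(unsigned int cpu) { (void)cpu; return 0; }
static int bringup_always_fail(unsigned int cpu) { (void)cpu; return -1; }
```

With only CPU 0 online (as with "maxcpus=1") and CPUs 1-2 in the
housekeeping mask, the model forces CPU 1 online when bring-up works and
reports HK_BRINGUP_FAILED when it does not, so the failure is never
silent.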

>
> The second solution looks sane to me. After all if the user doesn't
> include CPU 0 in the housekeeping set, forcing it isn't going to
> help much.
>
> But that means we must enhance the isolation code (nohz included)
> to be able to dynamically add/del CPUs to the housekeeping/isolation
> set. That's not going to be easy but it's a necessary evolution
> of that subsystem since we want to drive it through cpusets.
>
> I should start working on that.

I considered that when looking at the series, but couldn't justify
the complexity based on my usage (which is static boot time).

If you have other uses for it, then that would solve all these boot
time issues as well, which will be nice.

Thanks,
Nick