Re: [tip:sched/core] sched/isolation: Require a present CPU in housekeeping mask

From: Nicholas Piggin
Date: Tue May 07 2019 - 21:39:25 EST


Frederic Weisbecker's on May 8, 2019 10:35 am:
> On Tue, May 07, 2019 at 09:50:24AM +1000, Nicholas Piggin wrote:
>> Frederic Weisbecker's on May 7, 2019 1:16 am:
>> > On Sat, May 04, 2019 at 04:59:12PM +1000, Nicholas Piggin wrote:
>> >> Frederic Weisbecker's on May 4, 2019 10:27 am:
>> >> > On Fri, May 03, 2019 at 10:47:37AM -0700, tip-bot for Nicholas Piggin wrote:
>> >> >> Commit-ID: 9219565aa89033a9cfdae788c1940473a1253d6c
>> >> >> Gitweb: https://git.kernel.org/tip/9219565aa89033a9cfdae788c1940473a1253d6c
>> >> >> Author: Nicholas Piggin <npiggin@xxxxxxxxx>
>> >> >> AuthorDate: Thu, 11 Apr 2019 13:34:47 +1000
>> >> >> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> >> >> CommitDate: Fri, 3 May 2019 19:42:58 +0200
>> >> >>
>> >> >> sched/isolation: Require a present CPU in housekeeping mask
>> >> >>
>> >> >> During housekeeping mask setup, currently a possible CPU is required.
>> >> >> That does not guarantee the CPU would be available at boot time, so
>> >> >> check to ensure that at least one present CPU is in the mask.
>> >> >
>> >> > I have a doubt about the requirements and semantics of cpu_present_mask.
>> >> > IIUC a present CPU means that it is physically plugged in (from the ACPI
>> >> > perspective) but might not be logically plugged in (set in cpu_online_mask).
>> >>
>> >> Right, a subset of cpu_possible_mask and a superset of cpu_online_mask.
>> >> It means that CPU can be brought online at any time.
>> >>
>> >> > But do we have a guarantee that a present CPU _will_ be online at least
>> >> > once right after boot? After all, kernel parameters such as "maxcpus=" can
>> >> > prevent some CPUs from being turned on. I guess there are even more
>> >> > creative ways to achieve that.
>> >> >
>> >> > In any case we really require the housekeeper to be forced online. Perhaps
>> >> > I missed that enforcement somewhere in the patchset?
>> >>
>> >> No, I think you're right: the system may be able to boot without any CPU
>> >> in the housekeeping mask online. Maybe we can just cpu_up() a CPU in the
>> >> housekeeping mask, with a warning that it has overridden their SMP
>> >> command line option. I'll take a look at it.
>> >
>> > But then what if cpu_up() fails? In this case I can think of only two
>> > answers:
>> >
>> > * Force the boot CPU as the housekeeper.
>> > * Rollback the whole thing: nohz and all isolation.
>>
>> If cpu_up() fails despite the CPU being in the present map and explicitly
>> selected as the housekeeper? I think it would be okay to print a message
>> telling the admin to correct the config, and then panic.
>>
>> We make a best effort to get the system to boot and limp along, but if
>> you misconfigure it, crashing is not unreasonable. There are lots of
>> command line option misconfigurations that will cause the same thing.
>>
>> The primary problem with my patch that needs to be addressed is that
>> the error is not explicitly caught and printed if the housekeeper
>> does not come up, so the system might die in non-obvious ways.
>
> I usually reserve panic() and BUG_ON() as a last resort, for when data
> integrity is directly threatened. But indeed I guess that's all we have for now.

Right, for a housekeeping CPU that is excluded from coming up at boot
by maxcpus= or whatever, I don't think a panic is such a big deal. We
just need a clear error message.

> If we take that path, I'd rather not call that cpu_up() and simply panic if
> the given CPU happens not to be online after SMP bootup.

Sure that's fine by me too.

Thanks,
Nick