Re: [tip:sched/core] sched/isolation: Require a present CPU in housekeeping mask

From: Frederic Weisbecker
Date: Mon May 06 2019 - 11:17:04 EST

On Sat, May 04, 2019 at 04:59:12PM +1000, Nicholas Piggin wrote:
> Frederic Weisbecker's on May 4, 2019 10:27 am:
> > On Fri, May 03, 2019 at 10:47:37AM -0700, tip-bot for Nicholas Piggin wrote:
> >> Commit-ID: 9219565aa89033a9cfdae788c1940473a1253d6c
> >> Gitweb:
> >> Author: Nicholas Piggin <npiggin@xxxxxxxxx>
> >> AuthorDate: Thu, 11 Apr 2019 13:34:47 +1000
> >> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> >> CommitDate: Fri, 3 May 2019 19:42:58 +0200
> >>
> >> sched/isolation: Require a present CPU in housekeeping mask
> >>
> >> During housekeeping mask setup, currently a possible CPU is required.
> >> That does not guarantee the CPU would be available at boot time, so
> >> check to ensure that at least one present CPU is in the mask.
> >
> > I have a doubt about the requirements and semantics of cpu_present_mask.
> > IIUC a present CPU means that it is physically plugged in (from ACPI
> > perspective) but might not be logically plugged in (set on cpu_online_mask).
> Right, a subset of cpu_possible_mask, superset of cpu_online_mask. It
> means that CPU can be brought online at any time.
> > But do we have the guarantee that a present CPU _will_ be online at least once
> > right after the boot? After all, kernel parameters such as "maxcpus=" can prevent
> > some CPUs from being turned on. I guess there are even more creative ways to achieve
> > that.
> >
> > In any case we really require the housekeeper to be forced online. Perhaps
> > I missed that enforcement somewhere in the patchset?
> No, I think you're right, the system may be able to boot without any online
> CPU in the housekeeping mask. Maybe we can just cpu_up() a CPU in the
> housekeeping mask with a warning that it has overridden their SMP
> command line option. I'll take a look at it.

But then what if cpu_up() fails? In this case I can think of only two
solutions:

* Force the boot CPU as the housekeeper.
* Rollback the whole thing: nohz and all isolation.

The second solution looks sane to me. After all if the user doesn't
include CPU 0 in the housekeeping set, forcing it isn't going to
help much.
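
To make the second option concrete, here is a rough userspace sketch,
not kernel code. Masks are modeled as plain 64-bit words rather than
struct cpumask, and every name in it (struct isolation_state,
model_cpu_up(), ensure_housekeeper(), the can_boot field) is
hypothetical and only stands in for the real cpu_up() path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, not the kernel's struct cpumask. */
typedef uint64_t cpu_mask_t;

struct isolation_state {
	cpu_mask_t present;       /* CPUs physically there */
	cpu_mask_t online;        /* CPUs actually running */
	cpu_mask_t housekeeping;  /* CPUs allowed to run housekeeping work */
	cpu_mask_t can_boot;      /* assumption: CPUs for which bring-up succeeds */
	bool isolation_enabled;   /* nohz_full and friends still in effect */
};

/* Stand-in for cpu_up(): succeeds only for present CPUs listed in can_boot. */
static bool model_cpu_up(struct isolation_state *s, unsigned int cpu)
{
	assert(cpu < 64);
	cpu_mask_t bit = (cpu_mask_t)1 << cpu;

	if (!(s->present & bit) || !(s->can_boot & bit))
		return false;
	s->online |= bit;
	return true;
}

/*
 * Make sure at least one housekeeping CPU is online. If none is, try to
 * bring one up. If every attempt fails, roll back: all online CPUs become
 * housekeepers and isolation is switched off.
 * Returns true if isolation is still in effect afterwards.
 */
static bool ensure_housekeeper(struct isolation_state *s)
{
	if (s->online & s->housekeeping)
		return true;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		if (!(s->housekeeping & ((cpu_mask_t)1 << cpu)))
			continue;
		if (model_cpu_up(s, cpu))
			return true;
	}

	/* Rollback: nobody in the housekeeping set could be onlined. */
	s->housekeeping = s->online;
	s->isolation_enabled = false;
	return false;
}
```

The rollback branch is the part that has no equivalent today, since it
changes the housekeeping set after it has been parsed.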

But that means we must enhance the isolation code (nohz included)
to be able to dynamically add/del CPUs to the housekeeping/isolation
set. That's not going to be easy but it's a necessary evolution
of that subsystem since we want to drive it through cpusets.
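
As a strawman for what dynamic add/del could look like, here is a
minimal userspace model. It is only an illustration of the invariant
that matters (the housekeeping set must always keep an online CPU);
struct hk_set, hk_add() and hk_del() are hypothetical names and do not
exist in the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, not the kernel's struct cpumask. */
typedef uint64_t cpu_mask_t;

struct hk_set {
	cpu_mask_t online;        /* CPUs currently running */
	cpu_mask_t housekeeping;  /* CPUs allowed to run housekeeping work */
};

static cpu_mask_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (cpu_mask_t)1 << cpu;
}

/* Adding a housekeeper is always safe in this model. */
static void hk_add(struct hk_set *s, unsigned int cpu)
{
	s->housekeeping |= cpu_bit(cpu);
}

/*
 * Removing a housekeeper (i.e. isolating the CPU) is refused if it would
 * leave no online CPU in the housekeeping set.
 */
static bool hk_del(struct hk_set *s, unsigned int cpu)
{
	cpu_mask_t remaining = s->housekeeping & ~cpu_bit(cpu);

	if (!(remaining & s->online))
		return false;
	s->housekeeping = remaining;
	return true;
}
```

With CPUs 0 and 1 online and only CPU 0 as housekeeper, hk_del(&s, 0)
is refused until CPU 1 has been added with hk_add(&s, 1).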

I should start working on that.