Re: [PATCH 0/6] support "dataplane" mode for nohz_full

From: Mike Galbraith
Date: Fri May 15 2015 - 14:44:32 EST


On Fri, 2015-05-15 at 11:05 -0400, Chris Metcalf wrote:
> On 05/11/2015 09:47 PM, Mike Galbraith wrote:
> > On Mon, 2015-05-11 at 15:25 -0400, Chris Metcalf wrote:
> >> On 05/11/2015 03:19 PM, Mike Galbraith wrote:
> >>> I really shouldn't have acked nohz_full -> isolcpus. Besides the fact
> >>> that old static isolcpus was _supposed_ to crawl off and die, I know
> >>> beyond doubt that having isolated a cpu as well as you can definitely
> >>> does NOT imply that said cpu should become tickless.
> >> True, at a high level, I agree that it would be better to have a
> >> top-level concept like Frederic's proposed ISOLATION that includes
> >> isolcpus and nohz_cpu (and other stuff as needed).
> >>
> >> That said, what you wrote above is wrong; even with the patch you
> >> acked, setting isolcpus does not automatically turn on nohz_full for
> >> a given cpu. The patch made it true the other way around: when
> >> you say nohz_full, you automatically get isolcpus on that cpu too.
> >> That does, at least, make sense for the semantics of nohz_full.
> > I didn't write that, I wrote nohz_full implies (spelled '->') isolcpus.
> > Yes, with nohz_full currently being static, the old allegedly dying but
> > also static isolcpus scheduler off switch is a convenient thing to wire
> > the nohz_full CPU SET (<- hint;) property to.
>
> Yes, I was responding to the bit where you said "having isolated a
> cpu as well as you can does NOT imply it should become tickless",
> but indeed, the "nohz_full -> isolcpus" patch didn't make that true.
> In any case sounds like we were just talking past each other.

Yup.

> > BTW, another facet of this: Rik wants to make isolcpus immune to
> > cpusets, which makes some sense, user did say isolcpus=, but that also
> > makes isolcpus truly static. If the user now says nohz_full=, they lose
> > the ability to deactivate CPU isolation, making the set fairly useless
> > for anything other than HPC. Currently, the user can flip the isolation
> > switch as he sees fit. He takes a size extra large performance hit for
> > having said nohz_full=, but he doesn't lose generic utility.
>
> I don't think I follow this completely. If the user says nohz_full=, he
> probably doesn't care about deactivating isolcpus later, since that
> defeats the entire purpose of the nohz_full= in the first place,
> as far as I can tell. And when you say "anything other than HPC",
> I'm not sure what you mean; as far as I know high-performance
> computing only cares because it wants that extra 0.5% of the
> cpu or whatever interrupts eat up, but just as a nice-to-have.
> The real use case is high-performance userspace drivers where
> the nohz_full cores are responding to real-time things like packet
> arrivals with almost no latency to spare.

Ok, verbosity on.

Currently, nohz_full is static. In a dynamic environment, where the user
may not have a constant need for it, if you make it imply isolcpus and
then make isolcpus immutable, you have needlessly taken an option from
the user. Those CPUs are no longer part of his generic resource pool,
and he has nothing to say about it.

> What is the generic utility you're envisioning for nohz_full cores
> that have turned off scheduler isolation? I assume it's some
> workload where you'd prefer not to have too many interrupts
> but still are running multiple tasks, but in that case does it really
> make much difference in practice?

Again, I think we're talking past one another.

I'm saying there is no need to mandate, nothing more. For your needs,
my needs, whatever, immutability may sound good, but in fact it removes
flexibility, and for no good reason.

This shows immediately in simple testing. Do I need nohz_full? Hell
no, only for testing. If I want to test, I obviously need it for a
while, and yes, I can reboot... but what's the difference between me the
silly tester who needs it only to see if it works at all, and how well,
and some guy who does something critical once in a while, or a company
with a pool of big boxen that they reconfigure on the fly to meet
whatever dynamic needs?

Just because the nohz_full feature itself is currently static is no
reason to put its users in a straitjacket by mandating that any set
they define irrevocably disappears from the generic resource pool.
Those CPUs are useful until the moment someone cripples them. Making
nohz_full imply isolcpus sounds perfectly fine until someone comes along
and makes isolcpus immutable (Rik's patch), at which point the user
loses a choice due to two people making changes that _alone_ sound
perfectly fine.
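
To make the interaction concrete, here is a toy model in plain Python. It
is not kernel code: the function generic_pool() and its parameters
(nohz_implies_isol, isol_immutable, released) are names made up for
illustration, and "released" merely stands in for the user flipping the
isolation switch back at runtime. It only sketches the set arithmetic
being argued about, under those assumptions.

```python
def generic_pool(all_cpus, nohz_full, isolcpus=frozenset(), *,
                 nohz_implies_isol, isol_immutable, released=frozenset()):
    """Return the CPUs still available to the generic resource pool.

    Hypothetical model, not a kernel interface:
      nohz_implies_isol -- nohz_full= also isolates those CPUs
      isol_immutable    -- the isolated set cannot be changed at runtime
      released          -- CPUs the user tries to un-isolate later
    """
    isolated = set(isolcpus)
    if nohz_implies_isol:
        isolated |= set(nohz_full)
    if not isol_immutable:
        # The user may still flip the isolation switch back.
        isolated -= set(released)
    return set(all_cpus) - isolated


cpus = set(range(8))
nohz = {4, 5, 6, 7}

# Implication alone: the user can still hand the CPUs back.
implied_only = generic_pool(cpus, nohz, nohz_implies_isol=True,
                            isol_immutable=False, released=nohz)

# Immutability alone: nohz_full CPUs were never isolated to begin with.
immutable_only = generic_pool(cpus, nohz, nohz_implies_isol=False,
                              isol_immutable=True, released=nohz)

# Both together: the request to release is ignored, the CPUs are gone.
both = generic_pool(cpus, nohz, nohz_implies_isol=True,
                    isol_immutable=True, released=nohz)

print(sorted(implied_only))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(immutable_only))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(both))            # [0, 1, 2, 3]
```

Either change on its own leaves all eight CPUs usable; only the
combination shrinks the pool to CPUs 0-3.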

See what I'm saying now?

> > Thomas has nuked the hrtimer softirq.
>
> Yes, this I didn't know. So I will drop my "no ksoftirqd" patch and
> we will see if ksoftirqs emerge as an issue for my "cpu isolation"
> stuff in the future; it may be that that was the only issue.
>
> > Inlining softirqs may save a context switch, but adds cycles that we may
> > consume at higher frequency than the thing we're avoiding.
>
> Yes but consuming cycles is not nearly as much of a concern
> as avoiding interrupts or scheduling, certainly for the case of
> userspace drivers that I described above.

If you're raising softirqs in an SMP kernel, you're also doing something
that puts you at very serious risk of meeting the jitter monster: locks
and, worse, sleeping locks, no?

-Mike
