Re: [PATCH 4/6] nohz: support PR_DATAPLANE_QUIESCE

From: Chris Metcalf
Date: Thu May 14 2015 - 16:54:59 EST

On 05/12/2015 05:33 AM, Peter Zijlstra wrote:
On Fri, May 08, 2015 at 01:58:45PM -0400, Chris Metcalf wrote:
This prctl() flag for PR_SET_DATAPLANE sets a mode that requires the
kernel to quiesce any pending timer interrupts prior to returning
to userspace. When running with this mode set, syscalls (and page
faults, etc.) can be inordinately slow. However, user applications
that want to guarantee that no unexpected interrupts will occur
(even if they call into the kernel) can set this flag to guarantee
those semantics.
Currently people hot-unplug and hot-plug the CPU to do this. Obviously
that's a wee bit horrible :-)

Not sure if a prctl like this is any better though. This is a CPU
property, not a process one.

The CPU property aspects, I think, should be largely handled by
fixing kernel bugs that let work end up running on nohz_full cores
without having been explicitly requested to run there.

As you said in a follow-up email:

On 05/12/2015 06:38 AM, Peter Zijlstra wrote:
Ideally we'd never have to clear the state because it should be
impossible to get into this predicament in the first place.

What my prctl() proposal does is quiesce things that end up
happening specifically because the user process called on purpose
into the kernel. For example, perhaps RCU was invoked in the
kernel, and the core has to wait a timer tick to quiesce RCU.
Whatever causes it, the intent is that you're not allowed back into
userspace until everything has settled down from your call into
the kernel; the presumption is that it's all due to the kernel entry
that was just made, and not from other stray work.

In that sense, it's very appropriate for it to be a process property.
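For illustration only, here is roughly what the userspace side of such a request could look like. This is a sketch, not the patch itself: PR_SET_DATAPLANE and PR_DATAPLANE_QUIESCE are the names used in this series, but the numeric values below are placeholders I made up, and the helper name dataplane_request_quiesce() is hypothetical. On a kernel without the series applied, the prctl() call is expected to fail rather than do anything useful.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Placeholder values, assumed purely for illustration. They are NOT
 * taken from any kernel header; a patched kernel would export the
 * real definitions through <linux/prctl.h>.
 */
#ifndef PR_SET_DATAPLANE
#define PR_SET_DATAPLANE 0x44504c4e
#endif
#ifndef PR_DATAPLANE_QUIESCE
#define PR_DATAPLANE_QUIESCE (1UL << 1)
#endif

/*
 * Hypothetical helper: ask the kernel to hold this task in the kernel
 * on every return to userspace until pending timer work has settled.
 * Returns 0 on success, or a negative errno value on failure (for
 * example when the running kernel does not know the option).
 */
int dataplane_request_quiesce(void)
{
    if (prctl(PR_SET_DATAPLANE, PR_DATAPLANE_QUIESCE, 0UL, 0UL, 0UL) != 0)
        return -errno;
    return 0;
}
```

A task would call this once after it has been placed on its nohz_full core; later syscalls and page faults would then pay the quiesce cost on the way back out, which is the per-process trade-off described above.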

ISTR people talking about a 'quiesce' sysfs file, alongside the hotplug
stuff, I can't quite remember.

It seems somewhat similar (adding Viresh to the cc's) but does
seem like it might have been more intended to address the
CPU properties rather than process properties:

One thing the original Tilera dataplane code did was to require
setting dataplane flags to succeed only on dataplane cores,
and only when the task had been affinitized to that single core.
This did not protect the task from later being re-affinitized in
a way that broke those assumptions, but I suppose you could
also imagine making sched_setaffinity() fail for such a process.
Somewhat unrelated, but it occurred to me in the context of this
reply, so what do you think? I can certainly add this to the
patch series if it seems like it makes setting the prctl() flags
more conservative.
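
To make the rule concrete, here is a small userspace model of the checks described above. It is only a sketch under stated assumptions: the toy_* names and the 64-bit mask type are hypothetical (the kernel would use struct cpumask), and the sched_setaffinity() behavior shown is one possible reading of the idea, namely refusing only those changes that would break the single-dataplane-core assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU mask: bit N set means CPU N is in the set (at most 64 CPUs). */
typedef uint64_t toy_cpumask_t;

/* True when exactly one bit is set, i.e. the mask names a single CPU. */
bool toy_mask_is_single_cpu(toy_cpumask_t mask)
{
    return mask != 0 && (mask & (mask - 1)) == 0;
}

/*
 * Model of the Tilera-style rule: setting dataplane flags succeeds only
 * if the task is affinitized to exactly one CPU and that CPU is a
 * dataplane core.
 */
bool toy_dataplane_flags_allowed(toy_cpumask_t task_affinity,
                                 toy_cpumask_t dataplane_cpus)
{
    return toy_mask_is_single_cpu(task_affinity) &&
           (task_affinity & dataplane_cpus) != 0;
}

/*
 * Model of the possible extension: once dataplane flags are set, an
 * affinity change is refused unless the new mask still satisfies the
 * rule above. Tasks without the flags are unaffected.
 */
bool toy_setaffinity_allowed(bool dataplane_flags_set,
                             toy_cpumask_t new_affinity,
                             toy_cpumask_t dataplane_cpus)
{
    if (!dataplane_flags_set)
        return true;
    return toy_dataplane_flags_allowed(new_affinity, dataplane_cpus);
}
```

With dataplane cores 3-7 (mask 0xF8), a task bound only to CPU 3 may set the flags, a task bound to CPUs 3 and 4 may not, and a flagged task may move to CPU 4 alone but not back to CPUs 0-1.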

Chris Metcalf, EZChip Semiconductor
