Re: [PATCH v2 tip/core/rcu 0/10] RCU-tasks implementation

From: Paul E. McKenney
Date: Thu Jul 31 2014 - 17:11:36 EST


On Thu, Jul 31, 2014 at 01:58:17PM -0700, josh@xxxxxxxxxxxxxxxx wrote:
> On Thu, Jul 31, 2014 at 11:38:16AM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 31, 2014 at 10:20:24AM -0700, josh@xxxxxxxxxxxxxxxx wrote:
> > > On Thu, Jul 31, 2014 at 09:58:43AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jul 31, 2014 at 09:19:02AM -0700, josh@xxxxxxxxxxxxxxxx wrote:
> > > > > On Wed, Jul 30, 2014 at 05:39:14PM -0700, Paul E. McKenney wrote:
> > > > > > This series provides a prototype of an RCU-tasks implementation, which has
> > > > > > been requested to assist with trampoline removal. This flavor of RCU
> > > > > > is task-based rather than CPU-based, and has voluntary context switches,
> > > > > > usermode execution, and the idle loop as its only quiescent states.
> > > > > > This selection of quiescent states ensures that at the end of a grace
> > > > > > period, there will no longer be any tasks depending on a trampoline that
> > > > > > was removed before the beginning of that grace period. This works because
> > > > > > such trampolines do not contain function calls, do not contain voluntary
> > > > > > context switches, do not switch to usermode, and do not switch to idle.
> > > > >
> > > > > I'm concerned about the amount of system overhead this introduces.
> > > > > Polling for holdout tasks seems quite excessive. If I understand the
> > > > > intended use case correctly, the users of this will want to free
> > > > > relatively small amounts of memory; thus, waiting a while to do so seems
> > > > > fine, especially if the system isn't under any particular memory
> > > > > pressure.
> > > > >
> > > > > Thus, rather than polling, could you simply flag the holdout
> > > > > tasks, telling the scheduler "hey, next time you don't have anything
> > > > > better to do..."? Then don't bother with them again unless the system
> > > > > runs low on memory and asks you to free some. (And mandate that you can
> > > > > only use this to free memory rather than for any other purpose.)
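> > > > >
> > > > > For the "asks you to free some" part, a shrinker seems like the
> > > > > natural hook. Minimal sketch, with hypothetical callbacks that
> > > > > report and poke the holdout tasks:
> > > > >
> > > > >	static struct shrinker rcu_tasks_shrinker = {
> > > > >		.count_objects = rcu_tasks_count_holdouts,
> > > > >		.scan_objects  = rcu_tasks_poke_holdouts,
> > > > >		.seeks         = DEFAULT_SEEKS,
> > > > >	};
> > > > >
> > > > >	/* At init time: */
> > > > >	register_shrinker(&rcu_tasks_shrinker);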
> > > >
> > > > One of my many alternative suggestions that Steven rejected was
> > > > to simply leak the memory. ;-)
> > > >
> > > > But from what I can see, if we simply flag the holdout tasks, we
> > > > are either also holding onto the task_struct structures, re-introducing
> > > > concurrency to the list of holdout tasks, or requiring that the eventual
> > > > scan for holdout tasks cover the entire task list. Neither of these seems
> > > > particularly appetizing to me.
> > > >
> > > > The nice thing about Lai Jiangshan's suggestion is that it allows the
> > > > scan of the holdout list to be done completely unsynchronized, which
> > > > permits pauses during the scan, so the loop can check for competing
> > > > work on that CPU. This should get almost all the effect of an
> > > > indefinite delay without the indefinite delay (at least in the
> > > > common case).
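> > > >
> > > > Roughly, with hypothetical names for the per-task flag and the
> > > > holdout list:
> > > >
> > > >	struct task_struct *t, *t1;
> > > >
> > > >	/* Scan the holdout list, yielding if this CPU has other work. */
> > > >	list_for_each_entry_safe(t, t1, &rcu_tasks_holdouts,
> > > >				 rcu_tasks_holdout_list) {
> > > >		if (!ACCESS_ONCE(t->rcu_tasks_holdout))
> > > >			list_del(&t->rcu_tasks_holdout_list);
> > > >		cond_resched();  /* Pause to let competing work run. */
> > > >	}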
> > > >
> > > > Or am I missing something here?
> > >
> > > If you only allow a single outstanding set of callbacks at a time, you
> > > could have a single flag stored in the task, combined with a count
> > > stored with the set of callbacks. Each time one of the holdout tasks
> > > comes up, clear the flag and decrement the count. If and only if you
> > > get asked to free up memory, start poking the scheduler to bring up
> > > those tasks. When the count hits 0, free the memory.
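> > >
> > > Sketching that (all names hypothetical):
> > >
> > >	/* One flag per task, one count per set of callbacks. */
> > >	atomic_t rcu_tasks_count;	/* Flagged tasks not yet quiesced. */
> > >
> > >	/* At a scheduler quiescent point: */
> > >	if (current->rcu_tasks_flagged) {
> > >		current->rcu_tasks_flagged = false;
> > >		if (atomic_dec_and_test(&rcu_tasks_count))
> > >			rcu_tasks_invoke_callbacks();	/* Count hit 0. */
> > >	}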
> > >
> > > The set of trampolines won't change often, and presumably only changes
> > > in response to user-driven requests to trace or stop tracing things.
> > > So, if you have to wait for the existing set of callbacks to go away
> > > before adding more, that seems fine. And you could then ditch polling
> > > entirely.
> >
> > If I understand what you are suggesting, this requires hooks in the
> > scheduler. I used to have hooks in the scheduler, but I dropped them in
> > favor of polling the voluntary context-switch count in response to Peter
> > Zijlstra's concerns about adding overhead to the scheduler's fastpaths.
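> >
> > For concreteness, the context-switch-count polling amounts to roughly
> > the following (field and helper names illustrative):
> >
> >	/* At grace-period start, snapshot each task's count: */
> >	t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw);
> >
> >	/* Later, on each polling pass: */
> >	if (ACCESS_ONCE(t->nvcsw) != t->rcu_tasks_nvcsw)
> >		clear_holdout(t);  /* Voluntary context switch occurred. */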
> >
> > Therefore, although the flags are sometimes cleared externally by the
> > scheduling-clock interrupt (when it sees usermode execution), it is quite
> > possible that a given task might never have its flag cleared asynchronously.
> >
> > Another approach might be to poll more slowly or to make the polling
> > evict itself if it detects that this CPU has something else to do.
> > Would either or both of these help?
>
> As discussed at lunch today, another option would be to drop the thread
> and handle cleanup synchronously from the caller in the tracing code, or
> to fire off a kthread *on request* to handle it asynchronously. That would
> avoid paying the startup cost and ongoing overhead on a system that has
> tracing built into the kernel but never uses it, as will likely be the
> case with distro kernels.
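>
> The on-demand variant might look something like this (names
> hypothetical; kthread_run() is the stock API):
>
>	static struct task_struct *rcu_tasks_kthread_ptr;
>	static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
>
>	void call_rcu_tasks(struct rcu_head *rhp,
>			    void (*func)(struct rcu_head *rhp))
>	{
>		/* ...enqueue the callback... */
>
>		/* Spawn the grace-period kthread only on first use. */
>		if (!ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
>			mutex_lock(&rcu_tasks_kthread_mutex);
>			if (!rcu_tasks_kthread_ptr)
>				rcu_tasks_kthread_ptr =
>					kthread_run(rcu_tasks_kthread,
>						    NULL, "rcu_tasks");
>			mutex_unlock(&rcu_tasks_kthread_mutex);
>		}
>	}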

Good points!

As discussed, I am more inclined towards the second approach than
towards the first, but let me get v3 tested and posted and then see how
things look.

Thanx, Paul
