Re: [PATCH v2 tip/core/rcu 0/10] RCU-tasks implementation

From: josh
Date: Thu Jul 31 2014 - 16:58:27 EST


On Thu, Jul 31, 2014 at 11:38:16AM -0700, Paul E. McKenney wrote:
> On Thu, Jul 31, 2014 at 10:20:24AM -0700, josh@xxxxxxxxxxxxxxxx wrote:
> > On Thu, Jul 31, 2014 at 09:58:43AM -0700, Paul E. McKenney wrote:
> > > On Thu, Jul 31, 2014 at 09:19:02AM -0700, josh@xxxxxxxxxxxxxxxx wrote:
> > > > On Wed, Jul 30, 2014 at 05:39:14PM -0700, Paul E. McKenney wrote:
> > > > > This series provides a prototype of an RCU-tasks implementation, which has
> > > > > been requested to assist with trampoline removal. This flavor of RCU
> > > > > is task-based rather than CPU-based, and has voluntary context switch,
> > > > > usermode execution, and the idle loops as its only quiescent states.
> > > > > This selection of quiescent states ensures that at the end of a grace
> > > > > period, there will no longer be any tasks depending on a trampoline that
> > > > > was removed before the beginning of that grace period. This works because
> > > > > such trampolines do not contain function calls, do not contain voluntary
> > > > > context switches, do not switch to usermode, and do not switch to idle.
> > > >
> > > > I'm concerned about the amount of system overhead this introduces.
> > > > Polling for holdout tasks seems quite excessive. If I understand the
> > > > intended use case correctly, the users of this will want to free
> > > > relatively small amounts of memory; thus, waiting a while to do so seems
> > > > fine, especially if the system isn't under any particular memory
> > > > pressure.
> > > >
> > > > Thus, rather than polling, could you simply flag the holdout
> > > > tasks, telling the scheduler "hey, next time you don't have anything
> > > > better to do..."? Then don't bother with them again unless the system
> > > > runs low on memory and asks you to free some. (And mandate that you can
> > > > only use this to free memory rather than for any other purpose.)
> > >
> > > One of my many alternative suggestions that Steven rejected was
> > > to simply leak the memory. ;-)
> > >
> > > But from what I can see, if we simply flag the holdout tasks, we
> > > either are also holding onto the task_struct structures, re-introducing
> > > concurrency to the list of holdout tasks, or requiring that the eventual
> > > scan for holdout tasks scan the entire task list. Neither of these seems
> > > particularly appetizing to me.
> > >
> > > The nice thing about Lai Jiangshan's suggestion is that it allows the
> > > scan of the holdout list to be done completely unsynchronized, which
> > > allows pauses during the scan, thus allowing the loop to check for
> > > competing work on that CPU. This should get almost all the effect
> > > of indefinite delay without the indefinite delay (at least in the
> > > common case).
> > >
> > > Or am I missing something here?
> >
> > If you only allow a single outstanding set of callbacks at a time, you
> > could have a single flag stored in the task, combined with a count
> > stored with the set of callbacks. Each time one of the holdout tasks
> > comes up, clear the flag and decrement the count. If and only if you
> > get asked to free up memory, start poking the scheduler to bring up
> > those tasks. When the count hits 0, free the memory.
> >
> > The set of trampolines won't change often, and presumably only changes
> > in response to user-driven requests to trace or stop tracing things.
> > So, if you have to wait for the existing set of callbacks to go away
> > before adding more, that seems fine. And you could then ditch polling
> > entirely.
>
> If I understand what you are suggesting, this requires hooks in the
> scheduler. I used to have hooks in the scheduler, but I dropped them in
> favor of polling the voluntary context-switch count in response to Peter
> Zijlstra's concerns about adding overhead to the scheduler's fastpaths.
>
> Therefore, although the flags are sometimes cleared externally from the
> scheduling-clock interrupt (for usermode execution), it is quite possible
> that a given task might never have its flag cleared asynchronously.
>
> Another approach might be to poll more slowly or to make the polling
> evict itself if it detects that this CPU has something else to do.
> Would either or both of these help?

As discussed at lunch today, another option would be to drop the thread
and handle cleanup synchronously from the caller in the tracing code, or
fire off a kthread *on request* to handle it asynchronously. That would
avoid paying the startup cost and ongoing overhead on a system that has tracing in
the kernel but never uses it, as will likely occur with distro kernels.
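
To make the "on request" variant concrete, here is a rough userspace sketch,
not kernel code and not what the series implements. A pthread stands in for
the kthread, and every name in it (cleanup_req, cleanup_request(),
cleanup_wait_idle(), count_cleanup()) is made up for illustration. The point
is only that no thread exists until the first cleanup request arrives, and
that the worker exits once the queue drains. The grace-period wait itself is
left as a comment.

```c
/* Userspace model only: a pthread stands in for a kthread. All names here
 * are hypothetical and exist purely for illustration. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct cleanup_req {
	struct cleanup_req *next;
	void (*func)(struct cleanup_req *req);
};

static pthread_mutex_t cleanup_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cleanup_idle = PTHREAD_COND_INITIALIZER;
static struct cleanup_req *cleanup_head;	/* pending requests (LIFO) */
static bool worker_running;			/* protected by cleanup_lock */
static unsigned long worker_starts;		/* how often a worker was spawned */
static int cleanup_done_count;			/* bumped by the example callback */

/* Example callback: just counts invocations. */
static void count_cleanup(struct cleanup_req *req)
{
	(void)req;
	cleanup_done_count++;
}

/* Worker: drain the queue, then mark itself stopped and exit. */
static void *cleanup_worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&cleanup_lock);
	while (cleanup_head) {
		struct cleanup_req *req = cleanup_head;

		cleanup_head = req->next;
		pthread_mutex_unlock(&cleanup_lock);
		/* A real implementation would wait for a grace period here. */
		req->func(req);
		pthread_mutex_lock(&cleanup_lock);
	}
	worker_running = false;
	pthread_cond_broadcast(&cleanup_idle);
	pthread_mutex_unlock(&cleanup_lock);
	return NULL;
}

/* Queue a request, spawning the worker only if none is running.
 * Returns 0 on success or the pthread_create() error code. */
static int cleanup_request(struct cleanup_req *req,
			   void (*func)(struct cleanup_req *req))
{
	pthread_t tid;
	int ret = 0;

	assert(req && func);
	req->func = func;
	pthread_mutex_lock(&cleanup_lock);
	req->next = cleanup_head;
	cleanup_head = req;
	if (!worker_running) {
		ret = pthread_create(&tid, NULL, cleanup_worker, NULL);
		if (ret == 0) {
			pthread_detach(tid);
			worker_running = true;
			worker_starts++;
		} else {
			cleanup_head = req->next;	/* undo the enqueue */
		}
	}
	pthread_mutex_unlock(&cleanup_lock);
	return ret;
}

/* Block until the worker has drained the queue and stopped. */
static void cleanup_wait_idle(void)
{
	pthread_mutex_lock(&cleanup_lock);
	while (worker_running)
		pthread_cond_wait(&cleanup_idle, &cleanup_lock);
	pthread_mutex_unlock(&cleanup_lock);
}
```

A caller would do cleanup_request(&req, count_cleanup) and, if it needs the
synchronous behavior, follow that with cleanup_wait_idle(). A system that
never queues a request never creates the thread at all.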

- Josh Triplett