Re: Use case for TASKS_RCU
From: Paul E. McKenney
Date: Tue May 23 2017 - 11:33:23 EST
On Tue, May 23, 2017 at 01:19:58AM -0400, Steven Rostedt wrote:
> On Mon, 22 May 2017 17:00:36 -0700
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, May 19, 2017 at 12:06:09PM -0700, Paul E. McKenney wrote:
> > > On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:
> > > > On Fri, 19 May 2017 10:04:21 -0400
> > > > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > > >
> > > > > On Fri, 19 May 2017 06:35:50 -0700
> > > > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > Simpler would be better!
> > > > > >
> > > > > > However, is it really guaranteed that one SCHED_IDLE thread
> > > > > > cannot preempt another? If not, then the trampoline-freeing
> > > > > > SCHED_IDLE thread might preempt some other SCHED_IDLE thread
> > > > > > in the middle of a trampoline. I am not seeing anything that
> > > > > > prevents such preemption, but it is rather early local time,
> > > > > > so I could easily be missing something.
> > > > > >
> > > > > > However, if SCHED_IDLE threads cannot preempt other threads,
> > > > > > even other SCHED_IDLE threads, then your approach sounds
> > > > > > quite promising to me.
> > > > > >
> > > > > > Steve, Peter, thoughts?
> > > > >
> > > > > SCHED_IDLE is the swapper task. There's one on each CPU, and
> > > > > they don't migrate. And they only get called when there's no
> > > > > other task running.
> > > >
> > > > Peter just "schooled" me on IRC. I stand corrected (and he may
> > > > respond to this email too). I guess any task can become
> > > > SCHED_IDLE.
> > > >
> > > > But that just makes this an even less likely option for
> > > > synchronize_rcu_tasks().
> > >
> > > Hmmm... The goal is to make sure that any task that was preempted
> > > or running at a given point in time passes through a voluntary
> > > context switch (or userspace execution, or, ...).
> > >
> > > What is the simplest way to get this job done? To Ingo's point, I
> > > bet that there is a simpler way than the current TASKS_RCU
> > > implementation.
> > >
> > > Ingo, if I make it fit into 100 lines of code, would you be OK with
> > > it? I probably need a one-line hook at task-creation time and
> > > another at task-exit time, if that makes a difference.
> >
> > And please see below for such a patch, which does add (just barely)
> > fewer than 100 lines net.
> >
> > Unfortunately, it does not work, as I should have known ahead of time
> > from the dyntick-idle experience. Not all context switches go through
> > context_switch(). :-/
> >
> > I believe this is fixable, more or less like dyntick-idle's
> > half-interrupts were fixable, but it will likely be a few days. Not
> > clear whether the result will be simpler than current TASKS_RCU, but
> > there is only one way to find out. ;-)
>
> Hi Paul,
>
> I'm currently traveling and don't have the energy to look at code at
> the moment. Hopefully, I can look at this more on Thursday or Friday.
Hello, Steven,
No problem, as I learned that my analogy to dyntick-idle's half-interrupts
fails. If a given task exits an extended quiescent state twice in a row,
the grace period is merely extended longer than it needs to be, so no harm
is done.
The updated patch below handles this case.
But it turns out that tasks also enter extended quiescent states twice
in a row (from the perspective of context_switch()), which can result
in a too-short grace period.
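To make that concrete, here is a minimal sketch of the per-task SRCU
pairing that the patch below relies on (illustrative only; the sketch_
names are not in the patch):

/* Sketch of the per-task tasks_srcu pairing; sketch_* names are made up. */
static void sketch_tasks_qs_exit(struct task_struct *t)
{
	/* Voluntary switch-in: re-acquire the task's tasks_srcu reader. */
	preempt_disable();
	t->rcu_tasks_idx = __srcu_read_lock(&tasks_srcu);
	preempt_enable();
}

static void sketch_tasks_qs_enter(struct task_struct *t)
{
	/* Voluntary switch-out: release the reader the task holds. */
	preempt_disable();
	__srcu_read_unlock(&tasks_srcu, t->rcu_tasks_idx);
	preempt_enable();
}

/*
 * A context-switch "sneak path" that runs the "enter" step twice with no
 * intervening "exit" does a second __srcu_read_unlock() for a reader that
 * was never re-acquired.  From that point on, synchronize_srcu(&tasks_srcu)
 * no longer waits on this task, so a grace period can end while the task
 * is still executing kernel code, for example inside a trampoline.
 */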
So the only way that this general approach will work is to identify each
and every context-switch "sneak path" in each and every architecture, and
the result would likely be more complex than the current mainline
RCU-tasks code.
So I am putting this on the back burner for the moment. If anyone has
any ideas for a simple solution, please don't keep them a secret!
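For reference, the use case driving all of this boils down to roughly the
following (a minimal sketch with hypothetical caller and helper names, not
from any tree):

/* Hypothetical example: do not free a trampoline while tasks may run in it. */
static void sketch_release_trampoline(void *old_tramp)
{
	/*
	 * A task preempted in (or running through) old_tramp takes no
	 * rcu_read_lock(), so only a voluntary context switch, usermode
	 * execution, or cond_resched_rcu_qs() proves that it has left.
	 */
	synchronize_rcu_tasks();
	sketch_free_exec_memory(old_tramp);	/* hypothetical helper */
}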
Thanx, Paul
-----------------------------------------------------------------------
commit 71f0a84ceeb228789a09192901ddf3c7210c85ee
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Mon May 22 08:55:01 2017 -0700
rcu: Provide simplified RCU-tasks implementation
This commit provides a simplified RCU-tasks implementation in terms of
SRCU for CONFIG_PREEMPT=y kernels and in terms of RCU-sched otherwise.
The RCU-sched approach simply maps synchronize_rcu_tasks() directly onto
synchronize_sched().
The SRCU approach provides a tasks_srcu srcu_struct structure that
tracks usermode execution and voluntary context switches. When each
task is created, it invokes tasks_rcu_qs_exit(), and when each task
exits, it invokes tasks_rcu_qs_enter(). On context switch,
tasks_rcu_qs_exit() is invoked for the incoming task, but only if that
task is resuming from a voluntary context switch. For the outgoing task,
tasks_rcu_qs_enter() enters an extended quiescent state only if the
switch is voluntary (not a preemption), and either way the preempt-or-not
state is recorded for the task's next resumption.
A "context switch" that keeps the same task invokes tasks_rcu_qs().
If the scheduling-clock interrupt sees a task in usermode, it invokes
tasks_rcu_qs(). It is normally unsafe to do SRCU read-side operations
from interrupt, but in this case, usermode was interrupted, so there is
no chance of having interrupted another SRCU read-side operation.
Given all this, synchronize_srcu(&tasks_srcu) waits for all tasks to
be seen voluntarily blocked or in user space. This means that
synchronize_rcu_tasks() is a one-line function.
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index bd11d33d48cc..d7700d969b8c 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -140,6 +140,14 @@ extern struct group_info init_groups;
#else
#define INIT_TASK_RCU_PREEMPT(tsk)
#endif
+#ifdef CONFIG_TASKS_RCU
+#define INIT_TASK_RCU_TASKS(tsk) \
+ .rcu_tasks_idx = 0, \
+ .rcu_tasks_preempt = false, \
+ .rcu_tasks_qs = false,
+#else /* #ifdef CONFIG_TASKS_RCU */
+#define INIT_TASK_RCU_TASKS(tsk)
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
extern struct cred init_cred;
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5088a44ce99c..1b2f104a467f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -156,6 +156,18 @@ static inline void rcu_init_nohz(void) { }
rcu_irq_exit_irqson(); \
} while (0)
+#ifdef CONFIG_TASKS_RCU
+void tasks_rcu_qs(bool preempt);
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt);
+void tasks_rcu_qs_exit(struct task_struct *t);
+void synchronize_rcu_tasks(void);
+#else /* #ifdef CONFIG_TASKS_RCU */
+static inline void tasks_rcu_qs(bool preempt) { }
+static inline void tasks_rcu_qs_enter(struct task_struct *t, bool preempt) { }
+static inline void tasks_rcu_qs_exit(struct task_struct *t) { }
+static inline void synchronize_rcu_tasks(void) { synchronize_sched(); }
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
/**
* cond_resched_rcu_qs - Report potential quiescent states to RCU
*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7d9aa80349..2c1862c9e7fb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,12 @@ struct task_struct {
struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+ u8 rcu_tasks_idx;
+ u8 rcu_tasks_preempt;
+ u8 rcu_tasks_qs;
+#endif /* #ifdef CONFIG_TASKS_RCU */
+
struct sched_info sched_info;
struct list_head tasks;
diff --git a/kernel/fork.c b/kernel/fork.c
index 1784f49b0cd4..d1ec804eb4f9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1498,6 +1498,11 @@ static inline void rcu_copy_process(struct task_struct *p)
p->rcu_blocked_node = NULL;
INIT_LIST_HEAD(&p->rcu_node_entry);
#endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+ p->rcu_tasks_preempt = false;
+ p->rcu_tasks_qs = true;
+ tasks_rcu_qs_exit(p);
+#endif /* #ifdef CONFIG_TASKS_RCU */
}
/*
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index be90c945063f..3077568930f3 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -69,13 +69,13 @@ config TREE_SRCU
This option selects the full-fledged version of SRCU.
config TASKS_RCU
- bool
- default n
+ def_bool PREEMPT_RCU
select SRCU
help
This option enables a task-based RCU implementation that uses
- only voluntary context switch (not preemption!), idle, and
- user-mode execution as quiescent states.
+ only voluntary context switch (not preemption!) and user-mode
+ execution as quiescent states. For !PREEMPT_RCU kernels,
+ synchronize_rcu_tasks() maps to synchronize_sched().
config RCU_STALL_COMMON
def_bool ( TREE_RCU || PREEMPT_RCU )
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..b6619ceedafc 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -16,7 +16,6 @@ config RCU_PERF_TEST
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
- select TASKS_RCU
default n
help
This option provides a kernel module that runs performance
@@ -33,7 +32,6 @@ config RCU_TORTURE_TEST
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
- select TASKS_RCU
default n
help
This option provides a kernel module that runs torture tests
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8008541b04f2..05e7abecde68 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -166,6 +166,7 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
static void rcu_report_exp_rdp(struct rcu_state *rsp,
struct rcu_data *rdp, bool wake);
static void sync_sched_exp_online_cleanup(int cpu);
+static void rcu_all_qs_ctxt(bool ctxt);
/* rcuc/rcub kthread realtime priority */
static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
@@ -467,7 +468,7 @@ void rcu_note_context_switch(bool preempt)
rcu_momentary_dyntick_idle();
this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
if (!preempt)
- rcu_all_qs();
+ rcu_all_qs_ctxt(true);
out:
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
@@ -480,14 +481,13 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
* dyntick-idle quiescent state visible to other CPUs (but only for those
* RCU flavors in desperate need of a quiescent state, which will normally
* be none of them). Either way, do a lightweight quiescent state for
- * all RCU flavors.
+ * all RCU flavors. If ctxt is false, also do a tasks-RCU quiescent state.
*
* The barrier() calls are redundant in the common case when this is
* called externally, but just in case this is called from within this
* file.
- *
*/
-void rcu_all_qs(void)
+static void rcu_all_qs_ctxt(bool ctxt)
{
unsigned long flags;
@@ -509,9 +509,16 @@ void rcu_all_qs(void)
if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)))
rcu_sched_qs();
this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
+ if (!ctxt)
+ tasks_rcu_qs(false);
barrier(); /* Avoid RCU read-side critical sections leaking up. */
preempt_enable();
}
+
+void rcu_all_qs(void)
+{
+ rcu_all_qs_ctxt(false);
+}
EXPORT_SYMBOL_GPL(rcu_all_qs);
#define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d7325553cf82..454882ed03c5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -783,6 +783,7 @@ void exit_rcu(void)
barrier();
t->rcu_read_unlock_special.b.blocked = true;
__rcu_read_unlock();
+ tasks_rcu_qs_enter(t, false);
}
#else /* #ifdef CONFIG_PREEMPT_RCU */
@@ -2597,3 +2598,63 @@ static void rcu_bind_gp_kthread(void)
return;
housekeeping_affine(current);
}
+
+#ifdef CONFIG_TASKS_RCU
+
+DEFINE_STATIC_SRCU(tasks_srcu);
+
+/* Record a momentary RCU-tasks quiescent state. */
+void tasks_rcu_qs(bool preempt)
+{
+ int idx;
+
+ if (preempt || is_idle_task(current))
+ return; /* Preempted, not a quiescent state. */
+ /* Acquire new before releasing old to allow tracing SRCU. */
+ preempt_disable();
+ idx = __srcu_read_lock(&tasks_srcu);
+ __srcu_read_unlock(&tasks_srcu, current->rcu_tasks_idx);
+ current->rcu_tasks_idx = idx;
+ WARN_ON_ONCE(current->rcu_tasks_qs);
+ current->rcu_tasks_qs = false;
+ preempt_enable();
+}
+
+/* Record entering an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt)
+{
+ t->rcu_tasks_preempt = preempt;
+ if (preempt || is_idle_task(t))
+ return; /* Preempted, not a quiescent state. */
+ preempt_disable();
+ __srcu_read_unlock(&tasks_srcu, t->rcu_tasks_idx);
+ WARN_ON_ONCE(current->rcu_tasks_qs);
+ current->rcu_tasks_qs = true;
+ preempt_enable();
+}
+
+/* Record exiting an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_exit(struct task_struct *t)
+{
+ if (t->rcu_tasks_preempt || is_idle_task(t))
+ return; /* Preempted, not a quiescent state. */
+ preempt_disable();
+ if (current->rcu_tasks_qs)
+ t->rcu_tasks_idx = __srcu_read_lock(&tasks_srcu);
+ current->rcu_tasks_qs = false;
+ preempt_enable();
+}
+
+/**
+ * synchronize_rcu_tasks - Wait for all tasks to voluntarily leave the kernel
+ *
+ * Waits until each task is either voluntarily blocked, running in
+ * userspace, or has offered to block (as in cond_resched_rcu_qs()).
+ * Idle tasks are ignored.
+ */
+void synchronize_rcu_tasks(void)
+{
+ synchronize_srcu(&tasks_srcu);
+}
+
+#endif /* #ifdef CONFIG_TASKS_RCU */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75353aa53ae3..1ba79734377f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3430,6 +3430,7 @@ static void __sched notrace __schedule(bool preempt)
clear_preempt_need_resched();
if (likely(prev != next)) {
+ tasks_rcu_qs_exit(next);
rq->nr_switches++;
rq->curr = next;
++*switch_count;
@@ -3438,9 +3439,11 @@ static void __sched notrace __schedule(bool preempt)
/* Also unlocks the rq: */
rq = context_switch(rq, prev, next, &rf);
+ tasks_rcu_qs_enter(prev, preempt);
} else {
rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
rq_unlock_irq(rq, &rf);
+ tasks_rcu_qs(preempt);
}
balance_callback(rq);