Re: [RFC v2 00/18] kthread: Use kthread worker API more widely

From: Paul E. McKenney
Date: Wed Sep 30 2015 - 01:08:45 EST

On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> My intention is to make it easier to manipulate kthreads. This RFC tries
> to use the kthread worker API. It is based on comments from the
> first attempt. See and
> the list of changes below.
> 1st..8th patches: improve the existing kthread worker API
> 9th, 12th, 17th patches: convert three kthreads into the new API,
>      namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]
> 10th, 11th patches: fix potential problems in the ring buffer
>      benchmark; also sent separately
> 13th patch: small fix for RCU kthread; also sent separately;
>      being tested by Paul
> 14th..16th patches: preparation steps for the RCU threads
>      conversion; they are needed _only_ if we split GP start
>      and QS handling into separate works[*]
> 18th patch: does a possible improvement of the kthread worker API;
>      it adds an extra parameter to the create*() functions, so I
>      rather put it into this draft
> [*] IMPORTANT: I tried to split RCU GP start and QS handling
>      into separate works this time. But there is a problem with
>      a race in rcu_gp_kthread_worker_poke(). It might queue
>      the wrong work. This can be detected and fixed by the work
>      itself, but that is a bit ugly. An alternative solution is to
>      do both operations in one work. But then we sleep too much
>      in the work, which is ugly as well. Any ideas are appreciated.

I think that the kernel is trying really hard to tell you that splitting
up the RCU grace-period kthreads in this manner is not such a good idea.

So what are we really trying to accomplish here? I am guessing something
like the following:

1.  Get each grace-period kthread to a known safe state within a
    short time of having requested a safe state. If I recall
    correctly, the point of this is to allow no-downtime kernel
    patches to the functions executed by the grace-period kthreads.

2.  At the same time, if someone suddenly needs a grace period
    at some point in this process, the grace period kthreads are
    going to have to wake back up and handle the grace period.
    Or do you have some tricky way to guarantee that no one is
    going to need a grace period beyond the time you freeze
    the grace-period kthreads?

3.  The boost kthreads should not be a big problem because failing
    to boost simply lets the grace period run longer.

4.  The callback-offload kthreads are likely to be a big problem,
    because in systems configured with them, they need to be running
    to invoke the callbacks, and if the callbacks are not invoked,
    the grace period might just as well have failed to end.

5.  The per-CPU kthreads are in the same boat as the callback-offload
    kthreads. One approach is to offline all the CPUs but one, and
    that will park all but the last per-CPU kthread. But handling
    that last per-CPU kthread would likely be "good clean fun"...

6. Other requirements?

One approach would be to simply say that the top-level rcu_gp_kthread()
function cannot be patched, and arrange for the grace-period kthreads
to park at some point within this function. Or is there some requirement
that I am missing?
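
To make the parking idea concrete, here is a rough userspace sketch
using pthreads. It is only a model under stated assumptions, not kernel
code and not anything taken from the patch series: struct gp_thread and
every gp_*() name below are hypothetical. The thread parks only in its
top-level loop, and a grace period requested while it is parked stays
pending until it is unparked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a grace-period kthread (not kernel code). */
struct gp_thread {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	pthread_t tid;
	bool park_requested;	/* patcher wants the thread at its safe point */
	bool parked;		/* thread is sitting at the safe point */
	bool gp_needed;		/* someone asked for a grace period */
	bool stop;
	unsigned long gps_completed;
};

/* Set one flag under the lock and wake everyone waiting on the condvar. */
static void gp_set_flag(struct gp_thread *t, bool *flag, bool val)
{
	pthread_mutex_lock(&t->lock);
	*flag = val;
	pthread_cond_broadcast(&t->cond);
	pthread_mutex_unlock(&t->lock);
}

/* Top-level loop: the one function assumed to be unpatchable. */
static void *gp_thread_fn(void *arg)
{
	struct gp_thread *t = arg;

	pthread_mutex_lock(&t->lock);
	while (!t->stop) {
		if (t->park_requested) {
			/* The only parking point, outside all helpers. */
			t->parked = true;
			pthread_cond_broadcast(&t->cond);
			while (t->park_requested && !t->stop)
				pthread_cond_wait(&t->cond, &t->lock);
			t->parked = false;
		} else if (t->gp_needed) {
			/* Stand-in for the helpers that remain patchable. */
			t->gp_needed = false;
			t->gps_completed++;
			pthread_cond_broadcast(&t->cond);
		} else {
			pthread_cond_wait(&t->cond, &t->lock);
		}
	}
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

static int gp_thread_start(struct gp_thread *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->cond, NULL);
	t->park_requested = t->parked = t->gp_needed = t->stop = false;
	t->gps_completed = 0;
	return pthread_create(&t->tid, NULL, gp_thread_fn, t);
}

/* Returns only once the thread is sitting at its safe point. */
static void gp_thread_park(struct gp_thread *t)
{
	gp_set_flag(t, &t->park_requested, true);
	pthread_mutex_lock(&t->lock);
	while (!t->parked)
		pthread_cond_wait(&t->cond, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

static void gp_thread_unpark(struct gp_thread *t)
{
	gp_set_flag(t, &t->park_requested, false);
}

/* Request a grace period; it stays pending while the thread is parked. */
static void gp_request(struct gp_thread *t)
{
	gp_set_flag(t, &t->gp_needed, true);
}

/* Wait for at least @target grace periods; returns the completed count. */
static unsigned long gp_wait(struct gp_thread *t, unsigned long target)
{
	pthread_mutex_lock(&t->lock);
	while (t->gps_completed < target)
		pthread_cond_wait(&t->cond, &t->lock);
	target = t->gps_completed;
	pthread_mutex_unlock(&t->lock);
	return target;
}

static void gp_thread_stop(struct gp_thread *t)
{
	gp_set_flag(t, &t->stop, true);
	pthread_join(t->tid, NULL);
}
```

In this model a patcher would call gp_thread_park(), do its work, and
then call gp_thread_unpark(). A gp_request() issued in between is not
lost, but it is not served until the unpark either, which is exactly
the cost that item 2 above asks about. In the kernel the parking point
would presumably be built on kthread_should_park() and kthread_parkme()
rather than on a condition variable.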

Thanx, Paul

> Changes against v1:
> + remove wrappers to manipulate the scheduling policy and priority
> + remove questionable wakeup_and_destroy_kthread_worker() variant
> + do not check for chained work when draining the queue
> + allocate struct kthread_worker in create_kthread_worker() and
>   use simpler checks for a running worker
> + add support for delayed kthread works and use them instead
>   of waiting inside the works
> + rework the "unrelated" fixes for the ring buffer benchmark
>   as discussed in the 1st RFC; also sent separately
> + convert also the consumer in the ring buffer benchmark
> I have tested this patch set against the stable Linus tree
> for 4.3-rc2.
> Petr Mladek (18):
>   kthread: Allow to call __kthread_create_on_node() with va_list args
>   kthread: Add create_kthread_worker*()
>   kthread: Add drain_kthread_worker()
>   kthread: Add destroy_kthread_worker()
>   kthread: Add pending flag to kthread work
>   kthread: Initial support for delayed kthread work
>   kthread: Allow to cancel kthread work
>   kthread: Allow to modify delayed kthread work
>   mm/huge_page: Convert khugepaged() into kthread worker API
>   ring_buffer: Do no not complete benchmark reader too early
>   ring_buffer: Fix more races when terminating the producer in the
>     benchmark
>   ring_buffer: Convert benchmark kthreads into kthread worker API
>   rcu: Finish folding ->fqs_state into ->gp_state
>   rcu: Store first_gp_fqs into struct rcu_state
>   rcu: Clean up timeouts for forcing the quiescent state
>   rcu: Check actual RCU_GP_FLAG_FQS when handling quiescent state
>   rcu: Convert RCU gp kthreads into kthread worker API
>   kthread: Better support freezable kthread workers
>  include/linux/kthread.h              |  67 +++++
>  kernel/kthread.c                     | 544 ++++++++++++++++++++++++++++++++---
>  kernel/rcu/tree.c                    | 407 ++++++++++++++++----------
>  kernel/rcu/tree.h                    |  24 +-
>  kernel/rcu/tree_plugin.h             |  16 +-
>  kernel/rcu/tree_trace.c              |   2 +-
>  kernel/trace/ring_buffer_benchmark.c | 194 ++++++-------
>  mm/huge_memory.c                     | 116 ++++----
>  8 files changed, 1017 insertions(+), 353 deletions(-)
> --
