Re: workqueue: WARN at kernel/workqueue.c:2176

From: Jason J. Herne
Date: Mon Jun 09 2014 - 10:01:48 EST


On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
------------

Subject: [PATCH] sched: migrate the waking tasks

The current code silently skips migrating a waking task when TTWU_QUEUE
is enabled.

When a task is waking, it is pending on the wake_list of the rq, but
it is not on the queue (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
and __migrate_task() will not migrate it because it is not on the queue.

This behavior is incorrect: the task has already been woken up, so it will
run on the wrong CPU, without correct placement, until the next wake-up
or the next update of cpus_allowed.

To fix this problem, we need to put the waking tasks on the queue (transfer
them to the running state) before migrating them.

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..d05a5a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }

 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+static void sched_ttwu_pending_locked(struct rq *rq)
 {
-	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
 	struct task_struct *p;

-	raw_spin_lock(&rq->lock);
-
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
 		llist = llist_next(llist);
 		ttwu_do_activate(rq, p, 0);
 	}
+}

+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+
+	raw_spin_lock(&rq->lock);
+	sched_ttwu_pending_locked(rq);
 	raw_spin_unlock(&rq->lock);
 }

@@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;

 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+
+	/* Ensure it is on rq for migration if it is waking */
+	if (p->state == TASK_WAKING)
+		sched_ttwu_pending_locked(rq);
+
 	if (p->on_rq) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
@@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 		goto fail;

+	/* Ensure it is on rq for migration if it is waking */
+	if (p->state == TASK_WAKING)
+		sched_ttwu_pending_locked(rq_src);
+
 	/*
 	 * If we're not on a rq, the next wake-up will ensure we're
 	 * placed properly.


FYI, this patch appears to fix the problem. I was able to run for 3 days without hitting the warning.
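
For anyone following along, here is a minimal userspace sketch of the scenario the patch description covers: a task sitting on the wake_list with on_rq == 0 gets skipped by the affinity change unless the pending list is drained first. This is not kernel code. The toy_* names, the structs, and the drain_first flag are all made up for illustration, and the model is single-threaded, so it ignores rq->lock and real concurrency entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task states; not the kernel's values. */
enum toy_state { TOY_SLEEPING, TOY_WAKING, TOY_RUNNING };

struct toy_task {
    enum toy_state state;
    bool on_rq;                 /* models task->on_rq */
    int cpu;                    /* CPU the task is placed on */
    struct toy_task *wake_next; /* link in the pending wake list */
};

struct toy_rq {
    int cpu;
    struct toy_task *wake_list; /* models rq->wake_list */
};

/* Queue a wakeup: the task becomes WAKING but is not yet on the queue. */
static void toy_ttwu_queue(struct toy_rq *rq, struct toy_task *p)
{
    assert(!p->on_rq);
    p->state = TOY_WAKING;
    p->cpu = rq->cpu;
    p->wake_next = rq->wake_list;
    rq->wake_list = p;
}

/* Drain the pending list, loosely analogous to sched_ttwu_pending_locked(). */
static void toy_ttwu_pending(struct toy_rq *rq)
{
    struct toy_task *p = rq->wake_list;

    rq->wake_list = NULL;
    while (p) {
        struct toy_task *next = p->wake_next;

        p->wake_next = NULL;
        p->on_rq = true;        /* stands in for ttwu_do_activate() */
        p->state = TOY_RUNNING;
        p = next;
    }
}

/*
 * Model of the affinity change. Returns true if the task was moved.
 * drain_first == false models the old behavior, true models the patch.
 */
static bool toy_set_cpu(struct toy_rq *rq, struct toy_task *p,
                        int dest_cpu, bool drain_first)
{
    if (drain_first && p->state == TOY_WAKING)
        toy_ttwu_pending(rq);

    if (!p->on_rq)
        return false;           /* silently skipped, as described above */

    p->cpu = dest_cpu;
    return true;
}
```

With drain_first set to false, a task queued via toy_ttwu_queue() stays on its old CPU after toy_set_cpu(); with it set to true, the task is activated first and then moved.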

I see that you guys are still discussing the details of the fix. When you decide on a final solution I'm happy to retest. Just be sure to ask :). It is hard to tell what to test with so many patches and code snippets flying around all the time.

Happy coding.

--
-- Jason J. Herne (jjherne@xxxxxxxxxxxxxxxxxx)
