Re: workqueue: WARN at kernel/workqueue.c:2176

From: Lai Jiangshan
Date: Sun Jun 15 2014 - 21:27:06 EST


Hi, Peter

Ping...

thanks,
Lai

On 06/10/2014 09:21 AM, Lai Jiangshan wrote:
> On 06/09/2014 10:01 PM, Jason J. Herne wrote:
>> On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
>>> ------------
>>>
>>> Subject: [PATCH] sched: migrate the waking tasks
>>>
>>> The current code silently skips migrating a waking task when TTWU_QUEUE is enabled.
>>>
>>> When a task is waking, it is pending on the wake_list of the rq, but
>>> it is not on the queue (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
>>> and __migrate_task() will not migrate it because it is not on the queue.
>>>
>>> This behavior is incorrect: the task has already been woken up, so it will
>>> run on the wrong CPU, without correct placement, until the next wake-up
>>> or the next update of cpus_allowed.
>>>
>>> To fix this problem, we need to put the waking tasks on the queue (transfer
>>> them to the running state) before migrating them.
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
>>> ---
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 268a45e..d05a5a1 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>> }
>>>
>>> #ifdef CONFIG_SMP
>>> -static void sched_ttwu_pending(void)
>>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>> {
>>> - struct rq *rq = this_rq();
>>> struct llist_node *llist = llist_del_all(&rq->wake_list);
>>> struct task_struct *p;
>>>
>>> - raw_spin_lock(&rq->lock);
>>> -
>>> while (llist) {
>>> p = llist_entry(llist, struct task_struct, wake_entry);
>>> llist = llist_next(llist);
>>> ttwu_do_activate(rq, p, 0);
>>> }
>>> +}
>>>
>>> +static void sched_ttwu_pending(void)
>>> +{
>>> + struct rq *rq = this_rq();
>>> +
>>> + raw_spin_lock(&rq->lock);
>>> + sched_ttwu_pending_locked(rq);
>>> raw_spin_unlock(&rq->lock);
>>> }
>>>
>>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>> goto out;
>>>
>>> dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>>> +
>>> + /* Ensure it is on rq for migration if it is waking */
>>> + if (p->state == TASK_WAKING)
>>> + sched_ttwu_pending_locked(rq);
>>> +
>>> if (p->on_rq) {
>>> struct migration_arg arg = { p, dest_cpu };
>>> /* Need help from migration thread: drop lock and wait. */
>>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>> if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>> goto fail;
>>>
>>> + /* Ensure it is on rq for migration if it is waking */
>>> + if (p->state == TASK_WAKING)
>>> + sched_ttwu_pending_locked(rq_src);
>>> +
>>> /*
>>> * If we're not on a rq, the next wake-up will ensure we're
>>> * placed properly.
>>>
>>
>> FYI, this patch appears to fix the problem. I was able to run for 3 days without hitting the warning.
>
> Thank you for testing. It confirms that we found the root cause.
> Your tests matter most and the coding comes second; let's move forward step by step.
>
> Thanks,
> Lai
>
>>
>> I see that you guys are still discussing the details of the fix. When you decide on a final solution I'm happy to retest. Just be sure to ask :). It is hard to tell what to test with so many patches and code snippets flying around all the time.
>>
>> Happy coding.
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> .
>
