Re: Filesystem lockup with CONFIG_PREEMPT_RT

From: Austin Schuh
Date: Mon Jun 30 2014 - 20:53:55 EST


On Mon, Jun 30, 2014 at 5:12 PM, Austin Schuh <austin@xxxxxxxxxxxxxxxx> wrote:
> On Fri, Jun 27, 2014 at 7:24 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Thu, 26 Jun 2014, Austin Schuh wrote:
>>> If I'm reading the rt patch correctly, wq_worker_sleeping was moved
>>> out of __schedule to sched_submit_work. It looks like that changes
>>> the conditions under which wq_worker_sleeping is called. It used to
>>> be called whenever a task was going to sleep (I think). It looks like
>>> it is called now if the task is going to sleep, and if the task isn't
>>> blocked on a PI mutex (I think).
>>>
>>> If I try the following experiment
>>>
>>> static inline void sched_submit_work(struct task_struct *tsk)
>>> {
>>> + if (tsk->state && tsk->flags & PF_WQ_WORKER) {
>>> + wq_worker_sleeping(tsk);
>>> + return;
>>> + }
>>>
>>> and then remove the call later in the function, I am able to pass my test.
>>>
>>> Unfortunately, I then get a recursive pool spinlock BUG_ON after a
>>> while (as I would expect), and it all blows up.
>>
>> Well, that's why the check for !pi_blocked_on is there.
>>
>>> I'm not sure where to go from there. Any changes to the workpool to
>>> try to fix that will be hard, or could affect latency significantly.
>>
>> Completely untested patch below.
>>
>> Thanks,
>>
>> tglx
>>
>> --------------------->
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 8749d20..621329a 100644
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index be0ef50..0ca9d97 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -849,25 +882,18 @@ void wq_worker_sleeping(struct task_struct *task)
>> return;
>>
>> worker->sleeping = 1;
>> - spin_lock_irq(&pool->lock);
>> +
>> /*
>> * The counterpart of the following dec_and_test, implied mb,
>> * worklist not empty test sequence is in insert_work().
>> * Please read comment there.
>> - *
>> - * NOT_RUNNING is clear. This means that we're bound to and
>> - * running on the local cpu w/ rq lock held and preemption
>> - * disabled, which in turn means that none else could be
>> - * manipulating idle_list, so dereferencing idle_list without pool
>> - * lock is safe.
>> */
>> if (atomic_dec_and_test(&pool->nr_running) &&
>> !list_empty(&pool->worklist)) {
>
> What guarantees that this pool->worklist access is safe? Preemption
> isn't disabled.

I think I might have an answer for my own question, but I would
appreciate someone else to double check. If list_empty erroneously
returns that there is work to do when there isn't work to do, we wake
up an extra worker which then goes back to sleep. Not a big loss. If
list_empty erroneously returns that there isn't work to do when there
is, this should only be because someone is modifying the work list.
When they finish, as far as I can tell, all callers then check to see
if a worker needs to be started up, and start one.

Austin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/