Re: frequent lockups in 3.18rc4

From: Jens Axboe
Date: Mon Nov 17 2014 - 17:58:39 EST


On 11/17/2014 03:43 PM, Thomas Gleixner wrote:
> On Mon, 17 Nov 2014, Thomas Gleixner wrote:
>> On Mon, 17 Nov 2014, Linus Torvalds wrote:
>>> llist_for_each_entry_safe(csd, csd_next, entry, llist) {
>>> - csd->func(csd->info);
>>> + smp_call_func_t func = csd->func;
>>> + void *info = csd->info;
>>> csd_unlock(csd);
>>> +
>>> + func(info);
>>
>> No, that won't work for synchronous calls:
>>
>> CPU 0 CPU 1
>>
>> csd_lock(csd);
>> queue_csd();
>> ipi();
>> func = csd->func;
>> info = csd->info;
>> csd_unlock(csd);
>> csd_lock_wait();
>> func(info);
>>
>> The csd_lock_wait() side will succeed and therefor assume that the
>> call has been completed while the function has not been called at
>> all. Interesting explosions to follow.
>>
>> The proper solution is to revert that commit and properly analyze the
>> problem which Jens was trying to solve and work from there.
>
> So a combo of both (Jens and yours) might do the trick. Patch below.
>
> I think what Jens was trying to solve is:
>
> CPU 0 CPU 1
>
> csd_lock(csd);
> queue_csd();
> ipi();
> csd->func(csd->info);
> wait_for_completion(csd);
> complete(csd);
> reuse_csd(csd);
> csd_unlock(csd);

Maybe... The above looks ok to me from a functional point of view, but
now I can't convince myself that the blk-mq use case is correct.

I'll try and backout the original patch and reproduce the issue, that
should jog my memory and give me full understanding of what the issue I
faced back then was.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/