Re: fs: locks: WARNING: CPU: 16 PID: 4296 at fs/locks.c:236 locks_free_lock_context+0x10d/0x240()

From: Sasha Levin
Date: Fri Jan 16 2015 - 16:20:24 EST


On 01/16/2015 04:16 PM, Jeff Layton wrote:
> On Fri, 16 Jan 2015 13:53:04 -0500
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
>
>> On Fri, 16 Jan 2015 13:10:46 -0500
>> Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>>
>>> On 01/16/2015 09:40 AM, Jeff Layton wrote:
>>>> On Fri, 16 Jan 2015 09:31:23 -0500
>>>> Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>>>>
>>>>> On 01/15/2015 03:22 PM, Jeff Layton wrote:
>>>>>> Ok, I tried to reproduce it with that and several variations but it
>>>>>> still doesn't seem to do it for me. Can you try the latest linux-next
>>>>>> tree and see if it's still reproducible there?
>>>>>
>>>>> It's still not in today's -next; could you send me a patch for testing
>>>>> instead?
>>>>>
>>>>
>>>> Seems to be there for me:
>>>>
>>>> ----------------------[snip]-----------------------
>>>> /*
>>>>  * This function is called on the last close of an open file.
>>>>  */
>>>> void locks_remove_file(struct file *filp)
>>>> {
>>>> 	/* ensure that we see any assignment of i_flctx */
>>>> 	smp_rmb();
>>>>
>>>> 	/* remove any OFD locks */
>>>> 	locks_remove_posix(filp, filp);
>>>> ----------------------[snip]-----------------------
>>>>
>>>> That's actually the right place to put the barrier, I think. We just
>>>> need to ensure that this function sees any assignment to i_flctx that
>>>> occurred before this point. By the time we're here, we shouldn't be
>>>> getting any new locks that matter to this close since the fcheck call
>>>> should fail on any new requests.
>>>>
>>>> If that works, then I'll probably make some other changes to the set
>>>> and re-post it next week.
>>>>
>>>> Many thanks for helping me test this!
>>>
>>> You're right, I somehow missed that.
>>>
>>> But it doesn't fix the issue: I still see it happening, though it seems
>>> to be less frequent(?).
>>>
>>
>> Ok, that was my worry (and one of the reasons I really would like to
>> find some way to reproduce this on my own). I think what I'll do at
>> this point is pull the patchset from linux-next until I can consult
>> with someone who understands this sort of cache-coherency problem
>> better than I do.
>>
>> Once I get it resolved, I'll push it back to my linux-next branch and
>> let you know and we can give it another go.
>>
>> Thanks for the testing so far!
>
> Actually, I take it back. One more try...
>
> I dragooned David Howells into helping me look at this and he talked me
> into just going back to using the i_lock to protect the i_flctx
> assignment.
>
> My hope is that will work around whatever strange effect is causing
> this. Can you test tomorrow's -next tree (once it's been merged) and see
> whether this is still reproducible?

Sure. Feel free to send patches my way to test/debug as well; it's
pretty easy to throw them into my test setup.
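
For anyone following along, the i_lock approach mentioned above can be
sketched in userspace roughly as follows. This is only an illustration
under stated assumptions, not the actual patch: every demo_* name is
hypothetical, and a pthread mutex stands in for the inode's i_lock
spinlock. The idea is that the i_flctx pointer is only ever read or
assigned with the lock held, so no explicit barriers are needed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structs. */
struct demo_lock_context {
	int nr_locks;
};

struct demo_inode {
	pthread_mutex_t i_lock;            /* plays the role of inode->i_lock */
	struct demo_lock_context *i_flctx; /* NULL until first use */
};

static void demo_inode_init(struct demo_inode *inode)
{
	pthread_mutex_init(&inode->i_lock, NULL);
	inode->i_flctx = NULL;
}

/*
 * Return the inode's lock context, allocating it on first use.
 * Returns NULL only if the allocation fails.
 */
static struct demo_lock_context *
demo_get_lock_context(struct demo_inode *inode)
{
	struct demo_lock_context *ctx, *new;

	/* Read the pointer under the lock instead of relying on barriers. */
	pthread_mutex_lock(&inode->i_lock);
	ctx = inode->i_flctx;
	pthread_mutex_unlock(&inode->i_lock);
	if (ctx)
		return ctx;

	/* Allocate outside the lock so the critical section stays short. */
	new = calloc(1, sizeof(*new));
	if (!new)
		return NULL;

	/* Re-check under the lock: another thread may have won the race. */
	pthread_mutex_lock(&inode->i_lock);
	if (!inode->i_flctx) {
		inode->i_flctx = new;
		new = NULL;
	}
	ctx = inode->i_flctx;
	pthread_mutex_unlock(&inode->i_lock);

	free(new); /* no-op if our allocation was installed */
	return ctx;
}

/* Detach the context under the lock, then free it outside the lock. */
static void demo_free_lock_context(struct demo_inode *inode)
{
	struct demo_lock_context *ctx;

	pthread_mutex_lock(&inode->i_lock);
	ctx = inode->i_flctx;
	inode->i_flctx = NULL;
	pthread_mutex_unlock(&inode->i_lock);

	free(ctx);
}
```

Calling demo_get_lock_context() twice on the same inode returns the same
pointer; a thread that loses the allocation race just frees its own copy
and uses the one that was installed first.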


Thanks,
Sasha

