Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc
From: Dmitry Vyukov
Date: Wed Feb 03 2016 - 12:33:14 EST
On Wed, Feb 3, 2016 at 5:24 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
> Hi Dmitry,
>
> On 01/21/2016 09:43 AM, Peter Hurley wrote:
>> On 01/21/2016 02:06 AM, Dmitry Vyukov wrote:
>>> On Wed, Jan 20, 2016 at 5:08 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>>> On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
>>>>> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
>>>>>> -> #3 (&buf->lock){+.+...}:
>>>>>> [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 kernel/locking/lockdep.c:3585
>>>>>> [< inline >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
>>>>>> [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
>>>>>> [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 drivers/tty/tty_io.c:2502
>>>>>
>>>>> So in any recent code that I look at this function tries to acquire
>>>>> tty->ctrl_lock, not buf->lock. Am I missing something ?!
>>>>
>>>> Yes.
>>>>
>>>> The tty locks were annotated with __lockfunc so were being elided from lockdep
>>>> stacktraces. Greg has a patch in his queue from me that removes the __lockfunc
>>>> annotation ("tty: Remove __lockfunc annotation from tty lock functions").
>>>>
>>>> Unfortunately, I think syzkaller's post-processing stack trace isn't helping
>>>> either, giving the impression that the stack is still inside tty_get_pgrp().
>>>>
>>>> It's not.
>>>
>>> I've got a new report on commit
>>> a200dcb34693084e56496960d855afdeaaf9578f (Jan 18).
>>> Here is unprocessed version:
>>> https://gist.githubusercontent.com/dvyukov/428a0c9bfaa867d8ce84/raw/0754db31668602ad07947f9964238b2f9cf63315/gistfile1.txt
>>> and here is processed one:
>>> https://gist.githubusercontent.com/dvyukov/42b874213de82d94c35e/raw/2bbced252035821243678de0112e2ed3a766fb5d/gistfile1.txt
>>>
>>> Peter, what exactly is wrong with the post-processed version?
>>
>> Yeah, ok, I assumed the problem with this report was post-processing
>> because of the other report that had mixed-up info.
>>
>> However, the #3 stacktrace is obviously wrong, as others have already noted.
>> Plus, the #1 stacktrace is wrong as well.
>>
>>> I would be interested in fixing the processing script.
>>
>> Not that it's related (since the original, not-edited report has bogus
>> stacktraces), but how are you doing debug symbol lookup?
>>
>> Because below is not correct. Should be kernel/kthread.c:177 (or thereabouts)
>>
>> [<ffffffff813b423f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
>>
>>
>>> As far as I see it contains the same stacks just with line numbers and
>>> inlined frames.
>>
>> Agree, now that I see the original report.
>>
>>> I am using a significantly different compilation mode
>>> (kasan + kcov + very recent gcc), so nobody except me won't be able to
>>> figure out line numbers based on offsets.
>>
>> Weird. Maybe something to do with the compiler.
>>
>> Can you get me the dmesg output running the patch below?
>
> Wondering if this is still the priority it was not so long ago?
> If not, that's fine and I'll drop this from my followup list.
Yes, it is still the priority for me.
I've tried to apply your debugging patch, but I noticed that it prints
dependencies stacks as it discovers them. But in my setup I don't have
all output from machine start (there is just too many of it). And I
don't have a localized reproducer for this.
I will try again.
Do you want me to debug with your "tty: Fix lock inversion in
N_TRACEROUTER" patch applied or not (I still see slightly different
deadlock reports with it)?
>> Please first just get me the output shortly after boot (please do a
>> ctrl+C in a terminal window to trigger the expected stacktrace
>> corresponding to #3 stacktrace from your report).
>>
>> Then please run your test that generates the circular lockdep report.
>>
>> --- >% ---
>> Subject: [PATCH] debug: dump stacktraces for tty lock dependencies
>>
>> ---
>> kernel/locking/lockdep.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
>> index 60ace56..b67cafb 100644
>> --- a/kernel/locking/lockdep.c
>> +++ b/kernel/locking/lockdep.c
>> @@ -332,7 +332,7 @@ EXPORT_SYMBOL(lockdep_on);
>> * Debugging switches:
>> */
>>
>> -#define VERBOSE 0
>> +#define VERBOSE 1
>> #define VERY_VERBOSE 0
>>
>> #if VERBOSE
>> @@ -351,13 +351,15 @@ EXPORT_SYMBOL(lockdep_on);
>> */
>> static int class_filter(struct lock_class *class)
>> {
>> -#if 0
>> - /* Example */
>> +#if 1
>> if (class->name_version == 1 &&
>> - !strcmp(class->name, "lockname"))
>> + !strcmp(class->name, "&buf->lock"))
>> return 1;
>> if (class->name_version == 1 &&
>> - !strcmp(class->name, "&struct->lockfield"))
>> + !strcmp(class->name, "&port->buf.lock/1"))
>> + return 1;
>> + if (class->name_version == 1 &&
>> + !strcmp(class->name, "routelock"))
>> return 1;
>> #endif
>> /* Filter everything else. 1 would be to allow everything else */
>>
>