Re: [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup

From: David Teigland
Date: Mon Oct 09 2017 - 10:37:47 EST


On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> > According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> > and the function call path is:
> > dlm_master_lookup (acquire the spinlock)
> > dlm_send_rcom_lookup_dump
> > create_rcom
> > dlm_lowcomms_get_buffer
> > nodeid2con
> > mutex_lock --> may sleep
> >
> > This bug is found by my static analysis tool and my code review.
>
> Umm... dlm_master_lookup() locking is not nice, but to trigger that
> you would need a combination of
>
> * from_nodeid != our_nodeid (or we would've buggered off long before that point)
> * dir_nodeid == our_nodeid
> * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
> (success would have the lock dropped)
> * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
> * from_master being true
> * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
> (the former is follows from the latter, actually)
>
> The last one might or might not be impossible - I'm not familiar with dlm
> guts, but it does have
> log_error(ls, "from_master %d our_master", from_nodeid);
> just before that call, so it's worth a further look.

dlm_send_rcom_lookup_dump() was for debugging and can be removed. It's a
condition that shouldn't happen, and I'm guessing I added that to catch
any evidence if it did. I'm surprised it wasn't removed in the final
version of the patch, but after 5 years I don't remember what I was
thinking. I've pushed a commit dropping it to linux-dlm.git next.

Thanks,
Dave