Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

From: David Fries
Date: Tue May 06 2014 - 19:49:36 EST


On Wed, May 07, 2014 at 01:42:12AM +0400, Alexey Khoroshilov wrote:
> On 01.05.2014 07:48, David Fries wrote:
> > On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote:
> >> w1_process_callbacks() expects to be called with dev->list_mutex held,
> >> but it is the fact only in w1_process(). __w1_remove_master_device()
> >> calls w1_process_callbacks() after it releases list_mutex.
> >>
> >> The patch fixes __w1_remove_master_device() to acquire list_mutex
> >> for w1_process_callbacks(). It is implemented by moving
> >> w1_process_callbacks() functionality to __w1_process_callbacks()
> >> and adding a wrapper with the old name.
> > Good catch. I guess as it was in the shutdown path it failed the
> > unlock silently.
> >
> > I would prefer a different fix. If w1_process_callbacks was more of a
> > public facing API called from multiple locations where the caller
> > doesn't have access to the locks, something like this would probably
> > be the way to go. This is called from two areas of the code, and not
> > likely to be called from any new areas in the future.
> >
> > Adding a documentation comment is good. I would be tempted in
> > __w1_remove_master_device to move the dev->list_mutex down after the
> > while loop, but I can't be sure that whatever has a refcnt wouldn't
> > need list_mutex. The last w1_process_callbacks after the while loop
> > shouldn't be needed I wouldn't think, I was just being defensive.
> I do not understand all the details, but I am not sure.
> refcnt can became 0, but that does not mean all callbacks have been
> processed because they can be added before refcnt decrement but after
> previous w1_process_callbacks invocation.
> > By
> > the time it completes the while loop the reference count is 0 so there
> > shouldn't be anything left for it to process and if there is, it's a
> > race condition and game over anyway. So just sandwich
> > w1_process_callbacks with the lock/unlock in
> > __w1_remove_master_device please.
> Done.

Thanks for finding and fixing this bug.

> Another suspicious question for me is how safe is it to call
> wake_up_process(dev->thread);
> after
> kthread_stop(dev->thread);
> ?

Looks bad, but I would need to convince myself it could actually be a
problem.
int wake_up_process(struct task_struct *p)
{
WARN_ON(task_is_stopped_or_traced(p));
return try_to_wake_up(p, TASK_NORMAL, 0);
}

But I think there are lower hanging fruit, like revisiting all the
locks in all the code and testing with the lock debugging code on and
get it all to pass or document that it really is correct. Currently
it is detecting a locking order problem, but both code paths can only
be called from the w1_process function, which can only be called by
one thread, and one thread can't dead lock itself with a locking order
problem.

> >> Found by Linux Driver Verification project (linuxtesting.org).
> > Was this found with code inspection or hardware testing with lock
> > debugging enabled?
> It was found by static verification tools developed within Linux Driver
> Verification project.
>
>
> Best regards,
> Alexey

--
David Fries <david@xxxxxxxxx> PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/