Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"

From: Tejun Heo
Date: Wed Jan 16 2013 - 11:13:08 EST


Hello, Alan.

On Tue, Jan 15, 2013 at 11:01:15PM -0500, Alan Stern wrote:
> > The current domain implementation is somewhere in between. It's not
> > a completely simplistic system and at the same time not developed
> > enough to do properly stacked flushing.
>
> I like your idea of chronological synchronization: Insist that anybody
> who wants to flush async jobs must get a cookie, and then only allow
> them to wait for async jobs started after the cookie was issued.
>
> I don't know if this is possible with the current implementation. It
> would require changing every call to async_synchronize_*(), and in a
> nontrivial way. But it might provide a proper solution to all these
> problems.

The problem here is that "flush everything which comes before me" is
used to order async jobs. e.g. after async jobs probe the hardware,
they order themselves by flushing before registering what they found,
so unless we build accurate flushing dependencies, those dependencies
will reach beyond the time window we're interested in and bring in
deadlocks.
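
To make that concrete, here is a toy userspace model, not kernel code.
All names in it are made up for illustration, and it assumes that every
pending job inside the window does its own "flush everything before me"
step. Under that assumption, a flusher which only names jobs issued at
or after its start cookie still ends up depending on older jobs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names are kernel APIs. */
struct toy_job {
	unsigned long cookie;	/* issue order, monotonically increasing */
	bool done;		/* true once the job has finished */
};

/* Jobs a windowed flusher names directly: pending, issued at/after start. */
static size_t toy_direct_waits(const struct toy_job *jobs, size_t n,
			       unsigned long start)
{
	size_t i, cnt = 0;

	for (i = 0; i < n; i++)
		if (!jobs[i].done && jobs[i].cookie >= start)
			cnt++;
	return cnt;
}

/*
 * Jobs the flusher effectively depends on, assuming each pending job in
 * the window itself flushes everything issued before it.
 */
static size_t toy_effective_waits(const struct toy_job *jobs, size_t n,
				  unsigned long start)
{
	unsigned long newest = 0;
	bool any = false;
	size_t i, cnt = 0;

	/* Find the newest pending job inside the window. */
	for (i = 0; i < n; i++) {
		if (!jobs[i].done && jobs[i].cookie >= start) {
			any = true;
			if (jobs[i].cookie > newest)
				newest = jobs[i].cookie;
		}
	}
	if (!any)
		return 0;

	/* That job waits for every older pending job, window or not. */
	for (i = 0; i < n; i++)
		if (!jobs[i].done && jobs[i].cookie <= newest)
			cnt++;
	return cnt;
}
```

With pending jobs 1, 3 and 4 and a window starting at cookie 3, the
direct count covers only jobs 3 and 4, while the effective count also
pulls in job 1, which was issued before the window opened.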

And, as Linus pointed out, tracking dependencies through
request_module() is tricky no matter what we do. I think it can be
done by matching the ones calling request_module() and the ones
actually loading modules but it's gonna be nasty.

There aren't too many users of async anyway, so changing stuff
shouldn't be too difficult, but I think the simplicity or dumbness is
one of the major attractions of async, so it'd be nice to keep things
that way, and the PF_USED_ASYNC hack seems to be able to hold things
together for now.

Thanks.

--
tejun