Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"

From: Tejun Heo
Date: Tue Jan 15 2013 - 18:50:35 EST


cc'ing Arjan. Arjan, the original thread can be read from

http://thread.gmane.org/gmane.linux.kernel/1420814

Hello, again.

On Tue, Jan 15, 2013 at 12:18:01PM -0800, Linus Torvalds wrote:
> I think that is a good solution if it works, but look out: we need to
> synchronize across *all* domains, not just the default one. The sd.c
> code, for example, uses its own "scsi_sd_probe_domain", and we *do*
> want to synchronize with it.
>
> Can you do that with your suggested interface (ie it would have to be
> a *global* sequence number)?

So, I've been thinking about it for a while now, and it looks like
async cuts too many corners for any sane stackable flushing scheme to
be implemented on top of it. There simply isn't enough information to
determine who should wait for what.

I've thought of two workarounds. Both suck.

A. Try to detect deadlock conditions from synchronize(). If a deadlock
condition involving other async jobs is detected, whine about it
and then skip the wait. Ignore deadlock conditions on self (that
alone should solve this particular case).

Detecting deadlock conditions isn't difficult if there are only
global synchronizations; unfortunately, the fragmented dependencies
introduced by domain-local synchronization make this non-trivial.

We can still do the ignore-self part almost trivially, though. That
will at least work around the problem at hand.
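
For illustration only, the gap between the global-only and the
domain-local case can be modelled in a few lines of userspace C.
Everything below is hypothetical (`struct job`, `WAIT_ALL`,
`global_deadlock()` and `any_deadlock()` are not kernel API), and the
model assumes that a job which is not blocked in a flush eventually
finishes. Under that assumption, and with the ignore-self rule
applied, a global-only deadlock reduces to "two or more jobs blocked
in a global flush at once", while domain-local flushes need a cycle
search over a wait-for graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS    16
#define NOT_WAITING (-1) /* job is running, not blocked in a flush */
#define WAIT_ALL    (-2) /* job is blocked in a global flush */

/* Hypothetical model of one async job; not the kernel's structure. */
struct job {
    int domain;   /* domain the job was scheduled on (>= 0) */
    int waits_on; /* NOT_WAITING, WAIT_ALL, or a domain id */
    bool done;
};

/* Does job a wait for job b? Waiting on oneself is ignored. */
static bool waits_for(const struct job *jobs, size_t a, size_t b)
{
    if (a == b || jobs[a].done || jobs[b].done)
        return false;
    if (jobs[a].waits_on == WAIT_ALL)
        return true;
    return jobs[a].waits_on == jobs[b].domain;
}

/* Global flushes only: any two concurrent global waiters wait on each other. */
static bool global_deadlock(const struct job *jobs, size_t n)
{
    size_t waiters = 0;

    for (size_t i = 0; i < n; i++)
        if (!jobs[i].done && jobs[i].waits_on == WAIT_ALL)
            waiters++;
    return waiters >= 2;
}

/* Depth-first search; state: 0 = unvisited, 1 = on path, 2 = finished. */
static bool dfs(const struct job *jobs, size_t n, size_t v, int *state)
{
    state[v] = 1;
    for (size_t w = 0; w < n; w++) {
        if (!waits_for(jobs, v, w))
            continue;
        if (state[w] == 1)
            return true; /* back edge: a wait-for cycle */
        if (state[w] == 0 && dfs(jobs, n, w, state))
            return true;
    }
    state[v] = 2;
    return false;
}

/* Mixed global and domain-local flushes: look for any wait-for cycle. */
static bool any_deadlock(const struct job *jobs, size_t n)
{
    int state[MAX_JOBS] = { 0 };

    assert(n <= MAX_JOBS);
    for (size_t v = 0; v < n; v++)
        if (state[v] == 0 && dfs(jobs, n, v, state))
            return true;
    return false;
}
```

In this model, a job in domain 0 flushing domain 1 while a job in
domain 1 flushes domain 0 is a deadlock that the simple global count
cannot see; only the graph search finds it.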

B. The ranged synchronization I first suggested. The problem with
this is that it's common practice for an async job to flush
everything that was scheduled before it. This can introduce spurious
synchronization dependencies, which can then lead to deadlocks.

These conditions can be detected and ignored, at least when only
global synchronizations are considered. The problem is that such
deadlock conditions occur during normal usage and thus have to be
ignored silently, which means synchronization would silently skip
the wait and report success even when there are legitimate deadlocks
that should be investigated.
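
A minimal sketch of why ranged flushing adds spurious dependencies,
again as a hypothetical userspace model rather than kernel code. It
assumes every job draws its cookie from one global counter and that a
job flushing "everything before it" waits on every unfinished job
with a smaller cookie, whatever its domain. `ranged_flush_deps()` and
its `cross_domain` output are made up for illustration; the latter
counts dependencies on other domains that the caller never asked for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long cookie_t; /* hypothetical stand-in for a cookie type */

/* Hypothetical model of one async job; not the kernel's structure. */
struct job {
    cookie_t cookie; /* from one global counter shared by all domains */
    int domain;
    bool done;
};

/*
 * Ranged flush: the caller waits for every unfinished job scheduled
 * before it. Returns how many jobs it would block on and stores in
 * *cross_domain how many of those belong to a different domain.
 */
static size_t ranged_flush_deps(const struct job *jobs, size_t n,
                                const struct job *caller, size_t *cross_domain)
{
    size_t deps = 0;

    assert(caller != NULL && cross_domain != NULL);
    *cross_domain = 0;
    for (size_t i = 0; i < n; i++) {
        const struct job *j = &jobs[i];

        /* skip finished jobs, the caller itself and later jobs */
        if (j->done || j->cookie >= caller->cookie)
            continue;
        deps++;
        if (j->domain != caller->domain)
            (*cross_domain)++; /* a dependency nobody asked for */
    }
    return deps;
}
```

In this model, a job with cookie 5 in domain 1 ends up waiting on an
unrelated domain-0 job with cookie 3; if that earlier job is in turn
blocked on something the later one provides, the flush deadlocks.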

For now, I'm gonna implement a simple "I'm not gonna wait for myself"
self-deadlock avoidance. If this needs any more sophistication, I
think we'd better reimplement it so that we can explicitly match up
and track who's gonna wait for what, instead of throwing everything
into a single cookie space and then trying to work back from there.
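
The "I'm not gonna wait for myself" rule can be sketched as follows.
This is a hypothetical model, not the actual patch: it assumes the
flush knows which job (if any) is calling it, passed here as a
made-up `self` argument, and simply leaves that job out of the set it
waits on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one async job; not the kernel's structure. */
struct job {
    unsigned long long cookie; /* single global cookie space */
    bool done;
};

/*
 * Number of jobs a full flush would block on. @self is the job issuing
 * the flush, or NULL when called from outside any async job. The caller
 * is never counted, so a job flushing from inside itself cannot end up
 * waiting for its own completion.
 */
static size_t flush_full_waits(const struct job *jobs, size_t n,
                               const struct job *self)
{
    size_t waits = 0;

    assert(jobs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (jobs[i].done || &jobs[i] == self)
            continue;
        waits++;
    }
    return waits;
}
```

The rule only removes the trivial cycle: in this model, two jobs that
flush concurrently still wait on each other, which is the more general
case option A would have to detect.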

Thanks.

--
tejun