Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel bug even kernel hang

From: Pavel Machek
Date: Wed Oct 07 2015 - 04:19:50 EST


On Mon 2015-09-21 17:33:21, Sheng Yang wrote:
> Thank you Paul! That's exactly the issue I met. I've read the whole
> thread and got a general idea of the issue.
>
> I try to summarize it and please correct me if I'm wrong:
>
> 1. The issue is the result of kill_bdev() when connection has been cut
> when IO is still flying.
> 2. Other block devices driver didn't have this issue because they
> normally keep the buffer and device, deny all the requests and only
> kill the device when all request has been settled down and device has
> been umounted.
> 3. Why NBD cannot handle it in the same way because NBD has dependency
> on userspace nfs-client, which would handle the reconnection/retry. If
> DO_IT ioctl didn't return, then there is no way for userspace to
> reconnect. How to coordinate between kernel and userspace on
> reconnecting while leaving the device open until it's safe to close
> leads to several choices and would leads to quite amount of change.
>
> Sounds like Goswin's suggestion really make sense(
> http://sourceforge.net/p/nbd/mailman/message/31678340/ ). This issue
> has been there for years and impact the stability of NBD a lot. I
> think we should get it fixed.

You can get fix this one, but getting memory management right so that
client & server on same machine is safe is not going to be easy.

Think "nbd-server swapped out, because memory was needed by dirty buffers".

Think "nbd-server can't allocate memory because it is all in the dirty buffers".

Pavel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/