Re: [DRIVER SUBMISSION] DRBD wants to go mainline

From: Lars Ellenberg
Date: Wed Jul 25 2007 - 05:46:53 EST


On Wed, Jul 25, 2007 at 04:41:53AM +0530, Satyam Sharma wrote:
> Hi Lars,
>
>
> On 7/24/07, Lars Ellenberg <lars.ellenberg@xxxxxxxxxx> wrote:
> >On Mon, Jul 23, 2007 at 07:10:58PM +0530, Satyam Sharma wrote:
> >> On 7/23/07, Lars Ellenberg <lars.ellenberg@xxxxxxxxxx> wrote:
> >> >On Sun, Jul 22, 2007 at 09:32:02PM -0400, Kyle Moffett wrote:
> >> >[...]
> >> >> Don't use signals between kernel threads, use proper primitives like
> >> >> notifiers and waitqueues, which means you should also probably switch
> >> >> away from kernel_thread() to the kthread_*() APIs. Also you should fix
> >> >> this FIXME or remove it if it no longer applies:-D.
> >> >
> >> >right.
> >> >but how do I tell a network thread in tcp_recvmsg to stop early,
> >> >without using signals?
> >>
> >> Yup, kthreads API cannot handle (properly stop) kernel threads that want
> >> to sleep on possibly-blocking-forever-till-signalled-functions such as
> >> tcp_recvmsg or skb_recv_datagram etc etc.
> >>
> >> There are two workarounds:
> >> 1. Use sk_rcvtimeo and related while-continue logic
> >> 2. force_sig(SIGKILL) to your kernel thread just before kthread_stop
> >> (note that you don't need to allow / write code to handle / etc signals
> >> in your kthread code -- force_sig will work automatically)
> >
> >this is not only at stop time.
> >for example our "drbd_asender" thread
> >does receive as well as send,
>
> That's normal -- in fact it would've been surprising if your kthread only
> did recvs but no sends!
>
> But where does the "send" come into the picture over here -- a send
> won't block forever, so I don't foresee any issues whatsoever w.r.t.
> kthreads conversion for that. [ BTW I hope you're *not* using any
> signals-based interface for your kernel thread _at all_. Kthreads
> disallow (ignore) all signals by default, as they should, and you really
> shouldn't need to write any logic to handle or do-certain-things-on-seeing
> a signal in a well designed kernel thread. ]
>
> >and the sending
> >latency is crucial to performance, while the recv
> >will not timeout for the next few seconds.
>
> Again, I don't see what sending latency has to do with a kernel_thread
> to kthread conversion. Or with signals, for that matter. Anyway, as
> Kyle Moffett mentioned elsewhere, you could probably look at other
> examples (say cifs_demultiplexer_thread() in fs/cifs/connect.c).

the basic problem, and what we use signals for, is:

the asender thread is blocked in recv, waiting for the peer to say something.
but I want it to stop the recv, and go send something "right now".
I don't want to have two threads for that.

yes, we have a timeo in place anyway: we need to detect a failed peer
node in time. we even aim for "sub-second failover" sometimes (which is
not exactly feasible; but failover times of 15 seconds or less are a
requirement for usable HA-iSCSI deployments).
but that does not cut it here: the timeo is in seconds,
and you don't want seconds of latency for IO operations.

so I signal it, it breaks out of recv, then sends, and goes back to recv.

in-kernel epoll would probably solve this.
I don't know how to do that properly, though.

> >> only grow more and more complex). Also, removing this gunk from
> >> your driver will clearly make it smaller, and easier for us to review :-)
> >
> >and will poison the generic work queues
>
> You could create your own workqueue as Jens Axboe suggested.

will do.
I was not aware of "create_singlethread_workqueue";
it fits our usage well enough.

> Frankly, a high-level design document is a must, here (with lower
> level implementation details in the individual changelogs of the
> patches you finally post to this list). Working that out from 17 kloc
> would otherwise be too difficult for any reviewer.

sure. will be available soon.
this was just a "bust in and see what lkml does about it",
I don't expect to be merged within days :)
I think it is realistic to be merged this year, though...
[as a Christmas present, maybe :-)]

--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :