RE: [PATCH 0/4] Drivers: hv: Further protection for the rescind path
From: KY Srinivasan
Date: Sat Feb 07 2015 - 11:27:40 EST
> -----Original Message-----
> From: Dexuan Cui
> Sent: Friday, February 6, 2015 6:54 AM
> To: KY Srinivasan; Vitaly Kuznetsov
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang; linux-
> kernel@xxxxxxxxxxxxxxx; Jason Wang
> Subject: RE: [PATCH 0/4] Drivers: hv: Further protection for the rescind path
>
> > -----Original Message-----
> > From: KY Srinivasan
> > Sent: Friday, February 6, 2015 6:47 AM
> > To: Dexuan Cui; Vitaly Kuznetsov
> > Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang;
> > linux-kernel@xxxxxxxxxxxxxxx; Jason Wang
> > Subject: RE: [PATCH 0/4] Drivers: hv: Further protection for the
> > rescind path
> > > -----Original Message-----
> > > From: Dexuan Cui
> > > Sent: Thursday, February 5, 2015 4:45 AM
> > > To: Vitaly Kuznetsov; KY Srinivasan
> > > Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang; linux-
> > > kernel@xxxxxxxxxxxxxxx; Jason Wang
> > > Subject: RE: [PATCH 0/4] Drivers: hv: Further protection for the
> > > rescind path
> > >
> > > > -----Original Message-----
> > > > From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> > > > Sent: Thursday, February 5, 2015 18:10 PM
> > > > To: KY Srinivasan
> > > > Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang;
> > > > linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason Wang
> > > > Subject: Re: [PATCH 0/4] Drivers: hv: Further protection for the
> > > > rescind path
> > > >
> > > > KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
> > > >
> > > > >> -----Original Message-----
> > > > >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> > > > >> Sent: Tuesday, February 3, 2015 9:01 AM
> > > > >> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx
> > > > >> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui;
> > > > >> Jason Wang
> > > > >> Subject: [PATCH 0/4] Drivers: hv: Further protection for the
> > > > >> rescind path
> > > > >>
> > > > >> This series is a continuation of the "Drivers: hv: vmbus:
> > > > >> serialize Offer and Rescind offer". I'm trying to address a
> > > > >> number of theoretically possible issues with rescind offer
> > > > >> handling. All these complications come from the fact that a
> > > > >> rescind offer results in vmbus channel being freed and we must
> > > > >> ensure nobody still uses it. Instead of introducing new locks I
> > > > >> suggest we switch channels usage
> > > to the get/put workflow.
> > > > >>
> > > > >> The main part of the series is [PATCH 1/4] which introduces the
> > > > >> workflow for vmbus channels, all other patches fix different
> > > > >> corner cases using this workflow. I'm not sure all such cases
> > > > >> are covered with this series (probably not), but in case
> > > > >> protection is required in some other places it should become
> relatively easy to add one.
> > > > >>
> > > > >> I did some sanity testing with CONFIG_DEBUG_LOCKDEP=y and
> > > > >> nothing popped out, however, additional testing would be much
> appreciated.
> > > > >>
> > > > >> K.Y., Haiyang, I'm not sending this series to netdev@ and
> > > > >> linux-scsi@ as it is supposed to be applied as a whole, please
> > > > >> resend these patches with your sign-offs when (and if) we're
> > > > >> done with
> > > reviews. Thanks!
> > > > >
> > > > > Vitaly,
> > > > >
> > > > > Thanks for looking into this issue. While today, rescind offer
> > > > > results in the
> > > > freeing of the channel, I don't think
> > > > > that is required. By not freeing up the channel in the rescind
> > > > > path, we can have
> > > > a safe way to access the channel and
> > > > > that does not have to involve taking a reference on the channel
> > > > > every time you
> > > > access it - the get/put workflow in your
> > > > > patch set. As part of the network performance improvement work,
> > > > > I had
> > > > eliminated all locks in the receive path by setting
> > > > > up per-cpu data structures for mapping the relid to channel etc.
> > > > > These set of
> > > > patches introduces locking/atomic operations
> > > > > in performance critical code paths to deal with an event that is
> > > > > truly rare - the channel getting rescinded.
> > > >
> > > > It is possible to eliminate all locks/atomic operations from
> > > > performance critical pyth in my patch series by following Dexuan's
> > > > suggestion - we'll get the channel in vmbus_open and put it in
> > > > vmbus_close (and on processing offer/rescind offer) this won't
> > > > affect performance. I'm in the middle of testing this approach.
> > > >
> > > > >
> > > > > All channel messages are handled in a single work context:
> > > > > vmbus_on_msg_dpc() -> vmbus_onmessage_work()-> Various
> channel
> > > > messages [offer, rescind etc.]
> > > This is true.
> > >
> > > > >
> > > > > So, the rescind message cannot be processed while we are
> > > > > processing the
> > > > offer message and since an offer
> > > > > cannot be rescinded before it is offered, offer and rescind are
> > > > > naturally serialized (I think I have patchset in my queue
> > > IMO this may not be true.
> > > The cause is:
> > > (I'm using the latest linux-next repo, which includes Vitaly's patch
> > > "Drivers: hv: vmbus: serialize Offer and Rescind offer".)
> > >
> > > vmbus_onoffer_rescind() runs in vmbus_connection.work_queue, but
> > > vmbus_process_offer() runs in the per-channel newchannel->controlwq,
> > > so the two functions are not serialized, at least in theory.
> > >
> > > As a result, in vmbus_process_offer(): after the new channel is
> > > added into vmbus_connection.chn_list, but before the channel is
> > > completely initialized by us (we need to create a vmbus device and
> > > associate the device with the channel -- this procedure could fail
> > > and we goto err_free_chan and free the channel directly!),
> > > vmbus_onoffer_rescind() can see the new channel, but doesn't know
> the channel could be freed in another place at the same time.
> > >
> > > BTW, when vmbus_process_offer() -> vmbus_device_create() fails, we
> > > goto err_free_chan without removing the new channel from
> > > vmbus_connection.chn_list?
> > >
> > > As another result : in vmbus_process_offer(), in the case
> > > vmbus_device_register() fails, we'll run
> > > "list_del(&newchannel->listentry) and unlock
> > > vmbus_connection.channel_lock" -- just after these 2 lines, at this
> > > time,
> > > vmbus_onoffer_rescind() -> relid2channel() can return NULL, and
> > > we'll miss the rescind message, i.e., we fail to send the
> > > CHANNELMSG_RELID_RELEASED message to the host.
> >
> > This is the way the code is; but it does not have to be. Here is a
> > patch that implements what I have been trying to describe. This is a
> > rough proposal; there is additional (obvious) cleanup that I have not
> > done.
> > This is on top of the patches that have been submitted to Greg.
> >
> > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
>
> Hi KY,
> The patch is great!
>
> It effectively removes the per-channel workqueue and uses the global
> vmbus_connection.work_queue to implement natural serialization.
> It simplifies all the things a lot!
>
> Two small suggestions:
>
> 1) hv_process_device_removal() -> hv_process_channel_removal().
>
> 2) I think we can revert the patch "Drivers: hv: vmbus: serialize Offer and
> Rescind offer"
> first and then this change will be smaller and clearer. :-)
Thanks Dexuan. I will submit a cleaned up version of this patch. Even if we cannot map
The relid to a channel, we can post the relid release message back to the host. I will make
this adjustment and send the patch out.
Thank you,
K. Y
>
> -- Dexuan
>
>
> > ---
> > drivers/hv/channel.c | 8 +++++
> > drivers/hv/channel_mgmt.c | 63 +++++++++++++-----------------------------
> --
> > drivers/hv/hv_util.c | 2 +-
> > drivers/hv/vmbus_drv.c | 24 ++++++++++++-----
> > include/linux/hyperv.h | 1 +
> > 5 files changed, 46 insertions(+), 52 deletions(-)
> >
> > diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index
> > bf0cf8f..4a21cd8 100644
> > --- a/drivers/hv/channel.c
> > +++ b/drivers/hv/channel.c
> > @@ -501,6 +501,14 @@ static int vmbus_close_internal(struct
> > vmbus_channel
> > *channel)
> > put_cpu();
> > }
> >
> > +/*
> > + * If the channel has been rescinded; process device removal.
> > + */
> > +if (channel->rescind) {
> > +hv_process_device_removal(channel);
> > +return 0;
> > +}
> > +
> > /* Send a closing message */
> >
> > msg = &channel->close_msg.msg;
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > index 0ba6b5c..a208256 100644
> > --- a/drivers/hv/channel_mgmt.c
> > +++ b/drivers/hv/channel_mgmt.c
> > @@ -207,27 +207,11 @@ static void percpu_channel_deq(void *arg)
> > list_del(&channel->percpu_list); }
> >
> > -/*
> > - * vmbus_process_rescind_offer -
> > - * Rescind the offer by initiating a device removal
> > - */
> > -static void vmbus_process_rescind_offer(struct work_struct *work)
> > +void hv_process_device_removal(struct vmbus_channel *channel)
> > {
> > -struct vmbus_channel *channel = container_of(work,
> > - struct vmbus_channel,
> > - work);
> > +struct vmbus_channel_relid_released msg;
> > unsigned long flags;
> > struct vmbus_channel *primary_channel; -struct
> > vmbus_channel_relid_released msg; -struct device *dev;
> > -
> > -if (channel->device_obj) {
> > -dev = get_device(&channel->device_obj->device);
> > -if (dev) {
> > -vmbus_device_unregister(channel->device_obj);
> > -put_device(dev);
> > -}
> > -}
> >
> > memset(&msg, 0, sizeof(struct vmbus_channel_relid_released));
> > msg.child_relid = channel->offermsg.child_relid; @@ -256,6 +240,7 @@
> > static void vmbus_process_rescind_offer(struct
> > work_struct *work)
> > free_channel(channel);
> > }
> >
> > +
> > void vmbus_free_channels(void)
> > {
> > struct vmbus_channel *channel;
> > @@ -270,11 +255,8 @@ void vmbus_free_channels(void)
> > * vmbus_process_offer - Process the offer by creating a channel/device
> > * associated with this offer
> > */
> > -static void vmbus_process_offer(struct work_struct *work)
> > +static void vmbus_process_offer(struct vmbus_channel *newchannel)
> > {
> > -struct vmbus_channel *newchannel = container_of(work, -struct
> > vmbus_channel, -work); struct vmbus_channel *channel; bool fnew =
> > true; bool enq = false; @@ -340,7 +322,7 @@ static void
> > vmbus_process_offer(struct work_struct
> > *work)
> > if (channel->sc_creation_callback != NULL)
> > channel->sc_creation_callback(newchannel);
> >
> > -goto done_init_rescind;
> > +goto done;
> > }
> >
> > goto err_free_chan;
> > @@ -381,15 +363,10 @@ static void vmbus_process_offer(struct
> > work_struct
> > *work)
> > kfree(newchannel->device_obj);
> > goto err_free_chan;
> > }
> > -done_init_rescind:
> > -spin_lock_irqsave(&newchannel->lock, flags);
> > -/* The next possible work is rescind handling */
> > -INIT_WORK(&newchannel->work, vmbus_process_rescind_offer);
> > -/* Check if rescind offer was already received */ -if
> > (newchannel->rescind) -queue_work(newchannel->controlwq,
> > &newchannel->work); -spin_unlock_irqrestore(&newchannel->lock, flags);
> > +
> > +done:
> > return;
> > +
> > err_free_chan:
> > free_channel(newchannel);
> > }
> > @@ -515,8 +492,7 @@ static void vmbus_onoffer(struct
> > vmbus_channel_message_header *hdr) newchannel->monitor_grp =
> > (u8)offer->monitorid / 32; newchannel->monitor_bit =
> > (u8)offer->monitorid % 32;
> >
> > -INIT_WORK(&newchannel->work, vmbus_process_offer);
> > -queue_work(newchannel->controlwq, &newchannel->work);
> > +vmbus_process_offer(newchannel);
> > }
> >
> > /*
> > @@ -529,6 +505,7 @@ static void vmbus_onoffer_rescind(struct
> > vmbus_channel_message_header *hdr) struct
> vmbus_channel_rescind_offer
> > *rescind; struct vmbus_channel *channel; unsigned long flags;
> > +struct device *dev;
> >
> > rescind = (struct vmbus_channel_rescind_offer *)hdr; channel =
> > relid2channel(rescind->child_relid);
> > @@ -539,18 +516,16 @@ static void vmbus_onoffer_rescind(struct
> > vmbus_channel_message_header *hdr)
> >
> > spin_lock_irqsave(&channel->lock, flags); channel->rescind = true;
> > -/*
> > - * channel->work.func != vmbus_process_rescind_offer means we are
> > still
> > - * processing offer request and the rescind offer processing should
> > be
> > - * postponed. It will be done at the very end of
> > vmbus_process_offer()
> > - * as rescind flag is being checked there.
> > - */
> > -if (channel->work.func == vmbus_process_rescind_offer)
> > -/* work is initialized for vmbus_process_rescind_offer() from
> > - * vmbus_process_offer() where the channel got created */
> > -queue_work(channel->controlwq, &channel->work);
> > -
> > spin_unlock_irqrestore(&channel->lock, flags);
> > +
> > +if (channel->device_obj) {
> > +dev = get_device(&channel->device_obj->device);
> > +if (dev) {
> > +vmbus_device_unregister(channel->device_obj);
> > +put_device(dev);
> > +}
> > +}
> > +
> > }
> >
> > /*
> > diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c index
> > c5be773..7994ec2 100644
> > --- a/drivers/hv/hv_util.c
> > +++ b/drivers/hv/hv_util.c
> > @@ -380,9 +380,9 @@ static int util_remove(struct hv_device *dev) {
> > struct hv_util_service *srv = hv_get_drvdata(dev);
> >
> > -vmbus_close(dev->channel);
> > if (srv->util_deinit)
> > srv->util_deinit();
> > +vmbus_close(dev->channel);
> > kfree(srv->recv_buffer);
> >
> > return 0;
> > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index
> > 3cd44ae..64627c4 100644
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -505,15 +505,25 @@ static int vmbus_probe(struct device
> *child_device)
> > */
> > static int vmbus_remove(struct device *child_device) { -struct
> > hv_driver *drv = drv_to_hv_drv(child_device->driver);
> > +struct hv_driver *drv;
> > struct hv_device *dev = device_to_hv_device(child_device);
> >
> > -if (drv->remove)
> > -drv->remove(dev);
> > -else
> > -pr_err("remove not set for driver %s\n", -dev_name(child_device));
> > -
> > +if (child_device->driver) {
> > +drv = drv_to_hv_drv(child_device->driver);
> > +if (drv->remove) {
> > +drv->remove(dev);
> > +} else {
> > +pr_err("remove not set for driver %s\n", dev_name(child_device));
> > +hv_process_device_removal(dev->channel);
> > +}
> > +} else {
> > +/*
> > + * We don't have a driver for this device; deal with the
> > + * rescind message.
> > + */
> > +hv_process_device_removal(dev->channel);
> > +}
> > return 0;
> > }
> >
> > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index
> > b079abf..9658bfe 100644
> > --- a/include/linux/hyperv.h
> > +++ b/include/linux/hyperv.h
> > @@ -1226,6 +1226,7 @@ void hv_kvp_onchannelcallback(void *); int
> > hv_vss_init(struct hv_util_service *); void hv_vss_deinit(void);
> > void hv_vss_onchannelcallback(void *);
> > +void hv_process_device_removal(struct vmbus_channel *channel);
> >
> > extern struct resource *hyperv_mmio;
> >
> > --
> > 1.7.4.1
> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/