RE: [PATCH 6/6] PCI: hv: Fix synchronization between channel callback and hv_compose_msi_msg()
From: Michael Kelley (LINUX)
Date: Fri Apr 08 2022 - 11:29:31 EST
From: Andrea Parri (Microsoft) <parri.andrea@xxxxxxxxx> Sent: Wednesday, April 6, 2022 9:30 PM
>
> Dexuan wrote:
>
> "[...] when we disable AccelNet, the host PCI VSP driver sends a
> PCI_EJECT message first, and the channel callback may set
> hpdev->state to hv_pcichild_ejecting on a different CPU. This can
> cause hv_compose_msi_msg() to exit from the loop and 'return', and
> the on-stack variable 'ctxt' is invalid. Now, if the response
> message from the host arrives, the channel callback will try to
> access the invalid 'ctxt' variable, and this may cause a crash."
>
> Schematically:
>
> Hyper-V sends PCI_EJECT msg
> hv_pci_onchannelcallback()
> state = hv_pcichild_ejecting
> hv_compose_msi_msg()
> alloc and init comp_pkt
> state == hv_pcichild_ejecting
> Hyper-V sends VM_PKT_COMP msg
> hv_pci_onchannelcallback()
> retrieve address of comp_pkt
> 'free' comp_pkt and return
> comp_pkt->completion_func()
>
> Dexuan also showed how the crash can be triggered after introducing
> suitable delays in the driver code, thus validating the 'assumption'
> that the host can still normally respond to the guest's compose_msi
> request after the host has started to eject the PCI device.
>
> Fix the synchronization by leveraging the requestor lock as follows:
>
> - Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while
> holding the requestor lock) associated to the completion packet.
>
> - Retrieve the address *and call ->completion_func() within a same
> (requestor) critical section in hv_pci_onchannelcallback().
>
> Fixes: de0aa7b2f97d3 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
> Reported-by: Wei Hu <weh@xxxxxxxxxxxxx>
> Reported-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Suggested-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@xxxxxxxxx>
> ---
> The "Fixes:" tag is mainly a reference: a back-port would depend
> on the entire series (which, in turn, shouldn't be backported to
> commits preceding bf5fd8cae3c8f).
>
> drivers/pci/controller/pci-hyperv.c | 33 +++++++++++++++++++++++------
> 1 file changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index c1322ac37cda9..f1d794f8a5ef1 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1695,7 +1695,7 @@ static void hv_compose_msi_msg(struct irq_data *data,
> struct msi_msg *msg)
> struct pci_create_interrupt3 v3;
> } int_pkts;
> } __packed ctxt;
> -
> + u64 trans_id;
> u32 size;
> int ret;
>
> @@ -1757,10 +1757,10 @@ static void hv_compose_msi_msg(struct irq_data *data,
> struct msi_msg *msg)
> goto free_int_desc;
> }
>
> - ret = vmbus_sendpacket(hpdev->hbus->hdev->channel, &ctxt.int_pkts,
> - size, (unsigned long)&ctxt.pci_pkt,
> - VM_PKT_DATA_INBAND,
> - VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
> + ret = vmbus_sendpacket_getid(hpdev->hbus->hdev->channel, &ctxt.int_pkts,
> + size, (unsigned long)&ctxt.pci_pkt,
> + &trans_id, VM_PKT_DATA_INBAND,
> +
> VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
> if (ret) {
> dev_err(&hbus->hdev->device,
> "Sending request for interrupt failed: 0x%x",
> @@ -1839,6 +1839,15 @@ static void hv_compose_msi_msg(struct irq_data *data,
> struct msi_msg *msg)
>
> enable_tasklet:
> tasklet_enable(&channel->callback_event);
> + /*
> + * The completion packet on the stack becomes invalid after 'return';
> + * remove the ID from the VMbus requestor if the identifier is still
> + * mapped to/associated with the packet. (The identifier could have
> + * been 're-used', i.e., already removed and (re-)mapped.)
> + *
> + * Cf. hv_pci_onchannelcallback().
> + */
> + vmbus_request_addr_match(channel, trans_id, (unsigned long)&ctxt.pci_pkt);
> free_int_desc:
> kfree(int_desc);
> drop_reference:
> @@ -2717,6 +2726,7 @@ static void hv_pci_onchannelcallback(void *context)
> struct pci_dev_inval_block *inval;
> struct pci_dev_incoming *dev_message;
> struct hv_pci_dev *hpdev;
> + unsigned long flags;
>
> buffer = kmalloc(bufferlen, GFP_ATOMIC);
> if (!buffer)
> @@ -2751,8 +2761,11 @@ static void hv_pci_onchannelcallback(void *context)
> switch (desc->type) {
> case VM_PKT_COMP:
>
> - req_addr = chan->request_addr_callback(chan, req_id);
> + lock_requestor(chan, flags);
> + req_addr = __vmbus_request_addr_match(chan, req_id,
> + VMBUS_RQST_ADDR_ANY);
> if (req_addr == VMBUS_RQST_ERROR) {
> + unlock_requestor(chan, flags);
> dev_warn_ratelimited(&hbus->hdev->device,
> "Invalid transaction ID %llx\n",
> req_id);
> @@ -2760,9 +2773,17 @@ static void hv_pci_onchannelcallback(void *context)
> }
> comp_packet = (struct pci_packet *)req_addr;
> response = (struct pci_response *)buffer;
> + /*
> + * Call ->completion_func() within the critical section to make
> + * sure that the packet pointer is still valid during the call:
> + * here 'valid' means that there's a task still waiting for the
> + * completion, and that the packet data is still on the waiting
> + * task's stack. Cf. hv_compose_msi_msg().
> + */
> comp_packet->completion_func(comp_packet->compl_ctxt,
> response,
> bytes_recvd);
> + unlock_requestor(chan, flags);
> break;
>
> case VM_PKT_DATA_INBAND:
> --
> 2.25.1
Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>