RE: [PATCH] Drivers: hv: Allow single instance of hv_util devices

From: Michael Kelley
Date: Sun Dec 29 2024 - 13:02:47 EST


From: Sonia Sharma <sosha@xxxxxxxxxxxxxxxxxxx> Sent: Friday, December 20, 2024 3:56 PM
>

Please include the "linux-hyperv@xxxxxxxxxxxxxxx" mailing list
when submitting patches related to Hyper-V.

> Harden hv_util type device drivers to allow single
> instance of the device be configured at given time.
>

I think the reason for this patch needs more explanation. For several
VMBus devices, a well-behaved Hyper-V is expected to offer only one
instance of the device in a given VM. Linux guests originally assumed
that the Hyper-V host is well-behaved, so the device drivers for many
of these devices were written assuming only a single instance. But
with the introduction of Confidential Computing (CoCo) VMs, Hyper-V
is no longer assumed to be well-behaved. If a compromised & malicious
Hyper-V were to offer multiple instances of such a device, the device
driver assumption about a single instance would be false, and
memory corruption could occur, which has the potential to lead to
compromise of the CoCo VM. The intent is to prevent such a scenario.

Note that this problem extends beyond just "util" devices. Hyper-V
is expected to offer only a single instance of keyboard, mouse, frame
buffer, and balloon devices. So this patch should be extended to
include them as well (and your new function names containing
"hv_util" should be adjusted accordingly). Interestingly, the Hyper-V keyboard driver
does not assume a single instance, so it should be safe regardless. But
the mouse, frame buffer, and balloon drivers are not safe.

With this understanding, there are two ways to approach the problem:

1) Enforce the expectation that a well-behaved Hyper-V only offers a
single instance of these VMBus devices. That's the approach that this
patch takes.

2) Update the device drivers to remove the assumption of a single
instance. With this approach, if a compromised & malicious Hyper-V
were to offer multiple instances, the extra devices might be bogus,
but memory corruption would not occur and the integrity of the
CoCo VM should not be compromised. As mentioned above, such
is already the case with the keyboard driver.
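To make approach #2 concrete, here is a minimal userspace sketch. It is not kernel code, and the struct and function names are hypothetical. The driver keeps all of its state in a per-instance allocation rather than in a file-scope global, so a second probed instance gets its own buffer, and removing one instance cannot free memory the other is still using. In a real driver the pointer would be attached to the device, e.g. with hv_set_drvdata().

```c
#include <stdlib.h>

/* Hypothetical per-instance state for a driver such as mouse or frame
 * buffer. A single-instance driver might keep one file-scope pointer to
 * this; with approach #2 every probe gets its own copy. */
struct sketch_dev_state {
	int relid;            /* channel this instance is bound to */
	unsigned char *rxbuf; /* receive buffer owned by this instance */
};

/* Allocate fresh state for each probed instance and hand it back to the
 * caller instead of storing it in a global. */
static struct sketch_dev_state *sketch_probe(int relid)
{
	struct sketch_dev_state *st = calloc(1, sizeof(*st));

	if (!st)
		return NULL;
	st->rxbuf = calloc(64, 1);
	if (!st->rxbuf) {
		free(st);
		return NULL;
	}
	st->relid = relid;
	return st;
}

/* Tear down only the state belonging to this instance. */
static void sketch_remove(struct sketch_dev_state *st)
{
	free(st->rxbuf);
	free(st);
}
```

If a malicious host offers a bogus extra instance, it simply gets a separate, independently freed state object.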

I've thought about the tradeoffs for the two approaches, and don't
really have a strong opinion either way. In some sense, #2 is the
more correct approach as ideally device drivers shouldn't make
single instance assumptions. But #1 is an easier fix, and perhaps
more robust. Other reviewers might have other reasons to prefer
one over the other, and have a stronger viewpoint on the tradeoffs.
I would be interested in any such comments. But I'm OK with
approach #1 unless someone points out a good reason to
prefer #2.

>
> New function vmbus_is_valid_hvutil_offer() is added.
> It checks if the new offer is for hv_util device type,
> then read the refcount for that device and accept or
> reject the offer accordingly.
>
> Signed-off-by: Sonia Sharma <sonia.sharma@xxxxxxxxxxxxxxxxxxx>
> ---
> drivers/hv/channel_mgmt.c | 64 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 3c6011a48dab..1a135cfad81f 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -20,6 +20,7 @@
> #include <linux/delay.h>
> #include <linux/cpu.h>
> #include <linux/hyperv.h>
> +#include <linux/refcount.h>
> #include <asm/mshyperv.h>
> #include <linux/sched/isolation.h>
>
> @@ -156,6 +157,8 @@ const struct vmbus_device vmbus_devs[] = {
> };
> EXPORT_SYMBOL_GPL(vmbus_devs);
>
> +static refcount_t singleton_vmbus_devs[HV_UNKNOWN + 1];
> +

From what I can see, these refcounts always have a value of either
0 or 1. The refcount never goes above 1 because the intent of this
patch is to enforce having only a single instance. The patch reads
the refcount, sets it to 1, or decrements it. I think you could just
use an array of booleans here, and set them to true or false.
READ_ONCE() and WRITE_ONCE() should be used because the
booleans are accessed from multiple threads.
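The boolean-array variant is shown below as a minimal userspace sketch. READ_ONCE() and WRITE_ONCE() are kernel macros, so the two #defines are simplified stand-ins that rely on the GCC/Clang __typeof__ extension. The enum is an abbreviated stand-in for the kernel's device-type enum, and sketch_claim()/sketch_release() are hypothetical names. The sketch assumes offers are processed one at a time. If two offers could race, a single atomic test-and-set would be needed instead of a separate read and write.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

/* Abbreviated stand-in for the kernel's device-type enum. */
enum { HV_IDE, HV_KVP, HV_TS, HV_HB, HV_SHUTDOWN, HV_BACKUP, HV_UNKNOWN };

/* One flag per device type, replacing refcount_t values that are only
 * ever 0 or 1. */
static bool singleton_vmbus_devs[HV_UNKNOWN + 1];

/* Returns false if an instance of dev_type is already present.
 * Assumes offers are handled one at a time, so the read and the write
 * need not be a single atomic step. */
static bool sketch_claim(u16 dev_type)
{
	if (READ_ONCE(singleton_vmbus_devs[dev_type]))
		return false;
	WRITE_ONCE(singleton_vmbus_devs[dev_type], true);
	return true;
}

/* Called from teardown paths (rescind, error unwinding). */
static void sketch_release(u16 dev_type)
{
	WRITE_ONCE(singleton_vmbus_devs[dev_type], false);
}
```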

> static const struct {
> guid_t guid;
> } vmbus_unsupported_devs[] = {
> @@ -198,6 +201,25 @@ static bool is_unsupported_vmbus_devs(const guid_t *guid)
> return false;
> }
>
> +static bool is_dev_hv_util(u16 dev_type)
> +{
> + switch (dev_type) {
> + case HV_SHUTDOWN:
> + fallthrough;
> + case HV_TS:
> + fallthrough;
> + case HV_HB:
> + fallthrough;
> + case HV_KVP:
> + fallthrough;
> + case HV_BACKUP:
> + return true;
> +
> + default:
> + return false;
> + }
> +}
> +

Rather than have a big case statement here, we already have
the "vmbus_devs" array that statically specifies various properties
of each VMBus device type. I'd suggest adding a field to those array
entries to indicate whether the device type is expected to be a
singleton. Then you can just do a direct lookup, like with the
"perf_device" and "allowed_in_isolated" properties.

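A minimal sketch of the table-driven lookup follows. It assumes a new bool field in each vmbus_devs[] entry, called "singleton" here (a hypothetical name). The struct is a simplified userspace model of struct vmbus_device with the GUID and other existing fields omitted. The enum is an abbreviated stand-in, and which types are marked as singletons follows the device list given earlier in this reply.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;

/* Abbreviated stand-in for the kernel's device-type enum. */
enum { HV_IDE, HV_NIC, HV_FB, HV_KBD, HV_MOUSE, HV_KVP, HV_TS, HV_HB,
       HV_SHUTDOWN, HV_BACKUP, HV_DM, HV_UNKNOWN };

/* Simplified model of a vmbus_devs[] entry; "singleton" is the proposed
 * new field, other existing fields are omitted. */
struct sketch_vmbus_device {
	u16 dev_type;
	bool singleton;
};

static const struct sketch_vmbus_device sketch_vmbus_devs[HV_UNKNOWN + 1] = {
	[HV_IDE]      = { .dev_type = HV_IDE },
	[HV_NIC]      = { .dev_type = HV_NIC },
	[HV_FB]       = { .dev_type = HV_FB,       .singleton = true },
	[HV_KBD]      = { .dev_type = HV_KBD,      .singleton = true },
	[HV_MOUSE]    = { .dev_type = HV_MOUSE,    .singleton = true },
	[HV_KVP]      = { .dev_type = HV_KVP,      .singleton = true },
	[HV_TS]       = { .dev_type = HV_TS,       .singleton = true },
	[HV_HB]       = { .dev_type = HV_HB,       .singleton = true },
	[HV_SHUTDOWN] = { .dev_type = HV_SHUTDOWN, .singleton = true },
	[HV_BACKUP]   = { .dev_type = HV_BACKUP,   .singleton = true },
	[HV_DM]       = { .dev_type = HV_DM,       .singleton = true },
	[HV_UNKNOWN]  = { .dev_type = HV_UNKNOWN },
};

/* Direct lookup by index replaces the switch in is_dev_hv_util(). */
static bool is_singleton_dev(u16 dev_type)
{
	return dev_type <= HV_UNKNOWN && sketch_vmbus_devs[dev_type].singleton;
}
```

Adding a device type then means setting one field in its table entry instead of extending a separate case statement.
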
> static u16 hv_get_dev_type(const struct vmbus_channel *channel)
> {
> const guid_t *guid = &channel->offermsg.offer.if_type;
> @@ -1004,6 +1026,26 @@ find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer)
> return channel;
> }
>
> +static u16 vmbus_is_valid_hvutil_offer(const struct vmbus_channel_offer_channel *offer)
> +{
> + const guid_t *guid = &offer->offer.if_type;
> + u16 i;
> +
> + if (is_hvsock_offer(offer))
> + return HV_UNKNOWN;
> +
> + for (i = HV_IDE; i < HV_UNKNOWN; i++) {
> + if (guid_equal(guid, &vmbus_devs[i].guid) && is_dev_hv_util(i)) {

Ideally, we should avoid coding yet another case of searching through
the vmbus_devs[] array for a matching GUID. The function hv_get_dev_type()
already does this, and returns the index into the vmbus_devs[] array.
You could probably use that function, and then just pass the index as
the argument to this function.

That index is also stored as the "device_id" (arguably mis-named) in the
struct vmbus_channel, so it's already available in the rescind path.
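A userspace sketch of that shape follows. The GUID search happens once, in sketch_get_dev_type(), a hypothetical stand-in for hv_get_dev_type(), and the singleton check then takes the resulting index. The GUIDs and table contents are made up. Returning a bool verdict also avoids overloading HV_UNKNOWN. In the patch as posted, the function returns HV_UNKNOWN both for a duplicate singleton and when the loop finds no util GUID, and the caller rejects both cases.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t u16;

/* Abbreviated stand-in for the kernel's device-type enum. */
enum { HV_NIC, HV_KVP, HV_TS, HV_UNKNOWN };

/* Simplified 16-byte GUID; values below are made up. */
typedef struct { unsigned char b[16]; } sketch_guid_t;

static const struct {
	sketch_guid_t guid;
	bool singleton;
} sketch_vmbus_devs[HV_UNKNOWN] = {
	[HV_NIC] = { { { 0x01 } }, false },
	[HV_KVP] = { { { 0x02 } }, true },
	[HV_TS]  = { { { 0x03 } }, true },
};

static bool singleton_present[HV_UNKNOWN];

/* Stand-in for hv_get_dev_type(): the only place that searches GUIDs. */
static u16 sketch_get_dev_type(const sketch_guid_t *guid)
{
	u16 i;

	for (i = 0; i < HV_UNKNOWN; i++)
		if (!memcmp(guid, &sketch_vmbus_devs[i].guid, sizeof(*guid)))
			return i;
	return HV_UNKNOWN;
}

/* Takes the already-computed index; true means the offer may proceed.
 * Non-singleton and unknown types are left to the existing checks. */
static bool sketch_check_singleton_offer(u16 dev_type)
{
	if (dev_type >= HV_UNKNOWN || !sketch_vmbus_devs[dev_type].singleton)
		return true;
	if (singleton_present[dev_type])
		return false;
	singleton_present[dev_type] = true;
	return true;
}
```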

> + if (refcount_read(&singleton_vmbus_devs[i]))
> + return HV_UNKNOWN;
> + refcount_set(&singleton_vmbus_devs[i], 1);
> + return i;
> + }
> + }
> +
> + return i;
> +}
> +
> static bool vmbus_is_valid_offer(const struct vmbus_channel_offer_channel *offer)
> {
> const guid_t *guid = &offer->offer.if_type;
> @@ -1031,6 +1073,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
> struct vmbus_channel_offer_channel *offer;
> struct vmbus_channel *oldchannel, *newchannel;
> size_t offer_sz;
> + u16 dev_type;
>
> offer = (struct vmbus_channel_offer_channel *)hdr;
>
> @@ -1115,11 +1158,29 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
> return;
> }
>
> + /*
> + * If vmbus_is_valid_offer() returns -
> + * HV_UNKNOWN - Subsequent offer received for hv_util dev, thus reject offer.
> + * HV_SHUTDOWN|HV_TS|HV_KVP|HV_HB|HV-KVP|HV_BACKUP - Increment refcount
> + * Others - Continue as is without doing anything.
> + *
> + * NOTE - We do not want to increase refcount if we resume from hibernation.
> + */
> + dev_type = vmbus_is_valid_hvutil_offer(offer);
> + if (dev_type == HV_UNKNOWN) {
> + pr_err_ratelimited("Invalid hv_util offer %d from the host supporting "
> + "isolation\n", offer->child_relid);

This check for multiple instances of a singleton device is not limited
to just CoCo VMs (a.k.a. "isolated VMs"). So the error message here really
shouldn't reference "host supporting isolation".

> + atomic_dec(&vmbus_connection.offer_in_progress);
> + return;
> + }
> +
> /* Allocate the channel object and save this offer. */
> newchannel = alloc_channel();
> if (!newchannel) {
> vmbus_release_relid(offer->child_relid);
> atomic_dec(&vmbus_connection.offer_in_progress);
> + if (is_dev_hv_util(dev_type))
> + refcount_dec(&singleton_vmbus_devs[dev_type]);

It might be good to have a function that combines the above two lines.
Then the two parallel functions are:

1) vmbus_is_valid_hvutil_offer() which marks a singleton device as
"already present" [and that function probably needs a new name]

2) vmbus_clear_singleton_device(), [or something similar] that clears
the boolean if it is a singleton device.

vmbus_clear_singleton_device() would also be used in the
rescind path and in the vmbus_add_channel_work() error path
that I mention below.
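Below is a userspace sketch of the two helpers and of how the alloc_channel() failure branch would use them. vmbus_clear_singleton_device() is the name suggested above. vmbus_mark_singleton_device() and the sketch_* names are hypothetical, and the small lookup table stands in for the proposed per-entry field in vmbus_devs[].

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t u16;

/* Abbreviated stand-in for the kernel's device-type enum. */
enum { HV_NIC, HV_KVP, HV_MOUSE, HV_UNKNOWN };

/* Which types are singletons; stands in for a per-entry field. */
static const bool sketch_is_singleton[HV_UNKNOWN] = {
	[HV_KVP] = true, [HV_MOUSE] = true,
};
static bool sketch_present[HV_UNKNOWN];

static bool sketch_is_singleton_type(u16 t)
{
	return t < HV_UNKNOWN && sketch_is_singleton[t];
}

/* Renamed counterpart of vmbus_is_valid_hvutil_offer(): marks a
 * singleton type as present; false means a duplicate offer to reject. */
static bool vmbus_mark_singleton_device(u16 t)
{
	if (!sketch_is_singleton_type(t))
		return true;
	if (sketch_present[t])
		return false;
	sketch_present[t] = true;
	return true;
}

/* Undo helper for every teardown path; a no-op for other types. */
static void vmbus_clear_singleton_device(u16 t)
{
	if (sketch_is_singleton_type(t))
		sketch_present[t] = false;
}

/* Model of vmbus_onoffer() up to the alloc_channel() failure branch. */
static int sketch_onoffer(u16 t, bool alloc_fails)
{
	void *chan;

	if (!vmbus_mark_singleton_device(t))
		return -1;                        /* duplicate: reject */
	chan = alloc_fails ? NULL : malloc(16);
	if (!chan) {
		vmbus_clear_singleton_device(t);  /* single call on error */
		return -2;
	}
	free(chan);  /* sketch only; the real code keeps the channel */
	return 0;
}
```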

> pr_err("Unable to allocate channel object\n");
> return;
> }

There's another error case in the channel offer path that needs
to be handled. vmbus_add_channel_work() can fail, in which case
the new channel is cleaned up and removed. The accounting of
singleton devices must be updated if the channel is deleted via
this error path.
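A rough userspace model of that failure path follows. It assumes the channel's device_id holds the vmbus_devs[] index, as described above. sketch_add_channel_work() is a hypothetical stand-in for vmbus_add_channel_work(), and registration success or failure is passed in as a flag. The point is only that the teardown releases the singleton slot before the channel is freed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t u16;

/* Abbreviated stand-in for the kernel's device-type enum. */
enum { HV_NIC, HV_KVP, HV_UNKNOWN };

static const bool sketch_is_singleton[HV_UNKNOWN] = { [HV_KVP] = true };
static bool sketch_present[HV_UNKNOWN];

/* No-op for non-singleton and unknown types. */
static void vmbus_clear_singleton_device(u16 t)
{
	if (t < HV_UNKNOWN && sketch_is_singleton[t])
		sketch_present[t] = false;
}

/* Hypothetical, heavily simplified channel. */
struct sketch_channel {
	u16 device_id;  /* index into vmbus_devs[] */
};

/* On failure the channel is deleted, so the singleton slot it claimed
 * is released as well. Returns the channel, or NULL if it was deleted. */
static struct sketch_channel *
sketch_add_channel_work(struct sketch_channel *ch, bool register_fails)
{
	if (register_fails) {
		vmbus_clear_singleton_device(ch->device_id);
		free(ch);
		return NULL;
	}
	return ch;
}
```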

> @@ -1235,7 +1296,6 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
> /*
> * At this point, the rescind handling can proceed safely.
> */
> -

This is a spurious whitespace change that should be avoided.

> if (channel->device_obj) {
> if (channel->chn_rescind_callback) {
> channel->chn_rescind_callback(channel);
> @@ -1251,6 +1311,8 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
> */
> dev = get_device(&channel->device_obj->device);
> if (dev) {
> + if (is_dev_hv_util(hv_get_dev_type(channel)))

As noted above, the "dev_type" is already stored in the channel structure
as field "device_id" (which is a bit mis-named).
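In the rescind path, the stored index could be read once instead of calling hv_get_dev_type() twice. The sketch below is a userspace model of that, in which the channel struct and helpers are hypothetical stand-ins and the unregister call is reduced to clearing a flag.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;

/* Abbreviated stand-in for the kernel's device-type enum. */
enum { HV_NIC, HV_MOUSE, HV_UNKNOWN };

static const bool sketch_is_singleton[HV_UNKNOWN] = { [HV_MOUSE] = true };
static bool sketch_present[HV_UNKNOWN];

/* No-op for non-singleton and unknown types. */
static void vmbus_clear_singleton_device(u16 t)
{
	if (t < HV_UNKNOWN && sketch_is_singleton[t])
		sketch_present[t] = false;
}

/* Hypothetical simplified channel; device_id was set to the
 * vmbus_devs[] index when the offer was processed. */
struct sketch_channel {
	u16 device_id;
	bool registered;
};

/* Model of the rescind branch: reuse the stored index (no second GUID
 * lookup), then unregister the device. */
static void sketch_rescind(struct sketch_channel *channel)
{
	vmbus_clear_singleton_device(channel->device_id);
	channel->registered = false;  /* stands in for vmbus_device_unregister() */
}
```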

Michael

> + refcount_dec(&singleton_vmbus_devs[hv_get_dev_type(channel)]);
> vmbus_device_unregister(channel->device_obj);
> put_device(dev);
> }