Re: [PATCH] xen: detect uninitialized xenbus in xenbus_init

From: Juergen Gross
Date: Fri Nov 19 2021 - 00:16:34 EST


On 18.11.21 22:00, Stefano Stabellini wrote:
On Thu, 18 Nov 2021, Juergen Gross wrote:
On 18.11.21 09:47, Jan Beulich wrote:
On 18.11.2021 06:32, Juergen Gross wrote:
On 18.11.21 03:37, Stefano Stabellini wrote:
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -951,6 +951,28 @@ static int __init xenbus_init(void)
err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
if (err)
goto out_error;
+ /*
+ * Return error on an invalid value.
+ *
+ * Uninitialized hvm_params are zero and return no error.
+ * Although it is theoretically possible to have
+ * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it
is
+ * not zero when valid. If zero, it means that Xenstore hasn't
+ * been properly initialized. Instead of attempting to map a
+ * wrong guest physical address return error.
+ */
+ if (v == 0) {

Make this "if (v == ULONG_MAX || v== 0)" instead?
This would result in the same err on a new and an old hypervisor
(assuming we switch the hypervisor to init params with ~0UL).

Sure, I can do that


+ err = -ENOENT;
+ goto out_error;
+ }
+ /*
+ * ULONG_MAX is invalid on 64-bit because is INVALID_PFN.
+ * On 32-bit return error to avoid truncation.
+ */
+ if (v >= ULONG_MAX) {
+ err = -EINVAL;
+ goto out_error;
+ }

Does it make sense to continue the system running in case of
truncation? This would be a 32-bit guest with more than 16TB of RAM
and the Xen tools decided to place the Xenstore ring page above the
16TB boundary. This is a completely insane scenario IMO.

A proper panic() in this case would make diagnosis of that much
easier (me having doubts that this will ever be hit, though).

While I agree panic() may be an option here (albeit I'm not sure why
that would be better than trying to cope with 0 and hence without

I could imagine someone wanting to run a guest without Xenstore access,
which BTW will happen in case of a guest created by the hypervisor at
boot time.

xenbus), I'd like to point out that the amount of RAM assigned to a
guest is unrelated to the choice of GFNs for the various "magic"
items.

Yes, but this would still be a major tools problem which probably
would render the whole guest rather unusable.

First let's distinguish between an error due to "hvm_param not
initialized" and an error due to more serious conditions, such as "pfn
above max".

"hvm_param not initialized" could mean v == 0 (as it would be today) or
v == ~0UL (if we change Xen to initialize all hvm_param to ~0UL). I
don't think we want to panic in these cases as they are not actually
true erroneous configurations. We should just stop trying to initialize
xenstore and continue with the rest.


The "pfn above max" case could happen if v is greater than the max pfn.
This is a true error in the configuration because the toolstack should
know that the guest is 32-bit so it should give it a pfn that the guest

I don't think so. All x86 PVH/HVM guests start booting in 32-bit mode.

is able to use. As Jan wrote in another email, for 32-bit the actual
limit depends on the physical address bits but actually Linux has never
been able to cope with a pfn > ULONG_MAX on 32-bit because xen_store_gfn
is defined as unsigned long. So Linux 32-bit has been truncating
HVM_PARAM_STORE_PFN all along.

The question is whether the number of physical address bits as presented
to the guest is always >= 44. If not the actual limit is less than
ULONG_MAX. Other than that you are right: a PFN larger than a 32-bit
ULONG_MAX will be truncated by a 32-bit guest.

There is also an argument that depending on kconfig Linux 32-bit might
only be able to handle addresses < 4G, so I don't think the toolstack
can assume that a 32-bit guest is able to cope with HVM_PARAM_STORE_PFN
ULONG_MAX. If Linux is 32-bit and HVM_PARAM_STORE_PFN > ULONG_MAX,
even if HVM_PARAM_STORE_PFN < address_bits_max I think it would be fair
to still consider it an error, but I can see it could be argued either
way. Certainly if HVM_PARAM_STORE_PFN > address_bits_max is an error.

Right. The tools should NEVER put the frame above 4G for a non-PV guest.

In any case, I think it is still better for Linux to stop trying to
initialize Xenstore but continue with the rest because there is a bunch
of other useful things Linux can do without it. Panic should only be the
last resort if there is nothing else to do. In this case we haven't even
initialized the service and the service is not essential, at least it is
not essential in certain ARM setups.


So in conclusion, I think this patch should:
- if v == 0 return error (uninitialized)
- if v == ~0ULL (INVALID_PFN) return error (uinitialized)
- if v >= ~0UL (32-bit) return error (even if this case could be made to
work for v < max_address_bits depending on kconfig)

Which leads to something like:

/* uninitialized */
if (v == 0 || v == ~0ULL) {
err = -ENOENT;
goto out_error;
}
/*
* Avoid truncation on 32-bit.
* TODO: handle addresses >= 4G
*/
if ( v >= ~0UL ) {
err = -EINVAL;
goto out_error;

I think at least in this case a pr_err("...") should be added.

Silent failure is not nice.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature