I saw no considerations for the recommendations I had made last on your v1:
https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@xxxxxxxxxxxxxx
Of importance:
1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
is for legacy x86:
Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
this is wrong. It will be renamed to x86_legacy_free() to align with what folks
are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
This also means re-thinking all use cases and ensuring subarch is used then
instead when the goal was to avoid Xen from entering that code. Today Xen does
not use this but with my work it does and it helps clean and brush up a lot of
these checks with future prospects to even help unify entry points.
2) We should avoid more hypervisor type hacks, and just consider a new
hypervisor type to close the gap:
Using x86_legacy_free() and friends in a unified way for all systems means it
should only be used after init_hypervisor_platform() which is called during
setup_arch(). This means we have a semantic gap for checks on "are we on
hypervisor type and which one?".
There are drivers now using these sorts of
checks as well, for instance snd_intel8x0_inside_vm(). We should avoid having
these hacks but that also means cleaning up a well define grammar here for what
we want. I'm doing work to help with this by streamlining use of the subarch
type, that should help with PV code, but your use case seems different but yet
related, what I had suggested last was to consider we add a new hypervisor type
to the x86 boot protocol which would be available early on. This would have a
few purposes, one of which deserves its own section below on dead code:
a) clean up hacks as with snd_intel8x0_inside_vm()
b) enable a generic way and clean way to distinguish what hypervisor
type you're on
c) since it would be set early and if we can ensure its accessible
early on boot it would mean avoiding having to add yet-another
asm entry point for Linux, you could just use startup_32() and
the hypervisor type could easily just have an early branch call
and post branch call very similar to how we deal with the subarch
currently on 32-bit. Your calls then just become early stubs and
we'd have a solution for other PV types that want a similar solution
later
3) Dead code concerns and unifying entry points:
Addressing the semantics for the gray areas I am highlighting are critical for
ensuring one does not run code or even exposes code as a available for the type
of run time system booted, some folks call this "dead code". This is critical
for Linux distributions which need to rely on the flexibility of having one
kernel work for different use cases. The resolution to this problem was pvops
but pvops has shortcoming for dead code, it didn't address the problem likely as
it was not considered serious. It also didn't address the issue of different
hypervisors wanting different entry points and that this fact alone also contributes
to more dead code concerns, case in point the regressions introduced by cr4 shadow
and the latest one is Kasan which to this day breaks Xen! Dead code topics are
not easy to grasp, its why I've started on my own crusade to talk to people and
write about it [0], and as of late propose some changes to avoid these in a
clean way without extending pvops. Adding yet another entry point will not help
here *specially* if we do not take semantics seriously over the different hypervisors
and hypervisor types.
[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
My recommendation then:
We add new hypervisor type to close the semantic gap for hypervisor types, and
much like subarch enable also a subarch_data to let you pass and use your
hvmlite_start_info. This would not only help with the semantics but also help
avoid yet-another-entry point and force us to provide a well define structure
for considering code that should not run by pegging it as required or supported
for different early x86 code stubs.
I had hinted perhaps we might be able to piggy back on top of the ELF loader
protocol as well, and since that's standard do wonder if that could instead
be extended to help unify a mechanism for different OSes instead of making
this just a solution for Linux.
Code review below.
On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercallThe section annotations seems very special use case but likely worth documenting
page, initialize boot_params, enable early page tables.
Since this stub is executed before kernel entry point we cannot use
variables in .bss which is cleared by kernel. We explicitly place
variables that are initialized here into .data.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
---
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5774800..5f05fa2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -171,6 +172,17 @@ struct tls_descs {
*/
static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
+#ifdef CONFIG_XEN_PVHVM
+/*
+ * HVMlite variables. These need to live in data segment since they are
+ * initialized before startup_{32|64}, which clear .bss, are invoked.
+ */
+int xen_hvmlite __attribute__((section(".data"))) = 0;
+struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
+uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
+struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
+#endif
+
and defining a new macro for in include/linux/compiler.h. This would make it
easier to change should we want to change the section used here later and
enable others to easily look for the reason for these annotations in a
single place.