From: Roman Kisel <romank@xxxxxxxxxxxxxxxxxxx> Sent: Friday, December 20, 2024 11:14 AM
Thank you, Michael, for helping me explore the options! From where I
I will not object to your preferred path of allocating the
hyperv_pcpu_output_arg when CONFIG_HYPERV_VTL_MODE is set. For
me, it has been a worthwhile discussion to explore the alternatives. I do,
however, want to get further clarification about the spec requirements
and whether our current Linux code is meeting those requirements. More
on that below. And allow me to continue the discussion about a couple
of points you raised.
I see three uses in current upstream code. And then there are close toWhile compiling my arguments and making sure they're worth the readers'
a dozen more uses in Nuno's year-old patch set for /dev/mshv that hasn't
been accepted upstream yet. Does that roughly match what you are referring
to?
To me, this has been the ever-shining light in the dark maze of other* dom0 and the VTL mode can share code quite naturally,
Yep.
I meant that when implementing a generic case, not using the output page* Not using the output page cannot acccount for all cases permitted by
the TLFS clause "don't overlap input and output, and don't cross page
boundaries",
This is what I don’t understand.
Your approach appears being solid to me! Again, going to be beating on* Not using the output page everywhere for consistency requires updating
call sites of `hv_do_hypercall` and friends, i.e. every place where a
hypercall is made needs to incorporate the offset/pointer at which the
output should start, or perhaps do some shenanigans with macro's,
In my musing about getting rid of hyperv_pcpu_output_arg entirely, the
call signature of hv_do_hypercall() and friends would remain unchanged.
There would still be a virtual address passed as the output argument (which
might be NULL, and is *not* required to be page aligned). The caller of
hv_do_hypercall() must only ensure that the virtual address points to an
output area that is large enough to contain the output without crossing a
page boundary. The area must also not overlap with the input area. Given
that we currently have only 3 uses of hyperv_pcpu_output_arg in upstream
code, only those would need to be tweaked to split the hyperv_pcpu_input_arg
page into an input and output area. All other callers of hv_do_hypercall()
would be unchanged.
Agreed!* Not using the output page leads to negligible savings,
it is hard to see for me how to make not using the hypercall output page
look as a profitable enginnering endeavor, really comes off as dancing
to the perf scare/feature creep tune for peanuts.
There would be some simplification in hv_common_free(), hv_common_init(),
and hv_common_cpu_init() if there was no hyperv_pcpu_output_arg. :-)
In my opinion, we should use the hypercall output page for the VTL mode
as dom0 does to share code and reduce code churn. Had we used some
`hv_get_registers` in both instead of the specialized for no compelling
imo practical reason `get_vtl` (as it just gets a vCPU register, nothing
more, nothing less), this code path would've been better tested, and any
of this patching would not have been needed.
FWIW, the historical context is that at the time get_vtl() was implemented,
all VP registers that Linux needed to access were available as synthetic
MSRs on the x86 side. Going back to its origins 15 years ago, the Linux code
for Hyper-V always just used the synthetic MSRs. get_vtl() was the first time
that x86 code needed the HVCALL_GET_VP_REGISTERS hypercall because
HV_REGISTER_VSM_VP_STATUS has no synthetic MSR equivalent. There still
is no 'hv_get_registers' function in upstream code on the x86 side.
hv_get_vpreg() exists only on the arm64 side where there are no synthetic
MSRs or equivalent.
Michael, thank you very much for your efforts! At that time, no one
In hindsight, hv_get_vpreg() could have been added on the x86 side, and
then get_vtl() implemented on top of that. And I made the get_vtl() situation
worse by giving Tianyu bad advice about overlapping the output area with
the input area. :-(
I once had a manager at Microsoft who said "He reserved the right
to wake up smarter every day." I've had to claim that right as well
from time-to-time. :-)
Appreciate your advice! I don't think they do. The code doesn't overlap input and output, and to my eye, the arguments are sized and aligned appropriately. That does not violate TLFS. I'd think the
I'd wait for few days and then would likely prefer to run with Wei's
permission to send this patch in v2 as-is unless some damning evidence
presents itself.
Again, I won't object to your taking that path.
But regarding compliance with the TLFS, take a look at the code for
hv_pci_read_mmio() and hv_pci_write_mmio(). Do you think that code
violates the spec? If not, what would a violation look like in the context
of this discussion? If current code is in violation, then we should fix that.
Michael