Re: [RFC PATCH 0/7] x86/entry: Atomic statck switching for IST

From: Dave Hansen
Date: Mon Apr 03 2023 - 12:53:20 EST


On 4/3/23 07:05, Lai Jiangshan wrote:
> Documentation/x86/kernel-stacks.rst | 2 +
> arch/x86/entry/Makefile | 3 +
> arch/x86/entry/entry_64.S | 615 +++++++-------------------
> arch/x86/entry/ist_entry.c | 299 +++++++++++++
> arch/x86/include/asm/cpu_entry_area.h | 8 +-
> arch/x86/include/asm/idtentry.h | 15 +-
> arch/x86/include/asm/sev.h | 14 -
> arch/x86/include/asm/traps.h | 1 -
> arch/x86/kernel/asm-offsets_64.c | 7 +
> arch/x86/kernel/callthunks.c | 2 +
> arch/x86/kernel/dumpstack_64.c | 6 +-
> arch/x86/kernel/nmi.c | 8 -
> arch/x86/kernel/sev.c | 108 -----
> arch/x86/kernel/traps.c | 43 --
> arch/x86/kvm/vmx/vmx.c | 441 +++++++++++++++++-
> arch/x86/kvm/x86.c | 10 +-
> arch/x86/mm/cpu_entry_area.c | 2 +-
> tools/objtool/check.c | 7 +-
> 18 files changed, 937 insertions(+), 654 deletions(-)
> create mode 100644 arch/x86/entry/ist_entry.c

Some high-level comments...

The diffstat looks a lot nastier because of the testing patch. If you
that patch and trim the diffstat a bit, it starts to look a _lot_ more
appealing:

> arch/x86/entry/entry_64.S | 615 ++++++++----------------------------
> arch/x86/entry/ist_entry.c | 299 +++++++++++++++++
> arch/x86/kernel/sev.c | 108 ------
> arch/x86/kernel/traps.c | 43 --
...
> 16 files changed, 490 insertions(+), 650 deletions(-)

It removes more code than it adds and also removes a bunch of assembly.
If it were me posting this, I'd have shouted that from the rooftops
instead of obscuring it with a testing patch and leaving it as an
exercise to the reader to figure out.

It also comes with testing code and a great cover letter, which are rare
and _spectacular_.

That said, I'm torn. This series makes a valiant attempt to improve one
of the creakiest corners of arch/x86/. But, there are precious few
reviewers that can _really_ dig into this stuff. This is also a lot
less valuable now than it would have been when FRED was not on the horizon.

It's also not incremental at all. It's going to be a big pain when
someone bisects their random system hang to patch 4/7. It'll mean
there's a bug buried somewhere in the 500 lines of patch 3/7.

Grumbling aside, there's too much potential upside here to ignore. As
this moves out of RFC territory, it would be great if you could:

* Look at the FRED series and determine how much this collides
* Dig up some reviewers
* Flesh out the testing story. What kind of hardware was this tested
on? How much was bare metal vs. VMs, etc...?
* Reinforce what the benefits to end users are here. Are folks
_actually_ running into the #VC fragility, for instance?

Also, let's say we queue this, it starts getting linux-next testing, and
we start getting bug reports of hangs. It'll have to get reverted if we
can't find the bug fast.

How much of a pain would it be to make atomic-IST-entry _temporarily_
selectable at boot time? It would obviously need to keep the old code
around and would not have the nice diffstat. But that way, folks would
at least have a workaround while we're bug hunting.

1. https://lore.kernel.org/all/20230327075838.5403-1-xin3.li@xxxxxxxxx/