Re: [patch V6 10/37] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY
From: Andy Lutomirski
Date: Wed May 20 2020 - 13:23:19 EST
On Wed, May 20, 2020 at 8:16 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Wed, May 20, 2020 at 7:13 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > Andy Lutomirski <luto@xxxxxxxxxx> writes:
> > > On Tue, May 19, 2020 at 11:58 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >> Which brings you into the situation that you call schedule() from the
> > >> point where we just moved it out. If we would go there we'd need to
> > >> ensure that RCU is watching as well. idtentry_exit() might have it
> > >> turned off ....
> > >
> > > I don't think this is possible. Once you untangle all the wrappers,
> > > the call sites are effectively:
> > >
> > > __this_cpu_write(xen_in_preemptible_hcall, true);
> > > CALL_NOSPEC to the hypercall page
> > > __this_cpu_write(xen_in_preemptible_hcall, false);
> > >
> > > I think IF=1 when this happens, but I won't swear to it. RCU had
> > > better be watching.
> > >
> > > As I understand it, the one and only situation Xen wants to handle is
> > > that an interrupt gets delivered during the hypercall. The hypervisor
> > > is too clever for its own good and deals with this by rewinding RIP to
> > > the beginning of whatever instruction did the hypercall and delivers
> > > the interrupt, and we end up in this handler. So, if this happens,
> > > the idea is to not only handle the interrupt but to schedule if
> > > scheduling would be useful.
> > >
> > > So I don't think we need all this RCU magic. This really ought to be
> > > able to be simplified to:
> > >
> > > idtentry_exit();
> > >
> > > if (appropriate condition)
> > > schedule();
> >
> > This is exactly the kind of tinkering which causes all kinds of trouble.
> >
> > idtentry_exit()
> >
> > if (user_mode(regs)) {
> > prepare_exit_to_usermode(regs);
> > } else if (regs->flags & X86_EFLAGS_IF) {
> > /* Check kernel preemption, if enabled */
> > if (IS_ENABLED(CONFIG_PREEMPTION)) {
> > ....
> > }
> > instrumentation_begin();
> > /* Tell the tracer that IRET will enable interrupts */
> > trace_hardirqs_on_prepare();
> > lockdep_hardirqs_on_prepare(CALLER_ADDR0);
> > instrumentation_end();
> > rcu_irq_exit();
> > lockdep_hardirqs_on(CALLER_ADDR0);
> > } else {
> > /* IRQ flags state is correct already. Just tell RCU */
> > rcu_irq_exit();
> > }
> >
> > So in case IF is set then this already told the tracer and lockdep that
> > interrupts are enabled. And contrary to the ugly version this exit path
> > does not use rcu_irq_exit_preempt() which is there to warn about crappy
> > RCU state when trying to schedule.
> >
> > So we went great length to sanitize _all_ of this and make it consistent
> > just to say: screw it for that xen thingy.
> >
> > The extra checks and extra warnings for scheduling come with the
> > guarantee to bitrot when idtentry_exit() or any logic invoked from there
> > is changed. It's going to look like this:
> >
> > /*
> > * If the below causes problems due to inconsistent state
> > * or out of sync sanity checks, please complain to
> > * luto@xxxxxxxxxx directly.
> > */
> > idtentry_exit();
> >
> > if (user_mode(regs) || !(regs->flags & X86_FlAGS_IF))
> > return;
> >
> > if (!__this_cpu_read(xen_in_preemptible_hcall))
> > return;
> >
> > rcu_sanity_check_for_preemption();
> >
> > if (need_resched()) {
> > instrumentation_begin();
> > xen_maybe_preempt_hcall();
> > trace_hardirqs_on();
> > instrumentation_end();
> > }
> >
> > Of course you need the extra rcu_sanity_check_for_preemption() function
> > just for this muck.
> >
> > That's a true win on all ends? I don't think so.
>
> Hmm, fair enough. I guess the IRQ tracing messes a bunch of this logic up.
>
> Let's keep your patch as is and consider cleanups later. One approach
> might be to make this work more like extable handling: instead of
> trying to schedule from inside the interrupt handler here, patch up
> RIP and perhaps some other registers and let the actual Xen code just
> do cond_resched(). IOW, try to make this work the way it always
> should have:
>
> int ret;
> do {
> ret = issue_the_hypercall();
> cond_resched();
> } while (ret == EAGAIN);
Andrew Cooper pointed out that there is too much magic in Xen for this
to work. So never mind.