Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode

From: Thomas Gleixner
Date: Tue Sep 01 2020 - 16:02:17 EST


Joel,

On Tue, Sep 01 2020 at 12:50, Joel Fernandes wrote:
> On Tue, Sep 01, 2020 at 05:54:53PM +0200, Thomas Gleixner wrote:
>> On Fri, Aug 28 2020 at 15:51, Julien Desfossez wrote:
>> > /*
>> > - * Disable interrupts and reevaluate the work flags as they
>> > - * might have changed while interrupts and preemption was
>> > - * enabled above.
>> > + * Reevaluate the work flags as they might have changed
>> > + * while interrupts and preemption were enabled.
>>
>> What enables preemption and interrupts? Can you pretty please write
>> comments which explain what's going on.
>
> Yes, sorry. So, sched_core_unsafe_exit_wait() will spin with IRQs enabled and
> preemption disabled. I did it this way to get past the stop_machine() issues
> where you were unhappy with us spinning in an IRQ-disabled region.

So the comment is even more nonsensical :)
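
For reference, what you describe boils down to roughly the below. This is
only a sketch of the described behaviour; core_wide_safe() is a made-up
placeholder for whatever condition the series actually waits on:

	/*
	 * Spin with preemption disabled but IRQs enabled until the
	 * core is deemed safe again.
	 */
	preempt_disable();
	while (!core_wide_safe())
		cpu_relax();
	preempt_enable();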

>> > - if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
>> > - ti_work = exit_to_user_mode_loop(regs, ti_work);
>> > + ti_work = exit_to_user_mode_loop(regs);
>>
>> Why has the above loop to be invoked unconditionally even when that core
>> scheduling muck is disabled? Just to make all return to user paths more
>> expensive unconditionally, right?
>
> If you look at the above loop, we are calling exit_to_user_mode_work()
> conditionally by calling exit_to_user_mode_work_pending(), which does the
> same thing.

It does the same thing technically, but the fastpath, i.e. no work to
do, is no longer straightforward. Just look at the resulting ASM
before and after. It's awful.

> So we are still conditionally doing the usual exit_to_user_mode work, it's
> just that now we have to unconditionally invoke the 'loop' anyway.

No.

> The reason for that is, the loop can switch into another thread, so we
> have to do unsafe_exit() for the old thread, and unsafe_enter() for
> the new one while handling the tif work properly. We could get
> migrated to another CPU in this loop itself, AFAICS. So the
> unsafe_enter() / unsafe_exit() calls could also happen on different
> CPUs.

That's fine. It still does not justify to make everything slower even if
that 'pretend that HT is secure' thing is disabled.

Something like the below should be sufficient to do what you want
while restricting the wreckage to the 'pretend ht is secure' case.

The generated code for the CONFIG_PRETEND_HT_SECURE=n case is the same
as without the patch. With CONFIG_PRETEND_HT_SECURE=y the impact is
exactly two NOP-ed out jumps if the muck is not enabled on the command
line, which should be the default behaviour.
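
For completeness, the static key itself would live on the scheduler side
and get flipped from the command line, along the lines of the below. This
is only a sketch: the parameter name and setup function are placeholders,
and the header above would additionally need a matching
DECLARE_STATIC_KEY_FALSE(pretend_ht_secure_key).

/* Default off, i.e. both entry/exit hooks are NOP-ed out jumps. */
DEFINE_STATIC_KEY_FALSE(pretend_ht_secure_key);

static int __init pretend_ht_secure_setup(char *str)
{
	/* Enable the 'pretend HT is secure' muck only when asked for. */
	static_branch_enable(&pretend_ht_secure_key);
	return 1;
}
__setup("pretend_ht_secure", pretend_ht_secure_setup);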

Thanks,

tglx

---
--- /dev/null
+++ b/include/linux/pretend_ht_secure.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_PRETEND_HT_SECURE_H
+#define _LINUX_PRETEND_HT_SECURE_H
+
+#ifdef CONFIG_PRETEND_HT_SECURE
+static inline void enter_from_user_ht_sucks(void)
+{
+	if (static_branch_unlikely(&pretend_ht_secure_key))
+		enter_from_user_pretend_ht_is_secure();
+}
+
+static inline void exit_to_user_ht_sucks(void)
+{
+	if (static_branch_unlikely(&pretend_ht_secure_key))
+		exit_to_user_pretend_ht_is_secure();
+}
+#else
+static inline void enter_from_user_ht_sucks(void) { }
+static inline void exit_to_user_ht_sucks(void) { }
+#endif
+
+#endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -17,6 +17,7 @@
  * 1) Tell lockdep that interrupts are disabled
  * 2) Invoke context tracking if enabled to reactivate RCU
  * 3) Trace interrupts off state
+ * 4) Pretend that HT is secure
  */
 static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 {
@@ -28,6 +29,7 @@ static __always_inline void enter_from_u
 
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
+	enter_from_user_ht_sucks();
 	instrumentation_end();
 }
 
@@ -111,6 +113,12 @@ static __always_inline void exit_to_user
 /* Workaround to allow gradual conversion of architecture code */
 void __weak arch_do_signal(struct pt_regs *regs) { }
 
+static inline unsigned long exit_to_user_get_work(void)
+{
+	exit_to_user_ht_sucks();
+	return READ_ONCE(current_thread_info()->flags);
+}
+
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
 {
@@ -149,7 +157,7 @@ static unsigned long exit_to_user_mode_l
 		 * enabled above.
 		 */
 		local_irq_disable_exit_to_user();
-		ti_work = READ_ONCE(current_thread_info()->flags);
+		ti_work = exit_to_user_get_work();
 	}
 
 	/* Return the latest work state for arch_exit_to_user_mode() */
@@ -158,7 +166,7 @@ static unsigned long exit_to_user_mode_l
 
 static void exit_to_user_mode_prepare(struct pt_regs *regs)
 {
-	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	unsigned long ti_work = exit_to_user_get_work();
 
 	lockdep_assert_irqs_disabled();