Re: [PATCH v2 10/39] x86/mm: Introduce _PAGE_COW
From: Dave Hansen
Date: Mon Oct 03 2022 - 18:14:49 EST
On 10/3/22 14:36, Edgecombe, Rick P wrote:
>>> +static inline pte_t pte_clear_cow(pte_t pte)
>>> +{
>>> + /*
>>> + * _PAGE_COW is unnecessary on !X86_FEATURE_SHSTK kernels.
>>> + * See the _PAGE_COW definition for more details.
>>> + */
>>> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
>>> + return pte;
>>> +
>>> + /*
>>> + * PTE is getting copied-on-write, so it will be dirtied
>>> + * if writable, or made shadow stack if shadow stack and
>>> + * being copied on access. Set they dirty bit for both
>>> + * cases.
>>> + */
>>> + pte = pte_set_flags(pte, _PAGE_DIRTY);
>>> + return pte_clear_flags(pte, _PAGE_COW);
>>> +}
>> These X86_FEATURE_SHSTK checks make me uneasy. Maybe use the
>> _PAGE_COW
>> logic for all machines with 64-bit entries. It will get you much more
>> coverage and more universal rules.
> Yes, I didn't like them either at first. The reasoning originally was
> that _PAGE_COW is a bit more work and it might show up for some
> benchmark.
>
> Looking at this again though, it is just a few more operations on
> memory that is already getting touched either way. It must be a very
> tiny amount of impact if any. I'm fine removing them. Having just one
> set of logic around this would make it easier to reason about.
>
> Dave, any thoughts on this?
The cpu_feature_enabled(X86_FEATURE_SHSTK) checks enable both
compile-time and runtime optimization. What makes this even more fun is:
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW)
+#else
+#define _PAGE_COW (_AT(pteval_t, 0))
+#endif
which I think means that the pte_clear_flags() goes away if
CONFIG_X86_SHADOW_STACK is disabled. So, what Rick posted here ends up
doing the following with:
| X86_FEATURE_SHSTK=1 | X86_FEATURE_SHSTK=0
==========+=====================+========================
CONFIG=n | compiled out | compiled out
CONFIG=y | set/clear | boot-time patched out
If we pull the cpu_feature_enabled() out, I think we end up getting
behavior like this:
| X86_FEATURE_SHSTK=1 | X86_FEATURE_SHSTK=0
==========+=====================+========================
CONFIG=n | set _PAGE_DIRTY | set _PAGE_DIRTY
CONFIG=y | set/clear | set/clear
It ends up adding instruction overhead (set _PAGE_DIRTY) to two cases
where it completely compiled out before. It also adds runtime overhead
(the "tiny amount of impact" you mentioned) to set/clear where it would
have runtime patched out before.
None of this is a deal breaker in terms of runtime overhead. But, I do
think the benefits of the cpu_feature_enabled() are worth it, even if
it's just an optimization. You could move it to the end of the series
and we can debate it on its own merits if you want.