Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe
From: Valentin Schneider
Date: Mon Jun 27 2022 - 08:42:37 EST
On 25/06/22 12:04, Eric W. Biederman wrote:
> I am not particularly fond of this patch as it adds more complexity than
> is necessary to solve the problem.
>
> Calling a spade a spade PREEMPT_RT's mutex_trylock implementation is
> broken as it can not support the use cases of an ordinary mutex_trylock.
> I have not seen (possibly I skimmed too quickly) anywhere in the
> discussion why PREEMPT_RT is not being fixed. Looking at the code
> there is enough going on in try_to_take_rt_mutex that I can imagine
> that some part of that code is not nmi safe. So I can believe
> PREEMPT_RT may be unfix-ably broken.
>
AFAICT same goes for !PREEMPT_RT given the mutex_unlock(); it's a bit
convoluted but you can craft scenarios where the NMI ends up spinning on
mutex->wait_lock that is owned by the interrupted task, e.g.
CPU0 CPU1
crash_shrink_memory()
mutex_lock();
crash_get_memory_size()
mutex_lock()
raw_spin_lock(&lock->wait_lock);
// Lock acquired
<NMI>
mutex_unlock()
<Release lock->owner>;
// Owner is free at this point so this succeeds
mutex_trylock();
// No kexec_crash_image
mutex_unlock()
raw_spin_lock(&lock->wait_lock);
>
> At this point I recommend going back to being ``unconventional'' with
> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
> use a mutex for locking rather than xchg()").
>
> That would also mean that we don't have to worry about the lockdep code
> doing something weird in the future and breaking kexec.
>
> Your change starting to is atomic_cmpxchng is most halfway to a revert
> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
> xchg()"). So we might as well go the whole way and just document that
> the kexec on panic code can not use conventional kernel locking
> primitives and has to dig deep and build it's own. At which point it
> makes no sense for the rest of the kexec code to use anything different.
>
Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
locking primitives to just where they are needed (loading & kexec'ing), but
I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.
> Eric