On Tue, Mar 25, 2025 at 10:52:16AM -0400, Waiman Long wrote:
On 3/24/25 11:41 PM, Boqun Feng wrote:
On Mon, Mar 24, 2025 at 09:56:25PM -0400, Waiman Long wrote:
On 3/24/25 8:47 PM, Boqun Feng wrote:
On Mon, Mar 24, 2025 at 12:30:10PM -0700, Boqun Feng wrote:
On Mon, Mar 24, 2025 at 12:21:07PM -0700, Boqun Feng wrote:
On Mon, Mar 24, 2025 at 01:23:50PM +0100, Eric Dumazet wrote:
[...]
---
kernel/locking/lockdep.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4470680f02269..a79030ac36dd4 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -6595,8 +6595,10 @@ void lockdep_unregister_key(struct lock_class_key *key)
 	if (need_callback)
 		call_rcu(&delayed_free.rcu_head, free_zapped_rcu);
-	/* Wait until is_dynamic_key() has finished accessing k->hash_entry. */
-	synchronize_rcu();

I find even the old comment a bit confusing: normally I would expect the
caller of lockdep_unregister_key() to guarantee that the key has been
unpublished; in other words, there should be no way a
lockdep_unregister_key() could race with a
register_lock_class()/lockdep_init_map_type(). The synchronize_rcu() would
not be needed then.

Let's say someone breaks my assumption above; then, when doing a
register_lock_class() with a key that is about to be unregistered, I cannot
see anything that stops the following:

CPU 0                                   CPU 1
=====                                   =====
register_lock_class():
  ...
  } else if (... && !is_dynamic_key(lock->key)) {
    // ->key is not unregistered yet, so this branch is not
    // taken.
    return NULL;
  }
                                        lockdep_unregister_key(..);
                                        // key unregistered, can be
                                        // freed at any time.
  key = lock->key->subkeys + subclass; // BOOM! UAF.

So either we don't need the synchronize_rcu() here, or the
synchronize_rcu() doesn't help at all. Am I missing something subtle
here?

Oh! Maybe I was missing that register_lock_class() must be called with irqs
disabled, which is also an RCU read-side critical section.

Since register_lock_class() will be called with irqs disabled, maybe hazard
pointers [1] are a better fit, because in most cases we only have nr_cpus
readers, so the potential hazard pointer slots are fixed.

So the patch below can reduce the time of the tc command from ~1.7 seconds
of real time (v6.14) to ~0.05 seconds (v6.14 + patch) in my test env, which
is not surprising given that it is a dedicated hazard pointer for
lock_class_key.
Thoughts?
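
To make that concrete, here is a minimal sketch of what a dedicated per-CPU
hazard pointer for lock_class_key could look like, reusing the
lockdep_key_hazptr / synchronize_lockdep_key_hazptr() names used later in
this thread. This is only an illustration of the idea, not the actual
patch; the helper body is an assumption:

/* One hazard pointer slot per CPU; the readers run with IRQs disabled. */
static DEFINE_PER_CPU(struct lock_class_key *, lockdep_key_hazptr);

/*
 * Updater side, called by lockdep_unregister_key() after the key has been
 * removed from lock_keys_hash: wait until no CPU still publishes the key
 * in its hazard pointer slot.
 */
static void synchronize_lockdep_key_hazptr(struct lock_class_key *key)
{
        int cpu;

        /* Order the preceding hlist_del_rcu() against reading the slots. */
        smp_mb();

        for_each_possible_cpu(cpu) {
                while (READ_ONCE(*per_cpu_ptr(&lockdep_key_hazptr, cpu)) == key)
                        cpu_relax();
        }
}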

My understanding is that it is not a race between register_lock_class() and
lockdep_unregister_key(). It is the fact that the structure that holds the
lock_class_key may be freed immediately after return from
lockdep_unregister_key(). So any process that is in the middle of iterating
the hash list containing the hash_entry to be unregistered may hit a UAF
problem. See commit 61cc4534b6550 ("locking/lockdep: Avoid potential access
of invalid memory in lock_class") for a discussion of this kind of UAF
problem.

You mean the lock_keys_hash table, right? I used register_lock_class() as
an example, because it's one of the places that iterates lock_keys_hash.
IIUC, lock_keys_hash is only used in lockdep_{un,}register_key() and
is_dynamic_key() (which are only called by lockdep_init_map_type() and
register_lock_class()).

That commit seemed to be fixing a race between disabling lockdep and
unregistering a key and, most importantly, it calls zap_class() for the
unregistered key even if lockdep is disabled (debug_locks = 0). It might be
related, but I'm not sure that's the reason for putting synchronize_rcu()
there. Say you want to synchronize between /proc/lockdep and
lockdep_unregister_key(), and you have synchronize_rcu() in
lockdep_unregister_key(): what's the RCU read-side critical section at
/proc/lockdep?

I agree that the commit that I mentioned is not relevant to the current
case. You are right that is_dynamic_key() is the only function that is
problematic; the other two are protected by the lockdep_lock, so they are
safe. Anyway, I believe that the actual race happens in the iteration of
the hashed list in is_dynamic_key(). The key that you save in the
lockdep_key_hazptr in your proposed patch should never be the key
(dead_key) that is passed to lockdep_unregister_key().

It is the dead_key. The key stored in lockdep_key_hazptr is the one that
the rest of the function will use after is_dynamic_key() returns true. That
is,

CPU 0                                   CPU 1
=====                                   =====
WRITE_ONCE(*lockdep_key_hazptr, key);
smp_mb();
is_dynamic_key():
  hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
    if (k == key) {
      found = true;
      break;
    }
  }
                                        lockdep_unregister_key():
                                          hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
                                            if (k == key) {
                                              hlist_del_rcu(&k->hash_entry);
                                              found = true;
                                              break;
                                            }
                                          }
                                          smp_mb();
                                          synchronize_lockdep_key_hazptr():
                                            for_each_possible_cpu(cpu) {
                                              <wait for the hazptr slot on
                                               that CPU to be not equal to
                                               the removed key>
                                            }

so that if is_dynamic_key() finds a key in the hash, then even though the
key may later be removed by lockdep_unregister_key(), the hazard pointer
guarantees that lockdep_unregister_key() will wait for the hazard pointer
to be released.
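
For completeness, the CPU 0 column above could look roughly like the
following in a caller such as register_lock_class(). Again, this is only a
sketch; look_up_or_register() is a placeholder, not a real function:

/*
 * Reader side, with IRQs disabled: publish the key before checking it,
 * use it, then clear the slot so lockdep_unregister_key() can make
 * progress and free the key.
 */
static struct lock_class *lookup_class_protected(struct lockdep_map *lock,
                                                 unsigned int subclass)
{
        struct lock_class_key *key = lock->key;
        struct lock_class *class = NULL;

        WRITE_ONCE(*this_cpu_ptr(&lockdep_key_hazptr), key);
        smp_mb();       /* Pairs with the smp_mb() in lockdep_unregister_key(). */

        if (static_obj(key) || is_dynamic_key(key)) {
                /*
                 * Even if lockdep_unregister_key() removes @key from
                 * lock_keys_hash right now, it has to wait in
                 * synchronize_lockdep_key_hazptr() until the slot is
                 * cleared below, so dereferencing @key here stays safe.
                 */
                class = look_up_or_register(key->subkeys + subclass);
        }

        WRITE_ONCE(*this_cpu_ptr(&lockdep_key_hazptr), NULL);
        return class;
}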

In is_dynamic_key():

        hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
                if (k == key) {
                        found = true;
                        break;
                }
        }

key != k (where k is the dead_key), but before accessing its content to get
to hash_entry, an interrupt/NMI can happen. In the meantime, the structure
holding the key is freed and its content can be overwritten with some
garbage. When the interrupt/NMI returns, hash_entry can point to anything,
leading to a crash or an infinite loop.
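
For reference, hlist_for_each_entry_rcu() expands to roughly the following
(simplified, debug checks dropped), which is where the access to
k->hash_entry happens:

        for (k = hlist_entry_safe(rcu_dereference(hlist_first_rcu(hash_head)),
                                  struct lock_class_key, hash_entry);
             k;
             k = hlist_entry_safe(rcu_dereference(hlist_next_rcu(&k->hash_entry)),
                                  struct lock_class_key, hash_entry)) {
                if (k == key) {
                        found = true;
                        break;
                }
                /*
                 * The next iteration loads k->hash_entry.next above; if k
                 * could be freed and reused in that window, the load would
                 * chase garbage.
                 */
        }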

No, this is not a UAF :( hash_entry cannot be freed or overwritten, because
the user has protected the key with the hazard pointer; only when the user
resets the hazard pointer to NULL can lockdep_unregister_key() return and
the key be freed.

Perhaps we can use some kind of synchronization mechanism between
is_dynamic_key() and lockdep_unregister_key() to prevent this kind of
racing. For example, we could have an atomic counter associated with each
head of the hashed table: is_dynamic_key() increments the counter only if
it is not zero in order to proceed, and lockdep_unregister_key() has to
make sure it can safely decrement the counter to 0 before going ahead. Just
a thought!
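
A rough sketch of that per-bucket counter idea (the names and details here
are only an illustration of the proposal; the counters are assumed to be
initialized to 1, meaning "no unregister in progress"):

/* Hypothetical: one counter per lock_keys_hash bucket, initialized to 1. */
static atomic_t lock_keys_hash_refs[KEYHASH_SIZE];

/* is_dynamic_key(): only proceed while no unregister is draining the bucket. */
static bool keys_hash_bucket_get(unsigned long hash)
{
        return atomic_inc_not_zero(&lock_keys_hash_refs[hash]);
}

static void keys_hash_bucket_put(unsigned long hash)
{
        atomic_dec(&lock_keys_hash_refs[hash]);
}

/*
 * lockdep_unregister_key(): wait until only the initial reference is left,
 * then take the counter to 0 so that new readers back off; it would be
 * restored to 1 once the key can no longer be found by any reader.
 */
static void keys_hash_bucket_drain(unsigned long hash)
{
        while (atomic_cmpxchg(&lock_keys_hash_refs[hash], 1, 0) != 1)
                cpu_relax();
}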

The hazard pointer I proposed provides the exact synchronization ;-)

Regards,
Boqun

Cheers,
Longman