Re: [PATCH] lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization

From: Waiman Long
Date: Tue Mar 25 2025 - 19:20:58 EST


On 3/25/25 3:42 PM, Boqun Feng wrote:
> On Tue, Mar 25, 2025 at 03:23:30PM -0400, Waiman Long wrote:
> [...]
>>>>> That commit seemed to fix a race between disabling lockdep and
>>>>> unregistering a key and, most importantly, to call zap_class() for the
>>>>> unregistered key even if lockdep is disabled (debug_locks = 0). It
>>>>> might be related, but I'm not sure that's the reason for putting
>>>>> synchronize_rcu() there. Say you want to synchronize between
>>>>> /proc/lockdep and lockdep_unregister_key(), and you have
>>>>> synchronize_rcu() in lockdep_unregister_key(): what's the RCU
>>>>> read-side critical section at /proc/lockdep?
>>>> I agree that the commit that I mentioned is not relevant to the current
>>>> case. You are right that is_dynamic_key() is the only function that is
>>>> problematic; the other two are protected by the lockdep_lock, so they
>>>> are safe. Anyway, I believe that the actual race happens in the
>>>> iteration of the hashed list in is_dynamic_key(). The key that you save
>>>> in the lockdep_key_hazptr in your proposed patch should never be the key (dead_key)
>>> The key stored in lockdep_key_hazptr is the one that the rest of the
>>> function will use after is_dynamic_key() returns true. That is,
>>>
>>>     CPU 0                                   CPU 1
>>>     =====                                   =====
>>>     WRITE_ONCE(*lockdep_key_hazptr, key);
>>>     smp_mb();
>>>
>>>     is_dynamic_key():
>>>       hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>>>         if (k == key) {
>>>           found = true;
>>>           break;
>>>         }
>>>       }
>>>                                             lockdep_unregister_key():
>>>                                               hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>>>                                                 if (k == key) {
>>>                                                   hlist_del_rcu(&k->hash_entry);
>>>                                                   found = true;
>>>                                                   break;
>>>                                                 }
>>>                                               }
>>>
>>>                                               smp_mb();
>>>
>>>                                               synchronize_lockdep_key_hazptr():
>>>                                                 for_each_possible_cpu(cpu) {
>>>                                                   <wait for the hazptr slot on
>>>                                                    that CPU to be not equal to
>>>                                                    the removed key>
>>>                                                 }
>>>
>>> so that if is_dynamic_key() finds the key in the hash, then even though
>>> the key may later be removed by lockdep_unregister_key(), the hazard
>>> pointer guarantees that lockdep_unregister_key() will wait until the
>>> hazard pointer is released.
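
If I read the proposal right, the waiter could look roughly like this (one
hazard pointer slot per CPU; the slot declaration and the plain cpu_relax()
polling are illustrative guesses, not the actual patch):

        /* One hazard pointer slot per CPU, published by the reader side. */
        static DEFINE_PER_CPU(struct lock_class_key *, lockdep_key_hazptr);

        static void synchronize_lockdep_key_hazptr(struct lock_class_key *key)
        {
                int cpu;

                for_each_possible_cpu(cpu) {
                        /*
                         * Pairs with the smp_mb() after a reader publishes
                         * its hazard pointer: once the slot no longer holds
                         * the removed key, that CPU cannot be using it.
                         */
                        while (READ_ONCE(*per_cpu_ptr(&lockdep_key_hazptr, cpu)) == key)
                                cpu_relax();
                }
        }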

>>>> that is passed to lockdep_unregister_key(). In is_dynamic_key():
>>>>
>>>>     hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>>>>             if (k == key) {
>>>>                     found = true;
>>>>                     break;
>>>>             }
>>>>     }
>>>>
>>>> key != k (the dead_key), but before accessing its content to get to
>>>> hash_entry,
>>> It is the dead_key.

>>>> an interrupt/NMI can happen. In the meantime, the structure holding the
>>>> key is freed and its content can be overwritten with garbage. When the
>>>> interrupt/NMI returns, hash_entry can point to anything, leading to a
>>>> crash or an infinite loop. Perhaps we can use some kind of
>>>> synchronization mechanism
>>> No, hash_entry cannot be freed or overwritten, because the user has
>>> protected the key with a hazard pointer; only after the user resets the
>>> hazard pointer to NULL can lockdep_unregister_key() return and the key
>>> be freed.

>>>> between is_dynamic_key() and lockdep_unregister_key() to prevent this
>>>> kind of race. For example, we can have an atomic counter associated
>>>> with each
>>> The hazard pointer I proposed provides exactly that synchronization ;-)
>> What I am saying is that register_lock_class() is trying to find a newkey
>> while lockdep_unregister_key() is trying to take out an oldkey (newkey !=
>> oldkey). If they happen to hit the same hash list, register_lock_class()
>> will put newkey into the hazard pointer, but the
>> synchronize_lockdep_key_hazptr() call from lockdep_unregister_key()
>> checks for oldkey, which is not the one saved in the hazard pointer. So
>> lockdep_unregister_key() will return, and the key will be freed and
>> reused while is_dynamic_key() may still hold a reference to oldkey and be
>> accessing its content, which is by then invalid. I think this is a
>> possible scenario.
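
To spell that out in the same style as the diagram above (an illustrative
interleaving, with newkey and oldkey hashing to the same bucket):

        CPU 0                                   CPU 1
        =====                                   =====
        register_lock_class(lock):
          WRITE_ONCE(*lockdep_key_hazptr, newkey);
          smp_mb();

          is_dynamic_key(newkey):
            /* iteration reaches oldkey */
            k = oldkey;
                                                lockdep_unregister_key(oldkey):
                                                  hlist_del_rcu(&oldkey->hash_entry);
                                                  smp_mb();
                                                  synchronize_lockdep_key_hazptr(oldkey);
                                                  /* no slot holds oldkey, returns */
                                                <oldkey's structure freed/reused>
            /* reading k->hash_entry.next now
               dereferences freed memory */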

> Oh, I see. And yes, the hazard pointers I proposed cannot prevent this
> race, unfortunately. (Well, technically we could still use an extra slot
> to hold each key during the hash-list iteration, but that would generate
> a lot of stores, so it won't be ideal.) But...

> [...]
>>>> head of the hashed table. is_dynamic_key() can increment the counter,
>>>> if it is not zero, to proceed, and lockdep_unregister_key() has to make
>>>> sure it can safely decrement the counter to 0 before going ahead. Just
>>>> a thought!
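
A minimal sketch of that counter idea, as far as I can see it (the wrapper
struct and helper names are made up for illustration; lockdep's real
buckets are bare hlist_heads):

        /* Hypothetical per-bucket wrapper around the hash list head. */
        struct key_hash_bucket {
                struct hlist_head head;
                atomic_t refs;          /* starts at 1; 0 means draining */
        };

        /* is_dynamic_key() pins the bucket, or bails out. */
        static bool get_bucket(struct key_hash_bucket *b)
        {
                return atomic_inc_not_zero(&b->refs);
        }

        static void put_bucket(struct key_hash_bucket *b)
        {
                atomic_dec(&b->refs);
        }

        /*
         * lockdep_unregister_key() drops the initial reference, waits for
         * the count to reach 0 before touching the list, and reopens the
         * bucket afterwards.
         */
        static void drain_bucket(struct key_hash_bucket *b)
        {
                atomic_dec(&b->refs);
                while (atomic_read(&b->refs) != 0)
                        cpu_relax();
        }

        static void reopen_bucket(struct key_hash_bucket *b)
        {
                atomic_set(&b->refs, 1);
        }

A reader that loses the race (sees the count at 0) would need a fallback,
e.g. retrying under lockdep_lock.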

> Your idea inspires another solution with hazard pointers: we can put the
> pointer of the hash_head into the hazard pointer slot ;-)
>
> in register_lock_class():
>
>         /* hazptr: protect the key */
>         WRITE_ONCE(*key_hazptr, keyhashentry(lock->key));
>
>         /* Synchronizes with the smp_mb() in synchronize_lockdep_key_hazptr() */
>         smp_mb();
>
>         if (!static_obj(lock->key) && !is_dynamic_key(lock->key))
>                 return NULL;
>
> in lockdep_unregister_key():
>
>         /* Wait until register_lock_class() has finished accessing k->hash_entry. */
>         synchronize_lockdep_key_hazptr(keyhashentry(key));
>
> which is more space efficient than a per-hash-list atomic or lock unless
> you have 1024 or more CPUs.
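
For anyone following along: keyhashentry() is the existing bucket-lookup
helper in kernel/locking/lockdep.c, roughly

        #define KEYHASH_BITS    (MAX_LOCKDEP_KEYS_BITS - 2)
        #define KEYHASH_SIZE    (1UL << KEYHASH_BITS)
        static struct hlist_head lock_keys_hash[KEYHASH_SIZE];
        #define keyhashentry(key) \
                (lock_keys_hash + hash_long((unsigned long)(key), KEYHASH_BITS))

so the bucket address is a pure function of the key, and both sides compute
the same stable pointer to store in and check against the hazptr slot.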

It looks like you are trying hard to find a use case for hazard pointers
in the kernel :-)

Anyway, that may work. The only problem that I see is the nesting of an
interrupt context on top of a task context. The first use of a
raw_spinlock may happen in an interrupt context. If the interrupt happens
after the task has set the hazard pointer and while it is iterating the
hash list, the value of the hazard pointer may be overwritten.
Alternatively, we could have multiple slots for the hazard pointer, but
that would make the code more complicated. Or we could disable interrupts
before setting the hazard pointer.
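
The interrupt-disable variant could look roughly like this (a sketch only,
hand-waving the RCU-lockdep annotations; the per-CPU slot here holds the
hash_head pointer from your latest proposal):

        static DEFINE_PER_CPU(struct hlist_head *, lockdep_hazptr);

        static bool is_dynamic_key(const struct lock_class_key *key)
        {
                struct hlist_head *hash_head = keyhashentry(key);
                struct lock_class_key *k;
                unsigned long flags;
                bool found = false;

                /*
                 * A single per-CPU slot must not be overwritten by a
                 * nested user, so keep interrupts off for the whole
                 * window. (This still would not cover NMIs.)
                 */
                local_irq_save(flags);
                WRITE_ONCE(*this_cpu_ptr(&lockdep_hazptr), hash_head);
                smp_mb();       /* Pairs with smp_mb() on the unregister side */

                hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
                        if (k == key) {
                                found = true;
                                break;
                        }
                }

                WRITE_ONCE(*this_cpu_ptr(&lockdep_hazptr), NULL);
                local_irq_restore(flags);
                return found;
        }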

The solution that I am thinking about is to have a simple unfair rwlock
that protects just the hash-list iteration. lockdep_register_key() and
lockdep_unregister_key() take the write lock with interrupts disabled,
while is_dynamic_key() takes the read lock. Nesting isn't a problem in
this case, we don't need RCU to protect the iteration process, and so the
last synchronize_rcu() call isn't needed. The level of contention should
be low enough that livelock isn't an issue.
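
Something along these lines (again just a sketch, with made-up names; it
deliberately favors readers so that a reader nested in an interrupt can
never deadlock against a spinning writer, at the price of possible writer
starvation, which should be fine at this level of contention):

        #define HASH_WRITER     (1U << 31)

        static atomic_t hash_rwlock = ATOMIC_INIT(0);

        /* Called with interrupts disabled. */
        static void hash_write_lock(void)
        {
                /* Acquire only when there is no reader and no writer. */
                while (atomic_cmpxchg(&hash_rwlock, 0, HASH_WRITER) != 0)
                        cpu_relax();
        }

        static void hash_write_unlock(void)
        {
                atomic_set_release(&hash_rwlock, 0);
        }

        static void hash_read_lock(void)
        {
                for (;;) {
                        if (!(atomic_inc_return(&hash_rwlock) & HASH_WRITER))
                                return;
                        /* A writer holds the lock: back off and wait. */
                        atomic_dec(&hash_rwlock);
                        while (atomic_read(&hash_rwlock) & HASH_WRITER)
                                cpu_relax();
                }
        }

        static void hash_read_unlock(void)
        {
                atomic_dec(&hash_rwlock);
        }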

Cheers,
Longman