Have you checked the change in the patch? call_rcu_zapped() has now been
split into two parts: preparing the callback and calling call_rcu().
The preparing part checks and sets delayed_free.scheduled under
graph_lock(), so only one CPU/thread will win and do the actual
call_rcu(). The RCU callback free_zapped_rcu() then clears
delayed_free.scheduled, again under graph_lock().
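
To make the argument concrete, here is a rough userspace model of the
check-and-set described above. It is only a sketch and not the patch
code: a pthread mutex stands in for graph_lock(), a plain bool stands in
for delayed_free.scheduled, and prepare_call_rcu_zapped(),
model_call_rcu(), model_free_zapped_rcu() and run_race() are
hypothetical names made up for this illustration. In the kernel,
call_rcu() is asynchronous and the callback runs after a grace period;
the model only counts how many callers would have queued it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* Assumption: a pthread mutex models graph_lock()/graph_unlock(). */
static pthread_mutex_t graph_lock = PTHREAD_MUTEX_INITIALIZER;
static bool scheduled;              /* models delayed_free.scheduled */
static atomic_int callbacks_queued; /* counts simulated call_rcu() calls */

/* Part 1 (hypothetical name): check and set the flag under the lock.
 * Returns true only for the caller that flips it from false to true. */
static bool prepare_call_rcu_zapped(void)
{
	bool won = false;

	pthread_mutex_lock(&graph_lock);
	if (!scheduled) {
		scheduled = true;
		won = true;
	}
	pthread_mutex_unlock(&graph_lock);
	return won;
}

/* Part 2 (hypothetical name): stands in for the actual call_rcu(),
 * done outside the lock and only by the winner of part 1. */
static void model_call_rcu(void)
{
	atomic_fetch_add(&callbacks_queued, 1);
}

/* Models free_zapped_rcu(): clears the flag, again under the lock. */
void model_free_zapped_rcu(void)
{
	pthread_mutex_lock(&graph_lock);
	scheduled = false;
	pthread_mutex_unlock(&graph_lock);
}

static void *worker(void *unused)
{
	(void)unused;
	if (prepare_call_rcu_zapped())
		model_call_rcu();
	return NULL;
}

/* Let nthreads callers race; return how many queued the callback. */
int run_race(int nthreads)
{
	pthread_t tids[MAX_THREADS];
	int created = 0;
	int i;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&callbacks_queued, 0);

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[created], NULL, worker, NULL) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tids[i], NULL);

	return atomic_load(&callbacks_queued);
}
```

In this model, run_race(8) returns 1 on the first call, 0 on a second
call while the flag is still set, and 1 again once
model_free_zapped_rcu() has cleared it.
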
If you think a race is still possible, could you provide a case where
it may happen?