Re: [net/sched] Question: Locks for clearing ERR_PTR() value from idrinfo->action_idr ?

From: Pedro Tammela
Date: Thu Jun 13 2024 - 22:51:24 EST


On 13/06/2024 21:58, Tetsuo Handa wrote:

Is there a possibility that tcf_idr_check_alloc() is called without holding
rtnl_mutex?

There is, but not in the code path of this reproducer.

If yes, adding a sleep before "goto again;" would help. But if no,
is this a sign that some path forgot to call tcf_idr_{cleanup,insert_many}() ?

The reproducer is sending a new action message with 2 actions.
Actions are committed to the idr after processing in order to make them visible together and after any errors are caught.

The bug happens when the actions in the message refer to the same index. Since the first processing succeeds, adding -EBUSY to the index, the second processing, which references the same index, will loop forever.

After the change to rely on RCU for this check, instead of the idr lock, the hangs became more noticeable to syzbot since now it's hanging a system-wide lock.