Re: [syzbot] [net?] INFO: rcu detected stall in packet_release

From: Vladimir Oltean
Date: Tue May 28 2024 - 08:55:45 EST


On Tue, May 28, 2024 at 02:25:58PM +0200, Radoslaw Zielonek wrote:
> Hello,
>
> I'm working on similar taprio bug:
> https://syzkaller.appspot.com/bug?extid=c4c6c3dc10cc96bcf723

Could you please let me know if the patches I posted yesterday fix that?
https://lore.kernel.org/netdev/20240527153955.553333-1-vladimir.oltean@xxxxxxx/

> I think I know what is the root cause.
>
> The function advance_sched()
> [https://elixir.bootlin.com/linux/v5.10.173/source/net/sched/sch_taprio.c#L696]
> runs repeatedly. It is executed using HRTimer.
> In every call to advance_sched(), end_time is calculated,
> and the timer is set so that the next execution will be at end_time.
> To achieve this, first, the expiration time is set using hrtimer_set_expires(),
> and second, HRTIMER_RESTART is returned.
> This means that the timer is re-enqueued with the adjusted expiration time.
> The issue is that end_time is set far before the current time (now),
> causing advance_sched() to execute immediately without a context switch.
>
> __hrtimer_run_queues()
> [https://elixir.bootlin.com/linux/v5.10.173/source/kernel/time/hrtimer.c#L1615]
> is a function with a long loop.
> First, please note that now is calculated once and not updated within this function.
> We can see the statement basenow = now + base->offset,
> but this statement is outside the loop (and in my case, the offset is 0).
> The loop will terminate when the queue is empty or the next entry in the queue
> has an expiration time in the future.
> The issue here is that the queue can be updated within __run_timer().
> In my case, __run_timer() adds a new entry to the queue with advance_sched() function.
> Since the expiration time is before now, we need to execute advance_sched() again.
> The loop is very long because, in our case, the cycle is set to 3ns.

In plain English, the root cause is "the schedule is too tight for the
CPU to keep up with it". Although a schedule with a 3 ns cycle time is
not practically valid in itself, either. Vinicius proposed we should
just reject the cycles that are unrealistically small, using some
simplistic heuristic about the transmission time of a single small
packet. The problem is that the rejection mechanism was slightly broken.

> My idea is to create throttling mechanism.
> When advance_sched() sets the hrtimer expiration time to before the current time
> for X consecutive times, we can postpone the new advance_sched().
> You can see my PoC here: https://lore.kernel.org/all/00000000000089...@xxxxxxxxxx/T/

The link is not valid. Can you repost it without the "..."?

> Could you take a look at it? What do you think?
> Is it acceptable, or is it too aggressive with too much impact on the TAPRIO scheduler?