Re: net_tx_action race condition?

From: Eric Dumazet
Date: Wed Mar 28 2018 - 12:32:27 EST




On 03/28/2018 12:30 AM, Saurabh Kr wrote:
> Hi Eric/Angelo,
>  
> We are seeing the assertion error in Linux kernel 2.4.29: "kernel: KERNEL: assertion (atomic_read(&skb->users) == 0) failed at dev.c(1397)". Based on the patch provided (https://patchwork.kernel.org/patch/5368051/) we merged the changes into Linux kernel 2.4.29, but we are still hitting the assertion error at dev.c(1397). Please let me know your thoughts.
>  
> Before merge (Linux 2.4.29)
> ---------------------------------
>  
> static void net_tx_action(struct softirq_action *h)
> {
>         int cpu = smp_processor_id();
>  
>         if (softnet_data[cpu].completion_queue) {
>                 struct sk_buff *clist;
>  
>                 local_irq_disable();
>                 clist = softnet_data[cpu].completion_queue; // Existing code
>                 softnet_data[cpu].completion_queue = NULL;
>                 local_irq_enable();
>  
>                 while (clist != NULL) {
>                         struct sk_buff *skb = clist;
>                         clist = clist->next;
>  
>                         BUG_TRAP(atomic_read(&skb->users) == 0);
>                         __kfree_skb(skb);
>                 }
>         }
>  
>          ---------
>  
> After merging the changes based on the available patch (Linux 2.4.29):
> ------------------------------------------------------------------------------
>  
> static void net_tx_action(struct softirq_action *h)
> {
>         int cpu = smp_processor_id();
>  
>         if (softnet_data[cpu].completion_queue) {
>                 struct sk_buff *clist;
>  
>                 local_irq_disable();
>                 clist = *(volatile typeof(softnet_data[cpu].completion_queue) *)&( softnet_data[cpu].completion_queue);  // Modified line based on available patch
>                 softnet_data[cpu].completion_queue = NULL;
>                 local_irq_enable();
>  
>                 while (clist != NULL) {
>                         struct sk_buff *skb = clist;
>                         clist = clist->next;
>  
>                         BUG_TRAP(atomic_read(&skb->users) == 0);
>                         __kfree_skb(skb);
>                 }
>         }
>   ………….
>  
> Thanks & regards,
> Saurabh
>  

That simply proves (again) that this 'fix' was not the proper one.

I have no idea what is wrong, and there is no way I am going to look at a 2.4.29 kernel...
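
For reference, the pattern in the quoted net_tx_action() is "detach the whole completion list in one step, then free it privately". Below is a minimal user-space sketch of that pattern only. It is not kernel code and does not diagnose the assertion above. struct buf, completion_queue, queue_completed(), drain_completed() and demo() are hypothetical names, and a C11 atomic exchange is swapped in for the local_irq_disable()/local_irq_enable() pair used in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only what the loop touches. */
struct buf {
    struct buf *next;
    atomic_int users;   /* reference count; expected to be 0 when queued */
};

/* Hypothetical stand-in for softnet_data[cpu].completion_queue. */
static _Atomic(struct buf *) completion_queue = NULL;

/* Producer side: push a buffer whose last reference has been dropped. */
static void queue_completed(struct buf *b)
{
    struct buf *head = atomic_load(&completion_queue);
    do {
        b->next = head;   /* head is refreshed by a failed compare-exchange */
    } while (!atomic_compare_exchange_weak(&completion_queue, &head, b));
}

/* Consumer side: detach the list atomically, then walk the private copy.
 * Returns the number of buffers freed. */
static int drain_completed(void)
{
    struct buf *clist = atomic_exchange(&completion_queue, (struct buf *)NULL);
    int freed = 0;

    while (clist != NULL) {
        struct buf *b = clist;
        clist = clist->next;

        /* Same invariant as the BUG_TRAP() in the quoted code. */
        assert(atomic_load(&b->users) == 0);
        free(b);
        freed++;
    }
    return freed;
}

/* Queue three buffers with users == 0, then drain them. */
static int demo(void)
{
    for (int i = 0; i < 3; i++) {
        struct buf *b = calloc(1, sizeof *b);
        if (b == NULL)
            abort();
        atomic_init(&b->users, 0);
        queue_completed(b);
    }
    return drain_completed();
}
```

In this sketch the assert can only fire if a buffer is queued while users is still nonzero, or is touched again after being queued; the way the list head is read does not change that invariant.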