Re: net_tx_action race condition?
From: Eric Dumazet
Date: Wed Mar 28 2018 - 12:32:27 EST
On 03/28/2018 12:30 AM, Saurabh Kr wrote:
> Hi Eric/Angelo,
>
> We are seeing the assertion error in linux kernel 2.4.29 "kernel: KERNEL: assertion (atomic_read(&skb->users) == 0) failed at dev.c(1397)". Based on the patch provided (https://patchwork.kernel.org/patch/5368051/) we merged the changes in linux kernel 2.4.29 but we are still facing the assertion error at dev.c (1397). Please let me know your thoughts.
>
> Before merge (linux 2.4.29)
> ---------------------------------
>
> static void net_tx_action(struct softirq_action *h)
> {
> 	int cpu = smp_processor_id();
>
> 	if (softnet_data[cpu].completion_queue) {
> 		struct sk_buff *clist;
>
> 		local_irq_disable();
> 		clist = softnet_data[cpu].completion_queue; // Existing code
> 		softnet_data[cpu].completion_queue = NULL;
> 		local_irq_enable();
>
> 		while (clist != NULL) {
> 			struct sk_buff *skb = clist;
> 			clist = clist->next;
>
> 			BUG_TRAP(atomic_read(&skb->users) == 0);
> 			__kfree_skb(skb);
> 		}
> 	}
>
> ---------
>
> After merging the changes based on the available patch (linux 2.4.29):
> ------------------------------------------------------------------------------
>
> static void net_tx_action(struct softirq_action *h)
> {
> 	int cpu = smp_processor_id();
>
> 	if (softnet_data[cpu].completion_queue) {
> 		struct sk_buff *clist;
>
> 		local_irq_disable();
> 		clist = *(volatile typeof(softnet_data[cpu].completion_queue) *)&(softnet_data[cpu].completion_queue); // Modified line based on available patch
> 		softnet_data[cpu].completion_queue = NULL;
> 		local_irq_enable();
>
> 		while (clist != NULL) {
> 			struct sk_buff *skb = clist;
> 			clist = clist->next;
>
> 			BUG_TRAP(atomic_read(&skb->users) == 0);
> 			__kfree_skb(skb);
> 		}
> 	}
> ………….
>
> Thanks & regards,
> Saurabh
>
That simply proves (again) that this 'fix' was not the proper one.
I have no idea what is wrong, and there is no way I am going to look at a 2.4.29 kernel...