Re: resend, finish_wait with list_empty_careful maybe fragile or buggy

From: çæå
Date: Mon Mar 12 2018 - 10:27:17 EST

Next message: Andy Shevchenko: "Re: [PATCH v9] mmc: Export host capabilities to debugfs."
Previous message: Bartlomiej Zolnierkiewicz: "Re: [PATCH] video: fbdev: sis: avoid mismatched prototypes"
In reply to: çæå: "Re: resend, finish_wait with list_empty_careful maybe fragile or buggy"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

+ CC Boqun
in case you are interested in this topic

Best Regards,
Trol

> Sorry, this is a resend because the previous one was messed
> up by my editor and hard to be read.
>
> void finish_wait(struct wait_queue_head *wq_head,
> struct wait_queue_entry *wq_entry)
> {
> ....
> -> if (!list_empty_careful(&wq_entry->entry)) {
> -> spin_lock_irqsave(&wq_head->lock, flags);
> -> list_del_init(&wq_entry->entry);
> -> spin_unlock_irqrestore(&wq_head->lock, flags);
> -> }
> }
>
> After careful look into the stop/wakeup code, I found the above
> code fragile or even buggy. This code was introduced at least 14 years
> ago and seems fragile or buggy now after years of study on SMP
> synchronization by us.
>
> I understand this code are being used a lot and no bug seems to emerge.
> But, as I'll explain, it depends on a lot of unreliable implementation details.
>
> Firstly,
>
> static inline int list_empty_careful(const struct list_head *head)
> {
> struct list_head *next = head->next;
> return (next == head) && (next == head->prev);
> }
>
> note that the read of head->next & head->prev is not protected by
> READ_ONCE. So the read is free to be optimized out entirely.
> Luckily, this optimization is hard for compilers now, since all other accesses
> are out of finish_wait. And still, GCC won't spit aligned accesses into multiple
> instructions so it is atomic so far.
>
> Secondly,
>
> if ( ! list_empty_careful(&wq_entry->entry) ) {
> [remove entry with spinning-lock]
> }
> [ends stack frame of the function calling finish_wait]
> [overwrites wq_entry with another frame]
>
> and
>
> __wake_up_common() -->
> [read wq_entry->func] -->
> [read wq_entry->flags] -->
> autoremove_wake_function() -->
> [remove wq_entry->entry from wait queue] -->
>
> are not properly ordered for SMP so that [read wq_entry->flags]
> may be reordered after [remove wq_entry->entry from wait queue]
> since no dependency or memory barrier forbids it. This may cause
> [overwrites wq_entry with another frame] on one CPU takes place
> before [read wq_entry->flags] on another CPU and cause
> [read wq_entry->flags] to return bad value.
>
> This behavior is not reported may thank to:
> - few code is using autoremove_wake_function
> - CPU pipeline is not as deep as to make this emerge
>
> Best regards,
> Trol

Next message: Andy Shevchenko: "Re: [PATCH v9] mmc: Export host capabilities to debugfs."
Previous message: Bartlomiej Zolnierkiewicz: "Re: [PATCH] video: fbdev: sis: avoid mismatched prototypes"
In reply to: çæå: "Re: resend, finish_wait with list_empty_careful maybe fragile or buggy"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]