Re: [RFC][PATCH 3/3] ipc/sem: Rework wakeup scheme

From: Manfred Spraul
Date: Thu Sep 15 2011 - 13:32:08 EST


Hi Peter,


On 09/14/2011 03:30 PM, Peter Zijlstra wrote:
This removes the home-brew busy-wait and the requirement to keep
preemption disabled.
In the initial mail of the patch series, you write:

Patch 3 converts sysv sems, and is broken


What is broken?


/**
* newary - Create a new semaphore set
@@ -406,51 +388,39 @@ static int try_atomic_semop (struct sem_
return result;
}

-/** wake_up_sem_queue_prepare(q, error): Prepare wake-up
+/** wake_up_sem_queue_prepare(wake_list, q, error): Prepare wake-up
+ * @wake_list: list to queue the to be woken task on
* @q: queue entry that must be signaled
* @error: Error value for the signal
*
* Prepare the wake-up of the queue entry q.
*/
-static void wake_up_sem_queue_prepare(struct list_head *pt,
+static void wake_up_sem_queue_prepare(struct wake_list_head *wake_list,
struct sem_queue *q, int error)
{
- if (list_empty(pt)) {
- /*
- * Hold preempt off so that we don't get preempted and have the
- * wakee busy-wait until we're scheduled back on.
- */
- preempt_disable();
- }
- q->status = IN_WAKEUP;
- q->pid = error;
+ struct task_struct *p = ACCESS_ONCE(q->sleeper);

- list_add_tail(&q->simple_list, pt);
+ get_task_struct(p);
+ q->status = error;
+ /*
+ * implies a full barrier
+ */
+ wake_list_add(wake_list, p);
+ put_task_struct(p);
}
I think the get_task_struct()/put_task_struct is not necessary:
Just do the wake_list_add() before writing q->status:
wake_list_add() is identical to list_add_tail(&q->simple_list, pt).
[except that it contains additional locking, which doesn't matter here]


/**
- * wake_up_sem_queue_do(pt) - do the actual wake-up
- * @pt: list of tasks to be woken up
+ * wake_up_sem_queue_do(wake_list) - do the actual wake-up
+ * @wake_list: list of tasks to be woken up
*
* Do the actual wake-up.
* The function is called without any locks held, thus the semaphore array
* could be destroyed already and the tasks can disappear as soon as the
* status is set to the actual return code.
*/
-static void wake_up_sem_queue_do(struct list_head *pt)
+static void wake_up_sem_queue_do(struct wake_list_head *wake_list)
{
- struct sem_queue *q, *t;
- int did_something;
-
- did_something = !list_empty(pt);
- list_for_each_entry_safe(q, t, pt, simple_list) {
- wake_up_process(q->sleeper);
- /* q can disappear immediately after writing q->status. */
- smp_wmb();
- q->status = q->pid;
- }
- if (did_something)
- preempt_enable();
+ wake_up_list(wake_list, TASK_ALL);
}
wake_up_list() calls wake_up_state() that calls try_to_wake_up().
try_to_wake_up() seems to return immediately when the state is TASK_DEAD.

That leaves: Is it safe to call wake_up_list() in parallel with do_exit()?
The current implementation avoids that.

--
Manfred
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/