Re: [PATCH] blk-mq: Start to fix memory ordering...

From: Bart Van Assche
Date: Tue Sep 05 2017 - 10:52:24 EST


On Mon, 2017-09-04 at 11:09 +0200, Peter Zijlstra wrote:
> /*
> * Mark us as started and clear complete. Complete might have been
> * set if requeue raced with timeout, which then marked it as
> * complete. So be sure to clear complete again when we start
> * the request, otherwise we'll ignore the completion event.
> + *
> + * Ensure that ->deadline is visible before set STARTED, such that

It seems like there is something wrong with the grammar in the above
sentence? Did you perhaps mean "before we set STARTED"?

> + * blk_mq_check_expired() is guaranteed to observe our ->deadline
> + * when it observes STARTED.
> */
> - if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> - set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> - if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
> + smp_mb__before_atomic();
> + set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> + if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
> + /*
> + * Coherence order guarantees these consequtive stores to a
> + * singe variable propagate in the specified order. Thus the
> + * clear_bit() is ordered _after_ the set bit. See
> + * blk_mq_check_expired().
> + */
> clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
> + }

Is this new comment really useful? If you want to keep it please spell
"consecutive" correctly.

> if (q->dma_drain_size && blk_rq_bytes(rq)) {
> /*
> @@ -744,11 +751,20 @@ static void blk_mq_check_expired(struct
> struct request *rq, void *priv, bool reserved)
> {
> struct blk_mq_timeout_data *data = priv;
> + unsigned long deadline;
>
> if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> return;
>
> /*
> + * Ensures that if we see STARTED we must also see our
> + * up-to-date deadline, see blk_mq_start_request().
> + */
> + smp_rmb();
> +
> + deadline = READ_ONCE(rq->deaedline);

"deaedline" is a spelling error. Has this patch been tested?

> + /*
> * The rq being checked may have been freed and reallocated
> * out already here, we avoid this race by checking rq->deadline
> * and REQ_ATOM_COMPLETE flag together:
> @@ -761,10 +777,20 @@ static void blk_mq_check_expired(struct
> * and clearing the flag in blk_mq_start_request(), so
> * this rq won't be timed out too.
> */
> - if (time_after_eq(jiffies, rq->deadline)) {
> - if (!blk_mark_rq_complete(rq))
> + if (time_after_eq(jiffies, deadline)) {
> + if (!blk_mark_rq_complete(rq)) {
> + /*
> + * Relies on the implied MB from test_and_clear() to
> + * order the COMPLETE load against the STARTED load.
> + * Orders against the coherence order in
> + * blk_mq_start_request().
> + *
> + * This ensures that if we see !COMPLETE we must see
> + * STARTED and ignore this timeout.
> + */
> blk_mq_rq_timed_out(rq, reserved);
> - } else if (!data->next_set || time_after(data->next, rq->deadline)) {
> + }
> + } else if (!data->next_set || time_after(data->next, deadline)) {
> data->next = rq->deadline;
> data->next_set = 1;
> }

Apparently a READ_ONCE(rq->deadline) statement has been added but not all
rq->deadline reads have been changed into reads of the local variable
"deadline"? Was that really your intention?

Bart.