Re: [RFC][PATCH] MD RAID Acceleration: Move stripe operations outsidethe spin lock

From: Neil Brown
Date: Thu May 18 2006 - 22:28:41 EST

Next message: Dave Peterson: "Re: [PATCH] mm: avoid unnecessary OOM kills"
Previous message: Brice Goglin: "Re: [PATCH 3/4] myri10ge - Driver core"
In reply to: Dan Williams: "[RFC][PATCH] MD RAID Acceleration: Move stripe operations outsidethe spin lock"
Next in thread: Dan Williams: "Re: [RFC][PATCH] MD RAID Acceleration: Move stripe operations outside the spin lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tuesday May 16, dan.j.williams@xxxxxxxxx wrote:
> This is the second revision of the effort to enable offload of MD's xor
> and copy operations to dedicated hardware resources. Please comment on
> the approach of this patch and whether it will be suitable to expand
> this to the other areas in handle_stripe where such calculations are
> performed. Implementation of the xor offload API is a work in progress,
> the intent is to reuse I/OAT.
>
> Overview:
> Neil, as you recommended, this implementation flags the necessary
> operations on a stripe and then queues the execution to a separate
> thread (similar to how disk cycles are handled). See the comments added
> to raid5.h for more details.

Hi.

This certainly looks like it is heading in the right direction - thanks.

I have a couple of questions, which will probably lead to more.

You obviously need some state-machine functionality to oversee the
progression like xor - drain - xor (for RMW) or clear - copy - xor
(for RCW).
You have encoded the different states on a per block basis (storing it
in sh->dev[x].flags) rather than on a per-strip basis (and so encoding
it in sh->state).
What was the reason for this choice?

The only reason I can think of is to allow more parallelism :
different blocks within a strip can be in different states. I cannot
see any value in this as the 'xor' operation will work across all
(active) blocks in the strip and so you will have to synchronise all
the blocks on that operation.

I feel the code would be simpler if the state was in the strip rather
than the block.

The wait_for_block_op queue and it's usage seems odd to me.
handle_stripe should never have to wait for anything.
when a block_op is started, the sh->count should be incremented, and
the decremented when the block-ops have finished. Only when will
handle_stripe get to look at the stripe_head again. So waiting
shouldn't be needed.

Your GFP_KERNEL kmalloc is a definite no-no which can lead to
deadlocks (it might block while trying to write data out thought the
same raid array). At least it should be GFP_NOIO.
However a better solution would be to create and use a mempool - they
are designed for exactly this sort of usage.
However I'm not sure if even that is needed. Can a stripe_head have
move than one active block_ops task happening? If not, the
'stripe_work' should be embedded in the 'stripe_head'.

There will probably be more questions once these are answered, but as
the code is definitely a good start.

Thanks,
NeilBrown

(*) Since reading the DDF documentation again, I've realised that
using the word 'stripe' both for a chunk-wide stripe and a block-wide
stripe is very confusing. So I've started using 'strip' for a
block-wide stripe. Thus a 'stripe_head' should really be a
'strip_head'.

I hope this doesn't end up being even more confusing :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Dave Peterson: "Re: [PATCH] mm: avoid unnecessary OOM kills"
Previous message: Brice Goglin: "Re: [PATCH 3/4] myri10ge - Driver core"
In reply to: Dan Williams: "[RFC][PATCH] MD RAID Acceleration: Move stripe operations outsidethe spin lock"
Next in thread: Dan Williams: "Re: [RFC][PATCH] MD RAID Acceleration: Move stripe operations outside the spin lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]