Re: [PATCH 006 of 13] md: Infrastructure to allow normal IO tocontinue while array is expanding.

From: Neil Brown
Date: Fri Mar 17 2006 - 01:15:44 EST


On Thursday March 16, akpm@xxxxxxxx wrote:
> NeilBrown <neilb@xxxxxxx> wrote:
> >
> > - retry:
> > prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> > - sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> > + sh = get_active_stripe(conf, new_sector, disks, pd_idx, (bi->bi_rw&RWA_MASK));
> > if (sh) {
> > - if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > - /* Add failed due to overlap. Flush everything
> > + if (unlikely(conf->expand_progress != MaxSector)) {
> > + /* expansion might have moved on while waiting for a
> > + * stripe, so we much do the range check again.
> > + */
> > + int must_retry = 0;
> > + spin_lock_irq(&conf->device_lock);
> > + if (logical_sector < conf->expand_progress &&
> > + disks == conf->previous_raid_disks)
> > + /* mismatch, need to try again */
> > + must_retry = 1;
> > + spin_unlock_irq(&conf->device_lock);
> > + if (must_retry) {
> > + release_stripe(sh);
> > + goto retry;
> > + }
> > + }
>
> The locking in here looks strange. We take the lock, do some arithmetic
> and some tests and then drop the lock again. Is it not possible that the
> result of those tests now becomes invalid?

Obviously another comment missing.
conf->expand_progress is sector_t and so could be 64bits on a 32 bit
platform, and so I cannot be sure it is updated atomically. So I
always access it within a lock (unless I am comparing for equality with ~0).

Yes, the result can become invalid, but only in one direction: As
expand_progress always increases, it is possible that it will pass
logical_sector. When that happens, STRIPE_EXPANDING gets set on the
stripe_head at logical_sector.
So because we took a reference to logical_sector *before* this test,
and check for stripe_expanding *after* the test, we can easily catch
that transition.

Putting it another way, this test is to catch cases where
logical_sector is a long way from expand_progress, the subsequent
test of STRIPE_EXPANDING catches cases where they are close together,
and the ordering wrt get_stripe_active ensures there are no holes.

Now to put that into a few short-but-clear comments.

Thanks again!

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/