RE: [PATCH/RFC] Lustre VFS patch

From: braam
Date: Mon May 24 2004 - 20:50:17 EST


Hi Jens,

We use this patch on servers as follows.

Lustre servers give an immediate response for RPC's to clients, and later
indicate what transactions numbers have been committed to disk. At known
points in the execution we sync all transactions to disk, execute our ioctl.
When the ioctl is issues Lustre is also instructed not to send disk commit
confirmation to clients. Then the system continues to execute some
transactions, but only in memory, and send responses to clients. We are
sure they are lost if we powercycle that system. This enables tests for
replay of transactions by client nodes in the cluster.

If we were to return errors, (which, I agree, _seems_ much more sane, and we
_did_ try that for a while!) then there is a good chance, namely immediately
when something is flushed to disk, that the system will detect the errors
and not continue to execute transactions making consistent testing of our
replay mechanisms impossible.

I hope that this explains why we do not return errors. Now if you tell me
that I can turn off I/O, and not get errors, with existing ioctls then I
certainly should existing ioctls. Can you clarify that.

Am I making sense to you now?

- Peter -


> -----Original Message-----
> From: Jens Axboe [mailto:axboe@xxxxxxx]
> Sent: Monday, May 24, 2004 7:47 PM
> To: Peter J. Braam
> Cc: torvalds@xxxxxxxx; akpm@xxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; 'Phil Schwan'
> Subject: Re: [PATCH/RFC] Lustre VFS patch
>
> On Mon, May 24 2004, Peter J. Braam wrote:
> > dev_read_only-vanilla-2.6.patch
> >
> > This introduces an ioctl on block devices to stop doing I/O. The
> > only purpose of the patch is automated recovery
> regression testing,
> > it is very convenient to have this available.
>
> You still keep pushing this on, without having clarified why
> you can't just use the genhd functions for this instead of
> adding some array of block_devices. And while you fixed the
> bio->bi_rw check, you still don't indicate error when you
> call bio_endio() for this case. So submitter of that io
> thinks it completes successfully, while you just tossed it away.
> Irk.
>
> --
> Jens Axboe
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/