Re: bug in md write barrier support?

From: Jens Axboe
Date: Wed Sep 08 2004 - 04:28:21 EST


On Mon, Sep 06 2004, Neil Brown wrote:
> On Saturday September 4, axboe@xxxxxxx wrote:
> > On Sat, Sep 04 2004, Neil Brown wrote:
> > > On Friday September 3, hch@xxxxxx wrote:
> > > > md_flush_mddev just passes on the sector relative to the raid device,
> > > > shouldn't it be translated somewhere?
> > >
> > > Yes. md_flush_mddev should simply be removed.
> > > The functionality should be, and largely is, in the individual
> > > personalities.
> >
> > Yes, sorry I was a little lazy there even though I followed the plugging
> > conversion :(
> >
> > > Is there documentation somewhere on exactly what an issue_flush_fn
> > > should do (is it allowed to sleep? what must happen before it is
> > > allowed to return, what is the "error_sector" for, that sort of thing).
> >
> > It is allowed to sleep, you should return when the flush is complete.
> > error_sector is the failed location, which really should be a dev,sector
> > tupple.
>
> Could I get a little more information about this function please.
> I've read through the code, and there isn't much in the way of
> examples to follow: only reiserfs uses it, only scsi-disk and ide-disk
> supports it (I think).

That is correct. The current definition is to ensure that previously
sent writes are on disk. I hope to tie a range to it in the future, for
devices that can optimize the flush in that case. So for ide with write
back caching, it's currently a FLUSH_CACHE command. Ditto for SCSI. SCSI
with write through cache can make it a noop as well.

> It would seem that this is for write requests where b_end_io has already
> been called, indicating that the data is safe, but that maybe the data
> isn't really safe after-all, and blk_issue_flush needs to be called.

Right on.

> I would have thought that after b_end_io is called, that data should
> be safe anyway. Not so?

Not necessarily, if you have write caching enabled.

> How do you tell a device: it is OK to just leave the data is cache,
> I'll call blk_issue_flush when I want it safe.

How would md know? The lower level driver knows what to do (if anything)
to ensure the data is safe.

> Is this related to barriers are all??

Yes and no. Currently it's used to fsync(), but can be used for
anything where you want to insert a flush point without having a piece
of data to tie it to.

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/