Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

From: Eric Wheeler
Date: Thu Dec 05 2019 - 19:04:36 EST


On Tue, 3 Dec 2019, Coly Li wrote:

> On 2019/12/3 3:34, Eric Wheeler wrote:
> > On Mon, 2 Dec 2019, Coly Li wrote:
> >> On 2019/12/2 6:24, kungf wrote:
> >>> Data may be lost in the following writeback-mode scenario:
> >>> 1. client writes data1 to bcache
> >>> 2. client calls fdatasync
> >>> 3. bcache flushes the cache set and the backing device
> >>> If data1 has not yet been written back to the backing device at this
> >>> point, it is only guaranteed safe in the cache.
> >>> 4. the cache then writes data1 back to the backing device with only
> >>> REQ_OP_WRITE
> >>> So data1 is not guaranteed to be in non-volatile storage, and it may be
> >>> lost on power interruption.
> >>>
> >>
> >> Hi,
> >>
> >> Have you encountered this problem in a real workload? With the bcache
> >> journal, I don't see how data could be lost in the way you describe.
> >>
> >> Correct me if I am wrong.
> >>
> >> Coly Li
> >
> > If this does become necessary, then we should have a sysfs or superblock
> > flag to disable FUA for those with RAID BBUs.
>
> Hi Eric,
>
> I doubt it is necessary to add the FUA tag to all writeback bios. If a
> power failure happens after dirty data is written to the backing device
> and the bkey turns clean, a following read request will go to the cache
> device, because the LBA can be indexed whether it is dirty or clean.
> Unless the bkey is invalidated from the B+tree, reads always go to the
> cache device first in writeback mode. If a power failure happens before
> the cached bkey turns from dirty to clean, the only cost is an extra
> writeback bio flushed from the cache device to the backing device with
> identical data. Compared with tagging all writeback bios with FUA (which
> would be really slow), the extra writeback I/Os after a power failure
> are more acceptable to me.

I agree. FWIW, I just learned about /sys/block/sdX/queue/write_cache from
Nikos Tsironis <ntsironis@xxxxxxxxxxx>. Thus, my flag request for a FUA
bypass isn't necessary anyway, even if you did want a FUA there, because
FUAs are stripped when a blockdev is set to "write through" (that is, when
QUEUE_FLAG_WC is not set), which is what those with RAID BBUs can select.

----------------------------------------------------------------------
This happens in generic_make_request_checks():

	/*
	 * Filter flush bio's early so that make_request based
	 * drivers without flush support don't have to worry
	 * about them.
	 */
	if (op_is_flush(bio->bi_opf) &&
	    !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
		bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
		if (!nr_sectors) {
			status = BLK_STS_OK;
			goto end_io;
		}
	}
----------------------------------------------------------------------
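
For anyone who wants to poke at that logic outside the kernel, here is a
minimal userspace sketch of the same filter. The MODEL_* constants, struct
model_bio, and both function names are made up for illustration; they do not
match the kernel's definitions or bit values. Only the control flow is meant
to mirror the excerpt above, with queue_has_wc standing in for the
QUEUE_FLAG_WC test.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the request flags; NOT the kernel's values. */
#define MODEL_REQ_OP_WRITE  (1u << 0)
#define MODEL_REQ_FUA       (1u << 1)
#define MODEL_REQ_PREFLUSH  (1u << 2)

/* Hypothetical, stripped-down bio: just the op/flags word and a size. */
struct model_bio {
	unsigned int opf;
	unsigned int nr_sectors;
};

/*
 * Model of the flush filter. queue_has_wc stands in for
 * test_bit(QUEUE_FLAG_WC, ...), i.e. write_cache set to "write back".
 * Returns true when the bio would be completed right away (an empty
 * flush sent to a queue that reports no volatile write cache).
 */
bool model_filter_flush(struct model_bio *bio, bool queue_has_wc)
{
	bool is_flush = (bio->opf & (MODEL_REQ_FUA | MODEL_REQ_PREFLUSH)) != 0;

	if (is_flush && !queue_has_wc) {
		/* No volatile cache reported: flush semantics are dropped. */
		bio->opf &= ~(MODEL_REQ_PREFLUSH | MODEL_REQ_FUA);
		if (bio->nr_sectors == 0)
			return true;
	}
	return false;
}

/* Convenience wrapper: returns the flags word left after filtering. */
unsigned int model_opf_after_filter(unsigned int opf,
				    unsigned int nr_sectors,
				    bool queue_has_wc)
{
	struct model_bio bio = { .opf = opf, .nr_sectors = nr_sectors };

	model_filter_flush(&bio, queue_has_wc);
	return bio.opf;
}
```

In this model, a write carrying MODEL_REQ_FUA keeps the flag when
queue_has_wc is true and loses it when queue_has_wc is false, so a FUA added
to the writeback bios would only reach devices that advertise a volatile
write cache.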

-Eric

>
> Coly Li
>
> >
> >>> Signed-off-by: kungf <wings.wyang@xxxxxxxxx>
> >>> ---
> >>> drivers/md/bcache/writeback.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> >>> index 4a40f9eadeaf..e5cecb60569e 100644
> >>> --- a/drivers/md/bcache/writeback.c
> >>> +++ b/drivers/md/bcache/writeback.c
> >>> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >>> */
> >>> if (KEY_DIRTY(&w->key)) {
> >>> dirty_init(w);
> >>> - bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> >>> + bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >>> io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >>> bio_set_dev(&io->bio, io->dc->bdev);
> >>> io->bio.bi_end_io = dirty_endio;
> >>>
> >>
>