Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

From: Coly Li
Date: Tue Dec 03 2019 - 09:09:57 EST


On 2019/12/3 3:16, kungf wrote:
>
>
> On Mon, 2 Dec 2019 at 19:09, Coly Li <colyli@xxxxxxx> wrote:
>>
>> On 2019/12/2 6:24, kungf wrote:
>> > data may be lost in the following scenario in writeback mode:
>> > 1. client writes data1 to bcache
>> > 2. client calls fdatasync
>> > 3. bcache flushes the cache set and the backing device
>> > if data1 has not been written back to the backing device yet, it is
>> > only guaranteed to be safe in the cache.
>> > 4. then the cache writes data1 back to the backing device with only REQ_OP_WRITE
>> > So data1 is not guaranteed to be on non-volatile storage; it may be lost
>> > on power interruption
>> >
>>
>> Hi,
>>
>> Did you encounter such a problem in a real workload? With the bcache
>> journal, I don't see the possibility of data loss from your description.
>>
>> Correct me if I am wrong.
>>
>> Coly Li
>>
> Hi Coly,
>
> Sorry for the confusion. As far as I know, the write_dirty function writes
> dirty data to the backing device without FUA, and write_dirty_finish then
> marks the dirty key clean, which means the data indexed by that key will
> not be written back again. Am I wrong?

Yes, you are right. This is the behavior as designed. We don't guarantee
the data will always be on the platter; this is what most storage systems do.
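
If it helps, here is a rough sketch (not the actual bcache code path;
dc->bdev is just borrowed from your patch context, and error handling is
omitted) of what an explicit flush to the backing device looks like at the
block layer. This flush, rather than a FUA on every writeback bio, is the
operation that makes earlier completed writes durable:

```
/*
 * Rough sketch only, not bcache code: an empty bio with REQ_PREFLUSH
 * asks the backing device to drain its volatile write cache, making
 * the writes that completed before it durable.
 */
struct bio *flush = bio_alloc(GFP_NOIO, 0);

bio_set_dev(flush, dc->bdev);                /* the backing device */
flush->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; /* no data, just a flush */
submit_bio_wait(flush);                      /* wait for the flush to complete */
bio_put(flush);
```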

> I only found that the backing device is flushed when bcache gets a
> PREFLUSH bio. Is there any other place where it is flushed by the journal?
>

A storage system flushes its buffers when the upper layer requires it. That
means if an application wants its written data flushed to the platter, it
should explicitly issue a flush request.

What you observe and test is all as designed, IMHO. The I/O stack does not
guarantee that any data is persistent on the storage media unless an
explicit flush request is received from the upper layer and completed back
to the upper layer.
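
In other words, the durability contract from user space looks like the
following (a minimal sketch only, using /dev/bcache0 as in your test):
nothing is promised to be on stable media until the explicit flush has
returned successfully.

```
/* Minimal user-space sketch: durability is only promised after the
 * explicit flush (fdatasync) has returned successfully. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[16384] = { 0 };
	int fd = open("/dev/bcache0", O_WRONLY);

	if (fd < 0)
		return 1;
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		perror("write");
	if (fdatasync(fd))	/* the explicit flush request */
		perror("fdatasync");
	close(fd);
	return 0;
}
```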

Coly Li

> I made a test that writes to bcache with dd and then detaches it, while
> running blktrace on the cache and backing devices at the same time.
> 1. stop writeback
> # echo 0 > /sys/block/bcache0/bcache/writeback_running
> 2. write data with fdatasync
> # dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
> 3. detach and trigger writeback
> # echo b1f40ca5-37a3-4852-9abf-6abed96d71db > /sys/block/bcache0/bcache/detach
>
> The text below is the blkparse result.
> From the cache device trace below, we can see 16k of data written to the
> cache set, and then a flush with op FWFSM (PREFLUSH|WRITE|FUA|SYNC|META).
> ```
>   8,160 33        1     0.000000000 222844  A   W 630609920 + 32 <- (8,167) 1464320
>   8,167 33        2     0.000000478 222844  Q   W 630609920 + 32 [dd]
>   8,167 33        3     0.000006167 222844  G   W 630609920 + 32 [dd]
>   8,167 33        5     0.000011385 222844  I   W 630609920 + 32 [dd]
>   8,167 33        6     0.000023890    948  D   W 630609920 + 32 [kworker/33:1H]
>   8,167 33        7     0.000111203      0  C   W 630609920 + 32 [0]
>   8,160 34        1     0.000167029 215616  A FWFSM 629153808 + 8 <- (8,167) 8208
>   8,167 34        2     0.000167490 215616  Q FWFSM 629153808 + 8 [kworker/34:2]
>   8,167 34        3     0.000169061 215616  G FWFSM 629153808 + 8 [kworker/34:2]
>   8,167 34        4     0.000301308    949  D  WFSM 629153808 + 8 [kworker/34:1H]
>   8,167 34        5     0.000348832      0  C  WFSM 629153808 + 8 [0]
>   8,167 34        6     0.000349612      0  C  WFSM 629153808 [0]
> ```
>
> From the backing device trace below, the backing device first gets a flush
> op FWS (PREFLUSH|WRITE|SYNC) because we stopped writeback. Then it gets a
> W op after detach, when the 16k of data is written back to the backing
> device. After this, the backing device never gets a flush op, which means
> the 16k of data we wrote is not safe on the backing device, even though we
> wrote it with dd and fdatasync.
> ```
>   8,144 33        1     0.000000000 222844  Q WSM 8 + 8 [dd]
>   8,144 33        2     0.000016609 222844  G WSM 8 + 8 [dd]
>   8,144 33        5     0.000020710 222844  I WSM 8 + 8 [dd]
>   8,144 33        6     0.000031967    948  D WSM 8 + 8 [kworker/33:1H]
>   8,144 33        7     0.000152945  88631  C  WS 16 + 32 [0]
>   8,144 34        1     0.000186127 215616  Q FWS [kworker/34:2]
>   8,144 34        2     0.000187006 215616  G FWS [kworker/34:2]
>   8,144 33        8     0.000326761      0  C WSM 8 + 8 [0]
>   8,144 34        3     0.020195027      0  C  WS 16 [0]
>   8,144 34        4     0.020195904      0  C FWS 16 [0]
>   8,144 23        1    19.415130395 215884  Q   W 16 + 32 [kworker/23:2]
>   8,144 23        2    19.415132072 215884  G   W 16 + 32 [kworker/23:2]
>   8,144 23        3    19.415133134 215884  I   W 16 + 32 [kworker/23:2]
>   8,144 23        4    19.415137776   1215  D   W 16 + 32 [kworker/23:1H]
>   8,144 23        5    19.416607260      0  C   W 16 + 32 [0]
>   8,144 24        1    19.416640754 222593  Q WSM 8 + 8 [bcache_writebac]
>   8,144 24        2    19.416642698 222593  G WSM 8 + 8 [bcache_writebac]
>   8,144 24        3    19.416643505 222593  I WSM 8 + 8 [bcache_writebac]
>   8,144 24        4    19.416650589   1107  D WSM 8 + 8 [kworker/24:1H]
>   8,144 24        5    19.416865258      0  C WSM 8 + 8 [0]
>   8,144 24        6    19.416871350 221889  Q WSM 8 + 8 [kworker/24:1]
>   8,144 24        7    19.416872201 221889  G WSM 8 + 8 [kworker/24:1]
>   8,144 24        8    19.416872542 221889  I WSM 8 + 8 [kworker/24:1]
>   8,144 24        9    19.416875458   1107  D WSM 8 + 8 [kworker/24:1H]
>   8,144 24       10    19.417076935      0  C WSM 8 + 8 [0]
> ```
>
>
> > Signed-off-by: kungf <wings.wyang@xxxxxxxxx>
> > ---
> >  drivers/md/bcache/writeback.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >        */
> >       if (KEY_DIRTY(&w->key)) {
> >               dirty_init(w);
> > -             bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > +             bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >               io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >               bio_set_dev(&io->bio, io->dc->bdev);
> >               io->bio.bi_end_io     = dirty_endio;
> >
>