Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

From: kungf
Date: Tue Dec 03 2019 - 06:40:12 EST


On Mon, 2 Dec 2019 19:08:59 +0800
Coly Li <colyli@xxxxxxx> wrote:

> On 2019/12/2 6:24 pm, kungf wrote:
> > data may lost when in the follow scene of writeback mode:
> > 1. client write data1 to bcache
> > 2. client fdatasync
> > 3. bcache flush cache set and backing device
> > if now data1 was not writed back to backing, it was only guaranteed safe in cache.
> > 4.then cache writeback data1 to backing with only REQ_OP_WRITE
> > So data1 was not guaranteed in non-volatile storage, it may lost if power interruption 
> >
>
> Hi,
>
> Do you encounter such problem in real work load ? With bcache journal, I
> don't see the possibility of data lost with your description.
>
> Correct me if I am wrong.
>
> Coly Li

Hi Coly?

Sorry to confuse you. As i known now, write_dirty function write dirty to backing without FUA, and write_dirty_finish make dirty key clean,
it means the data indexed by the key will not be writeback again, am i wrong?
I only find that the backing device will be flushed when bcache get an PREFLUSH bio, any other place it will be flushed in journal?

I made a test that write bcache with dd, and then detach it, blktrace the cache and backing device at the same time.
1. close writeback
# echo 0 > /sys/block/bcache0/bcache/writeback_running
2. write data with a fdatasync
#dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
3. detach and trigger writeback
#echo b1f40ca5-37a3-4852-9abf-6abed96d71db >/sys/block/bcache0/bcache/detach

the blow text is blkparse result.
from cache blktrace blow, we can see 16k data write to cache set, and then flush with op FWFSM(PREFLUSH|WRITE|FUA|SYNC|META)
```
8,160 33 1 0.000000000 222844 A W 630609920 + 32 <- (8,167) 1464320
8,167 33 2 0.000000478 222844 Q W 630609920 + 32 [dd]
8,167 33 3 0.000006167 222844 G W 630609920 + 32 [dd]
8,167 33 5 0.000011385 222844 I W 630609920 + 32 [dd]
8,167 33 6 0.000023890 948 D W 630609920 + 32 [kworker/33:1H]
8,167 33 7 0.000111203 0 C W 630609920 + 32 [0]
8,160 34 1 0.000167029 215616 A FWFSM 629153808 + 8 <- (8,167) 8208
8,167 34 2 0.000167490 215616 Q FWFSM 629153808 + 8 [kworker/34:2]
8,167 34 3 0.000169061 215616 G FWFSM 629153808 + 8 [kworker/34:2]
8,167 34 4 0.000301308 949 D WFSM 629153808 + 8 [kworker/34:1H]
8,167 34 5 0.000348832 0 C WFSM 629153808 + 8 [0]
8,167 34 6 0.000349612 0 C WFSM 629153808 [0]
```

from backing blktrace blow, the backing device first get flush op FWS(PERFLUSH|WRITE|SYNC) because of we stop writeback, then get W op after detach,
the 16k data was writeback to backing device, and after this, the backing device never get flush op, it means that the 16k data we write it's not safe
in backing device, even we had write with fdatasync.
```
8,144 33 1 0.000000000 222844 Q WSM 8 + 8 [dd]
8,144 33 2 0.000016609 222844 G WSM 8 + 8 [dd]
8,144 33 5 0.000020710 222844 I WSM 8 + 8 [dd]
8,144 33 6 0.000031967 948 D WSM 8 + 8 [kworker/33:1H]
8,144 33 7 0.000152945 88631 C WS 16 + 32 [0]
8,144 34 1 0.000186127 215616 Q FWS [kworker/34:2]
8,144 34 2 0.000187006 215616 G FWS [kworker/34:2]
8,144 33 8 0.000326761 0 C WSM 8 + 8 [0]
8,144 34 3 0.020195027 0 C WS 16 [0]
8,144 34 4 0.020195904 0 C FWS 16 [0]
8,144 23 1 19.415130395 215884 Q W 16 + 32 [kworker/23:2]
8,144 23 2 19.415132072 215884 G W 16 + 32 [kworker/23:2]
8,144 23 3 19.415133134 215884 I W 16 + 32 [kworker/23:2]
8,144 23 4 19.415137776 1215 D W 16 + 32 [kworker/23:1H]
8,144 23 5 19.416607260 0 C W 16 + 32 [0]
8,144 24 1 19.416640754 222593 Q WSM 8 + 8 [bcache_writebac]
8,144 24 2 19.416642698 222593 G WSM 8 + 8 [bcache_writebac]
8,144 24 3 19.416643505 222593 I WSM 8 + 8 [bcache_writebac]
8,144 24 4 19.416650589 1107 D WSM 8 + 8 [kworker/24:1H]
8,144 24 5 19.416865258 0 C WSM 8 + 8 [0]
8,144 24 6 19.416871350 221889 Q WSM 8 + 8 [kworker/24:1]
8,144 24 7 19.416872201 221889 G WSM 8 + 8 [kworker/24:1]
8,144 24 8 19.416872542 221889 I WSM 8 + 8 [kworker/24:1]
8,144 24 9 19.416875458 1107 D WSM 8 + 8 [kworker/24:1H]
8,144 24 10 19.417076935 0 C WSM 8 + 8 [0]
```
>
> > Signed-off-by: kungf <wings.wyang@xxxxxxxxx>
> > ---
> > drivers/md/bcache/writeback.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> > */
> > if (KEY_DIRTY(&w->key)) {
> > dirty_init(w);
> > - bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > + bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> > io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> > bio_set_dev(&io->bio, io->dc->bdev);
> > io->bio.bi_end_io = dirty_endio;
> >
>
--
kungf <wings.wyang@xxxxxxxxx>