Re: BUG: aio/direct-io data corruption in 4.7

From: Gregory Shapiro
Date: Tue Nov 06 2018 - 06:31:59 EST


Hi Jack,
I tested it in 4.9.102 and I checked the latest code from elixir
(versions 4.19 and 4.20) and the error in code is still present there.
More on the scenario and the bug:
I experienced data corruption in my application (nvme based storage).
The issue was caused because of faulty hardware, but the real problem
is I got a correct number of bytes in io_getevents thus couldn't
recognize it correctly the error.
Looking at the /var/log/messages and I saw the following errors in
time of coruption:

Oct 11 14:55:15 block01-node05 kernel: [19272.951015]
blk_update_request: I/O error, dev nvme2n3, sector 117359360
Oct 11 14:55:15 block01-node05 kernel: [19272.952786]
blk_update_request: I/O error, dev nvme2n3, sector 117359872
Oct 11 14:55:16 block01-node05 kernel: [19273.544374]
blk_update_request: I/O error, dev nvme2n3, sector 117360384
...
So the block level does receive information about the error, but I
don't see it in the application.
running ftrace and doing code reading I find out that dio error status
is overridden.
In dio_complete it is propagated in (dio->io_error and if
dio->io_error is not zero in we are in async write
the status is overridden by transferred.

static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
{
...
if (ret == 0)
ret = dio->page_errors;
if (ret == 0)
ret = dio->io_error;
if (ret == 0)
ret = transferred;
...
if (is_async) {
/*
* generic_write_sync expects ki_pos to have been updated
* already, but the submission path only does this for
* synchronous I/O.
*/
dio->iocb->ki_pos += transferred;

if (dio->op == REQ_OP_WRITE)
ret = generic_write_sync(dio->iocb, transferred);
dio->iocb->ki_complete(dio->iocb, ret, 0);



For your convenience I am attaching ftrace log to for easier tracking
the flow in the code:


26) | nvme_complete_rq [nvme_core]() {
26) | blk_mq_end_request() {
26) | blk_update_request() { <----
log is from here
26) 0.563 us | blk_account_io_completion();
26) 0.263 us | bio_advance();
26) | bio_endio() {
26) | dio_bio_end_aio() {
26) | dio_bio_complete() {
26) | bio_check_pages_dirty() {
26) | bio_put() {
26) | bio_free() {
26) | __bio_free() {
26) 0.045 us | bio_disassociate_task();
26) 0.497 us | }
26) 0.042 us | bvec_free();
26) | mempool_free() {
26) | mempool_free_slab() {
26) 0.264 us | kmem_cache_free();
26) 0.606 us | }
26) 1.125 us | }
26) 2.588 us | }
26) 2.920 us | }
26) 3.979 us | }
26) 4.712 us | }
26) 0.040 us | _raw_spin_lock_irqsave();
26) 0.048 us | _raw_spin_unlock_irqrestore();
26) | dio_complete() {
dio_complete(dio, 0, true);
26) | aio_complete() {
dio->iocb->ki_complete(dio->iocb, ret, 0); <<aio_complete(struct kiocb
*kiocb, long res, long res2)>>
26) 0.073 us | _raw_spin_lock_irqsave();
26) 0.114 us | refill_reqs_available();
26) 0.048 us | _raw_spin_unlock_irqrestore();
26) | kiocb_free() {
26) 0.171 us | fput();
26) 0.102 us | kmem_cache_free();
26) 0.902 us | }


}

On Tue, Nov 6, 2018 at 9:29 AM Jack Wang <jack.wang.usish@xxxxxxxxx> wrote:
>
> Gregory Shapiro <shapiro.gregory@xxxxxxxxx> ä2018å11æ5æåä äå4:19åéï
> >
> > Hello, my name is Gregory Shapiro and I am a newbie on this list.
> > I recently encountered data corruption as I got a kernel to
> > acknowledge write ("io_getevents" system call with a correct number of
> > bytes) but undergoing write to disk failed.
> > After investigating the problem I found it is identical to issue found
> > in direct-io.c mentioned the bellow thread.
> > https://lore.kernel.org/lkml/20160921141539.GA17898@xxxxxxxxxxxxx/
> > Is there a reason proposed patch didn't apply to the kernel?
> > When can I expect it to be applied?
> > Thanks,
> > Gregory
>
> Hi Gregory,
>
> Thanks for your info.
> Have you tried with latest kernel other than 4.7, is the problem still there?
>
> Could you share your test case?
>
> Regards,
> Jack Wang