Re: [dm-devel] DM Regression in 4.16-rc1 - read() returns data when it shouldn't
From: NeilBrown
Date: Wed Feb 14 2018 - 19:07:34 EST
On Wed, Feb 14 2018, Mike Snitzer wrote:
> On Wed, Feb 14 2018 at 3:39pm -0500,
> NeilBrown <neilb@xxxxxxxx> wrote:
>
>> On Wed, Feb 14 2018, Milan Broz wrote:
>>
>> > Hi,
>> >
>> > the commit (found by bisect)
>> >
>> > commit 18a25da84354c6bb655320de6072c00eda6eb602
>> > Author: NeilBrown <neilb@xxxxxxxx>
>> > Date: Wed Sep 6 09:43:28 2017 +1000
>> >
>> > dm: ensure bio submission follows a depth-first tree walk
>> >
>> > cause serious regression while reading from DM device.
>> >
>> > The reproducer is below, basically it tries to simulate failure we see in cryptsetup
>> > regression test: we have DM device with error and zero target and try to read
>> > "passphrase" from it (it is test for 64 bit offset error path):
>> >
>> > Test device:
>> > # dmsetup table test
>> > 0 10000000 error
>> > 10000000 1000000 zero
>> >
>> > We try to run this operation:
>> > lseek64(fd, 5119999988, SEEK_CUR); // this should seek to error target sector
>> > read(fd, buf, 13); // this should fail, if we seek to error part of the device
>> >
>> > While on 4.15 the read properly fails:
>> > Seek returned 5119999988.
>> > Read returned -1.
>> >
>> > for 4.16 it actually succeeds returning some random data
>> > (perhaps kernel memory, so this bug is even more dangerous):
>> > Seek returned 5119999988.
>> > Read returned 13.
>> >
>> > Full reproducer below:
>> >
>> > #define _GNU_SOURCE
>> > #define _LARGEFILE64_SOURCE
>> > #include <stdio.h>
>> > #include <stddef.h>
>> > #include <stdint.h>
>> > #include <stdlib.h>
>> > #include <unistd.h>
>> > #include <fcntl.h>
>> > #include <inttypes.h>
>> >
>> > int main (int argc, char *argv[])
>> > {
>> > char buf[13];
>> > int fd;
>> > //uint64_t offset64 = 5119999999;
>> > uint64_t offset64 = 5119999988;
>> > off64_t r;
>> > ssize_t bytes;
>> >
>> > system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test");
>> >
>> > fd = open("/dev/mapper/test", O_RDONLY);
>> > if (fd == -1) {
>> > printf("open fail\n");
>> > return 1;
>> > }
>> >
>> > r = lseek64(fd, offset64, SEEK_CUR);
>> > printf("Seek returned %" PRIu64 ".\n", r);
>> > if (r < 0) {
>> > printf("seek fail\n");
>> > close(fd);
>> > return 2;
>> > }
>> >
>> > bytes = read(fd, buf, 13);
>> > printf("Read returned %d.\n", (int)bytes);
>> >
>> > close(fd);
>> > return 0;
>> > }
>> >
>> >
>> > Please let me know if you need more info to reproduce it.
>>
>> Thanks for the detailed report. I haven't tried to reproduce, but the
>> code looks very weird.
>> The patch I posted "Date: Wed, 06 Sep 2017 09:43:28 +1000" had:
>> + struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
>> + md->queue->bio_split);
>> + bio_advance(b, (bio_sectors(b) - ci.sector_count) << 9);
>> + bio_chain(b, bio);
>> + generic_make_request(b);
>> + break;
>>
>> The code in Linux has:
>>
>> struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
>> md->queue->bio_split);
>> ci.io->orig_bio = b;
>> bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
>> bio_chain(b, bio);
>> ret = generic_make_request(bio);
>> break;
>>
>> So the wrong bio is sent to generic_make_request().
>> Mike: do you remember how that change happened? I think there were
>> discussions following the patch, but I cannot find anything about making
>> that change.
>
> Mikulas had this feedback:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00159.html
>
> You replied with:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00165.html
>
> Where you said "Yes, you are right something like that would be better."
>
> And you then provided a follow-up patch:
> https://www.redhat.com/archives/dm-devel/2017-November/msg00175.html
>
> That we discussed and I said I'd just fold it into the original, and you
> agreed and thanked me for checking with you ;)
> https://www.redhat.com/archives/dm-devel/2017-November/msg00208.html
>
> Anyway, I'm just about to switch to Daddy Daycare mode (need to get my
> daughter up from her nap, feed her dinner, etc) so: I'll circle back to
> this tomorrow morning.
>
Thanks for the reminder - it's coming back to me now.
And looking again at the code, it doesn't seem so bad. I was worried
about reassigning ci.io->orig_bio after calling
__split_and_process_non_flush(), but the important thing is to set it
before the last dec_pending() call, and the code gets that right.
I think I've found the problem though:
/* done with normal IO or empty flush */
bio->bi_status = io_error;
When we have chained bios, there are multiple sources for error which
need to be merged into bi_status: all bios that were chained to this
one, and this bio itself.
__bio_chain_endio uses:
if (!parent->bi_status)
parent->bi_status = bio->bi_status;
so the new status won't over-ride the old status (unless there are
races....), but dm doesn't do that - I guess it didn't have to worry
about chains before.
So this:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d6de00f367ef..68136806d365 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
queue_io(md, bio);
} else {
/* done with normal IO or empty flush */
- bio->bi_status = io_error;
+ if (io_error)
+ bio->bi_status = io_error;
bio_endio(bio);
}
}
should fix it. And __bio_chain_endio should probably be
diff --git a/block/bio.c b/block/bio.c
index e1708db48258..ad77140edc6f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
{
struct bio *parent = bio->bi_private;
- if (!parent->bi_status)
+ if (bio->bi_status)
parent->bi_status = bio->bi_status;
bio_put(bio);
return parent;
to remove the chance that a race will hide a real error.
Milan: could you please check that the patch fixes the dm-crypt bug?
I've confirmed that it fixes your test case.
When you confirm, I'll send a proper patch.
I guess I should audit all code that assigns to ->bi_status and make
sure nothing ever assigns 0 to it.
Thanks,
NeilBrown
Attachment:
signature.asc
Description: PGP signature