Re: [RFC PATCH] bio integrity: do not assume bio_integrity_pool exists if bioset exists

From: Mike Snitzer
Date: Tue Jul 07 2015 - 01:00:33 EST


On Wed, Jul 01 2015 at 12:57pm -0400,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

> bio_integrity_alloc() and bio_integrity_free() assume that if a bio was
> allocated from a bioset that that bioset also had its bio_integrity_pool
> allocated using bioset_integrity_create(). This is a very bad
> assumption given that bioset_create() and bioset_integrity_create() are
> completely disjoint. Not all callers of bioset_create() have been
> trained to also call bioset_integrity_create() -- and they may not care
> to be.
>
> Fix this by falling back to kmalloc'ing 'struct bio_integrity_payload'
> rather than force all bioset consumers to (wastefully) preallocate a
> bio_integrity_pool that they very likely won't actually need (given the
> niche nature of the current block integrity support).
>
> Otherwise, a NULL pointer "Kernel BUG" with a trace like the following
> will be observed (as seen on s390x using zfcp storage) because dm-io
> doesn't use bioset_integrity_create() when creating its bioset:
>
> [ 791.643338] Call Trace:
> [ 791.643339] ([<00000003df98b848>] 0x3df98b848)
> [ 791.643341] [<00000000002c5de8>] bio_integrity_alloc+0x48/0xf8
> [ 791.643348] [<00000000002c6486>] bio_integrity_prep+0xae/0x2f0
> [ 791.643349] [<0000000000371e38>] blk_queue_bio+0x1c8/0x3d8
> [ 791.643355] [<000000000036f8d0>] generic_make_request+0xc0/0x100
> [ 791.643357] [<000000000036f9b2>] submit_bio+0xa2/0x198
> [ 791.643406] [<000003ff801f9774>] dispatch_io+0x15c/0x3b0 [dm_mod]
> [ 791.643419] [<000003ff801f9b3e>] dm_io+0x176/0x2f0 [dm_mod]
> [ 791.643423] [<000003ff8074b28a>] do_reads+0x13a/0x1a8 [dm_mirror]
> [ 791.643425] [<000003ff8074b43a>] do_mirror+0x142/0x298 [dm_mirror]
> [ 791.643428] [<0000000000154fca>] process_one_work+0x18a/0x3f8
> [ 791.643432] [<000000000015598a>] worker_thread+0x132/0x3b0
> [ 791.643435] [<000000000015d49a>] kthread+0xd2/0xd8
> [ 791.643438] [<00000000005bc0ca>] kernel_thread_starter+0x6/0xc
> [ 791.643446] [<00000000005bc0c4>] kernel_thread_starter+0x0/0xc
>
> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> block/bio-integrity.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> NOTE: this serves as a more generic fix in the block layer rather than
> the dm-io specific fix which isn't ideal (due to potential for memory
> waste), see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=17dbe96d4f8a6f87004e6cfb5944872dfe2edb9f
>
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 0436c21..719b715 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -51,7 +51,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
> unsigned long idx = BIO_POOL_NONE;
> unsigned inline_vecs;
>
> - if (!bs) {
> + if (!bs || !bs->bio_integrity_pool) {
> bip = kmalloc(sizeof(struct bio_integrity_payload) +
> sizeof(struct bio_vec) * nr_vecs, gfp_mask);
> inline_vecs = nr_vecs;
> @@ -104,7 +104,7 @@ void bio_integrity_free(struct bio *bio)
> kfree(page_address(bip->bip_vec->bv_page) +
> bip->bip_vec->bv_offset);
>
> - if (bs) {
> + if (bs && bs->bio_integrity_pool) {
> if (bip->bip_slab != BIO_POOL_NONE)
> bvec_free(bs->bvec_integrity_pool, bip->bip_vec,
> bip->bip_slab);
> --
> 2.3.2 (Apple Git-55)
>
>

Both the above block patch and the referenced dm-io patch fix the
following issue. I really prefer the block fix over the dm-io one. As
such I'd like to see it go upstream during 4.2-rc.

Jens, what do you think?

Here is an updated NULL pointer trace using 4.2-rc1 (which is easily hit
using the attached reproducer script):

[ 239.425111] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 239.426010] IP: [<ffffffff811b2bb0>] mempool_alloc+0x60/0x180
[ 239.426010] PGD 0
[ 239.426010] Oops: 0000 [#1] SMP
[ 239.426010] Modules linked in: dm_mirror dm_region_hash dm_log scsi_debug sg nfsv3 nfs fscache crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper dm_mod cryptd pcspkr serio_raw virtio_balloon 8139too i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ext4 mbcache jbd2 ata_generic sd_mod pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper virtio_scsi ttm virtio_blk drm ata_piix libata virtio_pci virtio_ring 8139cp virtio mii i2c_core floppy
[ 239.426010] CPU: 2 PID: 134 Comm: kworker/2:2 Not tainted 4.2.0-rc1+ #59
[ 239.426010] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 239.426010] Workqueue: kcopyd do_work [dm_mod]
[ 239.426010] task: ffff88011a701bc0 ti: ffff880036a10000 task.ti: ffff880036a10000
[ 239.426010] RIP: 0010:[<ffffffff811b2bb0>] [<ffffffff811b2bb0>] mempool_alloc+0x60/0x180
[ 239.426010] RSP: 0018:ffff880036a13878 EFLAGS: 00010206
[ 239.426010] RAX: ffff880036a13888 RBX: ffff88011a701bc0 RCX: 0000000000000000
[ 239.426010] RDX: 0000000000000086 RSI: ffffffff81a88940 RDI: 0000000000000246
[ 239.426010] RBP: ffff880036a138e8 R08: 000000000001a9b0 R09: 0000000000000080
[ 239.426010] R10: ffff88011b001500 R11: 0000000000000000 R12: 0000000000000060
[ 239.426010] R13: ffff880036a138a0 R14: 0000000000011200 R15: 0000000000000000
[ 239.426010] FS: 0000000000000000(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
[ 239.426010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 239.426010] CR2: 0000000000000048 CR3: 00000000d8797000 CR4: 00000000000407e0
[ 239.426010] Stack:
[ 239.426010] 0000000000000001 0001121000000010 ffffffff81a88940 ffff88011b001500
[ 239.426010] ffff880036a138c8 ffffffff810e01cd 00000000000004f2 00000000627d6696
[ 239.426010] 0000000000000246 0000000000000400 ffff8800da8cfd00 0000000000000010
[ 239.426010] Call Trace:
[ 239.426010] [<ffffffff810e01cd>] ? __lock_is_held+0x4d/0x70
[ 239.426010] [<ffffffff8136b00f>] bio_integrity_alloc+0x4f/0x1d0
[ 239.426010] [<ffffffff8136b803>] bio_integrity_prep+0xc3/0x220
[ 239.426010] [<ffffffff8134c56f>] blk_sq_make_request+0x10f/0x570
[ 239.426010] [<ffffffff810e01cd>] ? __lock_is_held+0x4d/0x70
[ 239.426010] [<ffffffff8133b236>] generic_make_request+0xd6/0x110
[ 239.426010] [<ffffffff8133b2e7>] submit_bio+0x77/0x150
[ 239.426010] [<ffffffff81335a3f>] ? bio_alloc_bioset+0x1df/0x2e0
[ 239.426010] [<ffffffffa036cd48>] dispatch_io+0x1a8/0x3a0 [dm_mod]
[ 239.426010] [<ffffffffa036c840>] ? dm_copy_name_and_uuid+0xc0/0xc0 [dm_mod]
[ 239.426010] [<ffffffffa036c870>] ? list_get_page+0x30/0x30 [dm_mod]
[ 239.426010] [<ffffffffa036d7eb>] ? run_io_job+0x9b/0x190 [dm_mod]
[ 239.426010] [<ffffffffa036d4c0>] ? dm_kcopyd_do_callback+0x50/0x50 [dm_mod]
[ 239.426010] [<ffffffffa036d183>] dm_io+0x103/0x210 [dm_mod]
[ 239.426010] [<ffffffffa036c840>] ? dm_copy_name_and_uuid+0xc0/0xc0 [dm_mod]
[ 239.426010] [<ffffffffa036c870>] ? list_get_page+0x30/0x30 [dm_mod]
[ 239.426010] [<ffffffffa036d834>] run_io_job+0xe4/0x190 [dm_mod]
[ 239.426010] [<ffffffffa036d4c0>] ? dm_kcopyd_do_callback+0x50/0x50 [dm_mod]
[ 239.426010] [<ffffffffa036da7a>] process_jobs+0x5a/0x100 [dm_mod]
[ 239.426010] [<ffffffffa036d750>] ? segment_complete+0x170/0x170 [dm_mod]
[ 239.426010] [<ffffffffa036db91>] do_work+0x71/0xa0 [dm_mod]
[ 239.426010] [<ffffffff810a79d7>] process_one_work+0x1e7/0x7e0
[ 239.426010] [<ffffffff810a794a>] ? process_one_work+0x15a/0x7e0
[ 239.426010] [<ffffffff810a80e4>] worker_thread+0x114/0x460
[ 239.426010] [<ffffffff810a7fd0>] ? process_one_work+0x7e0/0x7e0
[ 239.426010] [<ffffffff810aebd7>] kthread+0x107/0x120
[ 239.426010] [<ffffffff817065f0>] ? _raw_spin_unlock_irq+0x30/0x50
[ 239.426010] [<ffffffff810aead0>] ? kthread_create_on_node+0x240/0x240
[ 239.426010] [<ffffffff8170731f>] ret_from_fork+0x3f/0x70
[ 239.426010] [<ffffffff810aead0>] ? kthread_create_on_node+0x240/0x240
[ 239.426010] Code: d8 83 e3 af 4d 8d 67 60 0d 00 12 01 00 41 89 de 89 45 9c 48 8d 45 a0 41 81 ce 00 12 01 00 65 48 8b 1c 25 00 ba 00 00 4c 8d 68 18 <49> 8b 77 48 44 89 f7 41 ff 57 50 48 85 c0 74 45 48 8b 4d c8 65
[ 239.426010] RIP [<ffffffff811b2bb0>] mempool_alloc+0x60/0x180
[ 239.426010] RSP <ffff880036a13878>
[ 239.426010] CR2: 0000000000000048
[ 239.426010] ---[ end trace a4a18906f4031b09 ]---

Attachment: bip_bioset_NULL_ptr.sh
Description: Bourne shell script